LLVM/project dde579b mlir/lib/Target/LLVMIR/Dialect/OpenMP OpenMPToLLVMIRTranslation.cpp, mlir/test/Target/LLVMIR openmp-taskgroup-task-reduction.mlir openmp-todo.mlir

[mlir][OpenMP] Translate task_reduction on omp.taskgroup (#199565)

This patch adds LLVM IR translation for `task_reduction` on
`omp.taskgroup`.

Flang already parses, checks, and lowers the relevant task-reduction
constructs to OpenMP MLIR, but the LLVM IR translation path was
incomplete. This patch implements the taskgroup reduction setup needed
by the follow-up taskloop and task `in_reduction` work.

For each reducer on `omp.taskgroup`, the translation emits init and
combiner helpers from the corresponding `omp.declare_reduction` regions,
builds the `kmp_taskred_input_t` descriptor array, and calls
`__kmpc_taskred_init` before entering the taskgroup body.
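
As a rough illustration of the setup described above, the sketch below builds one descriptor per reducer and hands the array to a stand-in for the runtime call. The struct is modelled on libomp's `kmp_taskred_input_t`, but its field names and order are an assumption of this sketch rather than something taken from the patch. `TaskRedInput`, `buildDescriptors`, `init_i32`, `comb_i32`, and `fake_taskred_init` are hypothetical names; the real translation emits LLVM IR, not C++.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Modelled on libomp's kmp_taskred_input_t. Field names and order are an
// assumption of this sketch, not copied from the patch or from kmp.h.
struct TaskRedInput {
  void *reduce_shar;       // shared item that tasks reduce into
  void *reduce_orig;       // original item, used for initialization
  std::size_t reduce_size; // size of the item in bytes
  void *reduce_init;       // init helper
  void *reduce_fini;       // finalizer helper, may be null
  void *reduce_comb;       // combiner helper
  unsigned flags;          // extra flags from the compiler
};

// Hypothetical helpers standing in for the functions outlined from the
// init and combiner regions of an omp.declare_reduction.
void init_i32(void *priv, void * /*orig*/) { *static_cast<int *>(priv) = 0; }
void comb_i32(void *lhs, void *rhs) {
  *static_cast<int *>(lhs) += *static_cast<int *>(rhs);
}

// Build one descriptor per reduction variable.
std::vector<TaskRedInput> buildDescriptors(std::vector<int> &vars) {
  std::vector<TaskRedInput> descs;
  for (int &v : vars)
    descs.push_back({&v, &v, sizeof(int),
                     reinterpret_cast<void *>(&init_i32), nullptr,
                     reinterpret_cast<void *>(&comb_i32), 0u});
  return descs;
}

// Stand-in for the runtime entry point, which is assumed here to take a
// thread id, a descriptor count, and the descriptor array.
void *fake_taskred_init(int /*gtid*/, int num, void *data) {
  return num > 0 ? data : nullptr;
}
```

The point of the sketch is only the ordering: descriptors are filled in first, and the init call happens once, before any code from the taskgroup body runs.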

This patch intentionally keeps `reduction` / `in_reduction` on
`omp.taskloop.context` unsupported. Those are handled in follow-up PRs.

### Stack / review order

    [20 lines not shown]
Delta File
+255 -7 mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+238 -0 mlir/test/Target/LLVMIR/openmp-taskgroup-task-reduction.mlir
+55 -3 mlir/test/Target/LLVMIR/openmp-todo.mlir
+548 -10, 3 files

LLVM/project 3f7f732 clang/docs ReleaseNotes.rst, llvm/lib/Transforms/Instrumentation InstrProfiling.cpp

[PGO] Implement PGO counter promotion for atomic updates (#202487)

Currently PGO counter updates are promoted/hoisted out of loops where
possible, in order to reduce memory accesses. The promotion is
implemented via the LoadAndStorePromoter and SSAUpdater classes.
When the updates are relaxed atomic, however, hoisting doesn't happen.

Reading the semantics of relaxed atomics, it should be legal to do
similar promotions, but teaching LoadAndStorePromoter and SSAUpdater
about atomics seems like a lot of work and would touch common code used
by many LLVM optimizations such as SROA.

An easier approach, implemented here, is to perform the promotions on 
non-atomic updates, then transform the promoted updates to (relaxed)
atomic.
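
A source-level analogy may help show why the promotion matters. The sketch below is not the IR the pass emits; `countPerIteration` and `countPromoted` are hypothetical names, and the second function assumes the promoted update ends up as a single relaxed atomic add of the accumulated delta.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Unpromoted: one relaxed atomic read-modify-write per loop iteration.
void countPerIteration(std::atomic<std::uint64_t> &counter, int n) {
  for (int i = 0; i < n; ++i)
    counter.fetch_add(1, std::memory_order_relaxed);
}

// Promoted: accumulate in a local (register) value inside the loop, then
// publish the total with a single relaxed atomic add after the loop.
void countPromoted(std::atomic<std::uint64_t> &counter, int n) {
  std::uint64_t local = 0;
  for (int i = 0; i < n; ++i)
    ++local;
  counter.fetch_add(local, std::memory_order_relaxed);
}
```

Both functions leave the counter with the same final value; the promoted form touches memory once instead of `n` times.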

I also added a flag-guarded sanity check that users can enable to make
sure all PGO counter updates have been made atomic (in case we miss
some).

    [32 lines not shown]
Delta File
+89 -23 llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
+61 -0 llvm/test/Transforms/PGOProfile/atomic_counter_promote.ll
+3 -3 llvm/test/Transforms/PGOProfile/counter_promo.ll
+2 -0 clang/docs/ReleaseNotes.rst
+155 -26, 4 files

LLVM/project a696a09 clang/include module.modulemap

[clang][modules] Add BuiltinsAVR.def as textual header in module.modulemap (#204584)

The new 'BuiltinsAVR.def' was introduced in https://github.com/llvm/llvm-project/pull/203214.
Delta File
+1 -0 clang/include/module.modulemap
+1 -0, 1 file

LLVM/project fabd339 llvm/lib/CodeGen/GlobalISel CombinerHelper.cpp, llvm/test/CodeGen/AArch64/GlobalISel combine-binop-neg.mir

[GISel] CombinerHelper::matchBinopWithNegInner should only look for not on the LHS of G_SUB. (#204257)

Fixes #204219.
Delta File
+59 -0 llvm/test/CodeGen/AArch64/GlobalISel/combine-binop-neg.mir
+3 -1 llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+62 -1, 2 files

LLVM/project ee2c896 llvm/include/llvm/Transforms/Coroutines CoroInstr.h, llvm/lib/IR Verifier.cpp

[Coro] Handle aliases to coroutines (#204408)

Aliases to coroutines do not appear to be handled; this PR addresses that.
Delta File
+13 -0 llvm/test/Transforms/Coroutines/coro-id-alias.ll
+8 -4 llvm/include/llvm/Transforms/Coroutines/CoroInstr.h
+5 -4 llvm/lib/Transforms/Coroutines/Coroutines.cpp
+4 -2 llvm/lib/Transforms/Coroutines/CoroCleanup.cpp
+1 -1 llvm/lib/IR/Verifier.cpp
+31 -11, 5 files

LLVM/project b02659a clang/lib/CodeGen/TargetBuiltins RISCV.cpp, clang/lib/Headers riscv_packed_simd.h

[RISCV][P-ext] Support Packed Absolute Value and Absolute Difference (#203840)

This PR adds support for the RISC-V P extension intrinsics
[Packed Absolute Value and Absolute Difference](https://github.com/riscv/riscv-p-spec/blob/master/P-ext-intrinsics.adoc#packed-absolute-value-and-absolute-difference).
Delta File
+256 -0 clang/test/CodeGen/RISCV/rvp-intrinsics.c
+70 -0 cross-project-tests/intrinsic-header-tests/riscv_packed_simd.c
+56 -0 llvm/test/CodeGen/RISCV/rvp-simd-64.ll
+36 -0 llvm/test/CodeGen/RISCV/rvp-simd-32.ll
+29 -0 clang/lib/Headers/riscv_packed_simd.h
+22 -1 clang/lib/CodeGen/TargetBuiltins/RISCV.cpp
+469 -1, 3 files not shown
+504 -4, 9 files

LLVM/project 6ea75ec clang/lib/Basic/Targets WebAssembly.h, clang/test/CodeGen/WebAssembly wasm-swiftasynccall.c

clang: enable `swiftasynccall` for Wasm (#203330)

Follow-up to https://github.com/llvm/llvm-project/pull/188296, which
made LLVM lower `swiftasynccall` to the Wasm `return_call` and
`return_call_indirect` instructions when tail calls are enabled. The
calling convention still needed to be enabled at the Clang level in
`checkCallingConvention` in `lib/Basic/Targets/WebAssembly.h`.
Delta File
+42 -0 clang/test/CodeGen/WebAssembly/wasm-swiftasynccall.c
+10 -0 clang/test/Sema/wasm-swiftasynccall.c
+1 -1 clang/lib/Basic/Targets/WebAssembly.h
+53 -1, 3 files

LLVM/project 0d02d39 llvm/lib/Target/AMDGPU AMDGPUMCResourceInfo.cpp, llvm/test/CodeGen/AMDGPU indirect-call-agpr-cap.ll indirect-call-vgpr-cap.ll

Revert "[AMDGPU] Capping max number of registers to function's occupancy budget for indirect calls" (#204605)

Reverts llvm/llvm-project#199765

Broke https://lab.llvm.org/buildbot/#/builders/10
Delta File
+0 -57 llvm/test/CodeGen/AMDGPU/indirect-call-agpr-cap.ll
+0 -54 llvm/test/CodeGen/AMDGPU/indirect-call-vgpr-cap.ll
+22 -22 llvm/test/CodeGen/AMDGPU/attr-amdgpu-flat-work-group-size-vgpr-limit.ll
+21 -21 llvm/test/CodeGen/AMDGPU/function-resource-usage.ll
+0 -35 llvm/test/CodeGen/AMDGPU/indirect-call-sgpr-cap.ll
+4 -29 llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
+47 -218, 8 files not shown
+93 -270, 14 files

LLVM/project c08ab39 llvm/lib/Target/AMDGPU AMDGPUMCResourceInfo.cpp, llvm/test/CodeGen/AMDGPU indirect-call-agpr-cap.ll indirect-call-vgpr-cap.ll

Revert "[AMDGPU] Capping max number of registers to function's occupancy budg…"

This reverts commit 567eeec75c26f4d0fca28659fb829b8b466539f2.
Delta File
+0 -57 llvm/test/CodeGen/AMDGPU/indirect-call-agpr-cap.ll
+0 -54 llvm/test/CodeGen/AMDGPU/indirect-call-vgpr-cap.ll
+22 -22 llvm/test/CodeGen/AMDGPU/attr-amdgpu-flat-work-group-size-vgpr-limit.ll
+21 -21 llvm/test/CodeGen/AMDGPU/function-resource-usage.ll
+0 -35 llvm/test/CodeGen/AMDGPU/indirect-call-sgpr-cap.ll
+4 -29 llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
+47 -218, 8 files not shown
+93 -270, 14 files

LLVM/project 6d895d0 clang/include/clang/Analysis/Analyses/LifetimeSafety FactsGenerator.h, clang/lib/Analysis/LifetimeSafety FactsGenerator.cpp Origins.cpp

[LifetimeSafety] Propagate loans through the GNU binary conditional (#204439)

FactsGenerator only handled the ternary, so a borrow used through the
GNU binary conditional `a ?: b` was silently dropped. Handle both via
VisitAbstractConditionalOperator, flowing from
getTrueExpr()/getFalseExpr(). For `a ?: b` getTrueExpr() is an
OpaqueValueExpr, so make OpaqueValueExpr transparent in the origin
manager and peel it in the arm-reachability check; guard against flowing
a void (e.g. throw) arm.
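
For readers unfamiliar with the GNU extension, the sketch below shows how `a ?: b` behaves at the source level: the result may point to either operand, and the first operand is evaluated only once. It assumes a compiler that accepts the extension (GCC or Clang); `fetch`, `pickBinary`, and `pickTernary` are hypothetical names and not part of the patch.

```cpp
#include <cassert>

// Counts how often the condition operand is evaluated.
int calls = 0;
const int *fetch(const int *p) {
  ++calls;
  return p;
}

// GNU binary conditional: yields the first operand if it is non-null,
// otherwise the second. The first operand is evaluated exactly once.
const int *pickBinary(const int *a, const int *b) { return fetch(a) ?: b; }

// Spelling the same choice with the standard ternary evaluates the first
// operand twice when it is non-null.
const int *pickTernary(const int *a, const int *b) {
  return fetch(a) ? fetch(a) : b;
}
```

Either way the returned pointer can borrow from `a` or from `b`, which is why an analysis that ignores the binary form loses track of the borrow.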

Assisted-by: Claude Opus 4.8

Co-authored-by: Gabor Horvath <gaborh at apple.com>
Delta File
+98 -0 clang/test/Sema/LifetimeSafety/safety.cpp
+15 -8 clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
+7 -0 clang/lib/Analysis/LifetimeSafety/Origins.cpp
+1 -1 clang/include/clang/Analysis/Analyses/LifetimeSafety/FactsGenerator.h
+121 -9, 4 files

LLVM/project 068220a libcxx/test/std/containers/views/mdspan/layout_left index_operator.pass.cpp, libcxx/test/std/containers/views/mdspan/layout_right index_operator.pass.cpp

[libc++][mdspan][test] Correct `mapping::operator()` constraint tests (#201061)

The previous requires-expression only checked that `std::is_same_v<...>`
was a well-formed expression, so the test would pass even when the
result was false.
Delta File
+2 -2 libcxx/test/std/containers/views/mdspan/layout_right/index_operator.pass.cpp
+2 -2 libcxx/test/std/containers/views/mdspan/layout_left/index_operator.pass.cpp
+2 -2 libcxx/test/std/containers/views/mdspan/layout_stride/index_operator.pass.cpp
+6 -6, 3 files

LLVM/project 84ed575 libcxx/modules/std utility.inc

[libc++] Add missing constant_wrapper in the std module (#202038)

The std module declaration of `constant_wrapper` was missed in #191695.
Delta File
+6 -0 libcxx/modules/std/utility.inc
+6 -0, 1 file

LLVM/project 586b892 mlir/include/mlir/Conversion Passes.td, mlir/lib/Conversion/VectorToLLVM ConvertVectorToLLVM.cpp ConvertVectorToLLVMPass.cpp

[mlir][VectorToLLVM] add opt-in `enable-gep-inbounds-nuw` pass flag for `vector.load/store` (#202118)

> This patch follows up on #201180 and the refactoring #202766 (which
made `affine-super-vectorize` emit `in_bounds = [true]` on
`vector.transfer_read`/`write` when accesses are statically provable to
be within bounds). With that in place, the `VectorToLLVM` lowering was
still emitting `llvm.getelementptr` without `inbounds`/`nuw`, so LLVM
could not exploit the no-wrap guarantee: SCEV could not prove the index
arithmetic monotone (loop vectorizer bailed out) and BasicAliasAnalysis
fell back to conservative aliasing.

Without `inbounds`/`nuw` on the GEP that `vector.load`/`vector.store`
lower to, LLVM cannot exploit no-wrap guarantees: SCEV fails to prove
loop-index monotonicity (loop vectorizer bails), and BasicAliasAnalysis
falls back to conservative aliasing.

### Why opt-in

Unlike `memref.load`, `vector.load`/`vector.store` intentionally allow

    [37 lines not shown]
Delta File
+238 -0 mlir/test/Conversion/VectorToLLVM/vector-load-store-to-llvm.mlir
+0 -228 mlir/test/Conversion/VectorToLLVM/vector-to-llvm-interface.mlir
+44 -8 mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
+5 -2 mlir/test/Dialect/Affine/SuperVectorize/vectorize_2d_inbounds.mlir
+6 -0 mlir/include/mlir/Conversion/Passes.td
+1 -1 mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVMPass.cpp
+294 -239, 1 file not shown
+295 -240, 7 files

LLVM/project de9bb21 libcxx/docs/ReleaseNotes 23.rst

[libc++] Restore release note dropped during rebase of #196495 (#204435)

A release note about transitive include removals was inadvertently
dropped during a rebase of #196495 before merge. This restores it.
Delta File
+1 -0 libcxx/docs/ReleaseNotes/23.rst
+1 -0, 1 file

LLVM/project 02cdcc2 libcxx/docs/Status Cxx29Papers.csv Cxx29Issues.csv

[libc++] Add Github issue links in Brno-voted papers (#204579)
Delta File
+21 -21 libcxx/docs/Status/Cxx29Papers.csv
+18 -18 libcxx/docs/Status/Cxx29Issues.csv
+39 -39, 2 files

LLVM/project add2d9a llvm/test/tools/llvm-pdbutil dxcontainer.test, llvm/tools/llvm-pdbutil DumpOutputStyle.cpp StreamUtil.cpp

[llvm-pdbutil] Add DXContainer support to `llvm-pdbutil dump` (#200485)

This patch adds a `--dxcontainer` option that attempts to parse a
`DXContainer` from the stream 5 data (generated by DirectX tools) of a PDB
file and, if successful, dumps basic information about it. If no
`DXContainer` could be parsed, the output states that it is not present
in the file.
Delta File
+42 -0 llvm/tools/llvm-pdbutil/DumpOutputStyle.cpp
+25 -0 llvm/test/tools/llvm-pdbutil/dxcontainer.test
+6 -0 llvm/tools/llvm-pdbutil/StreamUtil.cpp
+3 -0 llvm/tools/llvm-pdbutil/llvm-pdbutil.cpp
+1 -0 llvm/tools/llvm-pdbutil/llvm-pdbutil.h
+1 -0 llvm/tools/llvm-pdbutil/DumpOutputStyle.h
+78 -0, 1 file not shown
+79 -0, 7 files

LLVM/project f1bdb76 mlir/include/mlir/Dialect/MemRef/IR MemRefOps.td, mlir/lib/Conversion/MemRefToLLVM MemRefToLLVM.cpp

[mlir][MemRefToLLVM] fix incorrect `nuw` on `GEP/mul` when lowering `memref.load/store` with negative strides (#204309)

`MemRefToLLVM` was unconditionally emitting `getelementptr inbounds|nuw`
(and consequently `mul overflow<nsw,nuw>` on every intermediate index
computation inside `getStridedElementPtr`) for all `memref.load` and
`memref.store` lowerings.

This is _unsound_ when any stride is negative or dynamic.
`getStridedElementPtr` propagates `GEPNoWrapFlags::nuw` to
`IntegerOverflowFlags::nuw` on every intermediate `llvm.mul` and
`llvm.add` it emits. With a negative stride (e.g. `-1`, which is
`2^64-1` unsigned), an access like index=5 produces `mul nuw 5,
(2^64-1)`, which unsigned-overflows and yields poison per LangRef —
regardless of whether the final offset happens to be non-negative.
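
The arithmetic in that example can be checked directly. The sketch below uses hypothetical helper names and plain 64-bit integers rather than LLVM IR: it tests whether an unsigned multiply would overflow (the condition under which `mul nuw` yields poison) and shows that the wrapped result is nevertheless the expected signed offset.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// True when a * b does not fit in 64 unsigned bits, i.e. the case in
// which a `mul nuw` on these operands would produce poison.
bool mulOverflowsUnsigned(std::uint64_t a, std::uint64_t b) {
  return a != 0 && b > std::numeric_limits<std::uint64_t>::max() / a;
}

// The wrapping (modulo 2^64) product, reinterpreted as a signed offset.
// The conversion is modular on GCC and Clang, and by rule since C++20.
std::int64_t wrappedSignedProduct(std::uint64_t a, std::uint64_t b) {
  return static_cast<std::int64_t>(a * b);
}
```

With index 5 and stride `-1` (stored as `2^64-1`), the unsigned multiply overflows even though the wrapped product is the perfectly reasonable offset `-5`.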

This issue came up in the discussion in PR #202118. Thanks to
@banach-space for the detailed discussion.

This PR hopefully concludes the path to fix the regression related to

    [6 lines not shown]
Delta File
+30 -12 mlir/test/Conversion/MemRefToLLVM/convert-dynamic-memref-ops.mlir
+20 -8 mlir/lib/Conversion/MemRefToLLVM/MemRefToLLVM.cpp
+16 -8 mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td
+1 -1 mlir/test/Conversion/MemRefToLLVM/expand-then-convert-to-llvm.mlir
+67 -29, 4 files

LLVM/project 8e181f4 clang/test/Driver amdgpu-xnack-sramecc-flags.c, llvm/lib/Target/AMDGPU AMDGPUAsmPrinter.cpp

AMDGPU: Use module flags to control xnack and sramecc

This ensures these ABI details are encoded in the IR module
rather than depending on external state from command-line flags.
Previously, these were encoded as function-level subtarget features.
The code object output was a single target ID directive implied
by the global subtarget. The backend would previously check if a
function's subtarget feature mismatched the global subtarget. This
is avoided by making xnack and sramecc module-level properties from
the start. This also provides proper linker compatibility
enforcement, moving the error point earlier.

The old encoding was also an abuse of the subtarget feature system.
Subtarget features are a bitvector, and later features in the string
can override earlier ones. The old handling added a special case
where explicit settings were preserved: ordinarily +feature,-feature
should result in the feature being disabled, but +xnack,-xnack would
preserve the explicit "-xnack" state, which differs from the absence
of any xnack setting.
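
The mismatch between the two models can be sketched in a few lines. This is a toy parser with hypothetical names (`featureEnabled`, `explicitSetting`, `Setting`), not LLVM's subtarget feature API: the first function applies plain last-one-wins on/off semantics, while the second keeps the three-way distinction that xnack and sramecc need.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Plain on/off semantics: scan left to right and let the last mention win.
bool featureEnabled(const std::string &features, const std::string &name) {
  bool on = false;
  std::stringstream ss(features);
  std::string tok;
  while (std::getline(ss, tok, ','))
    if (tok.size() > 1 && tok.substr(1) == name)
      on = (tok[0] == '+');
  return on;
}

// Three-way setting: "never mentioned" is distinct from "explicitly off".
enum class Setting { Unspecified, Off, On };

Setting explicitSetting(const std::string &features, const std::string &name) {
  Setting s = Setting::Unspecified;
  std::stringstream ss(features);
  std::string tok;
  while (std::getline(ss, tok, ','))
    if (tok.size() > 1 && tok.substr(1) == name)
      s = (tok[0] == '+') ? Setting::On : Setting::Off;
  return s;
}
```

Under the on/off model, `"+xnack,-xnack"` and an empty string are indistinguishable; only the three-way model can tell an explicit "off" from no setting at all.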

    [25 lines not shown]
Delta File
+52 -52 llvm/test/CodeGen/AMDGPU/directive-amdgcn-target.ll
+30 -46 llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+75 -0 llvm/test/CodeGen/AMDGPU/module-flag-xnack.ll
+36 -33 clang/test/Driver/amdgpu-xnack-sramecc-flags.c
+66 -0 llvm/test/CodeGen/AMDGPU/module-flag-sramecc.ll
+47 -4 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+306 -135, 92 files not shown
+1,105 -354, 98 files

LLVM/project d2dd4ce clang/lib/AST/ByteCode EvaluationResult.h

[clang][bytecode][NFC] Mark results as non-empty when taking a value (#204568)

This was missing, so every EvaluationResult ended up being empty even
though its APValue was set. Since the `assert(!empty())` was also missing
from `takeAPValue()`, nobody noticed.
Delta File
+3 -7 clang/lib/AST/ByteCode/EvaluationResult.h
+3 -7, 1 file

LLVM/project 3c33c36 libcxx/src atomic.cpp

[libc++] Use public os_sync API instead of private __ulock on newer Apple platforms (#202519)

The atomic wait and wake implementation on Apple platforms currently
relies on `__ulock_wait` and `__ulock_wake`, which are private kernel
APIs. This is a problem for anyone shipping apps through the App Store
since Apple flags private symbol usage during review.

Starting with macOS 14.4 and iOS 17.4, Apple ships public replacements
through `os_sync_wait_on_address` and `os_sync_wake_by_address_any/all`
in `<os/os_sync_wait_on_address.h>`. These cover the same functionality
and are documented, stable, and safe for App Store submissions.

This patch makes use of the public APIs instead of the private ones
whenever the underlying OS permits it.

This takes over #182947.
Fixes #182908
Fixes #146142

Co-authored-by: Bbn08 <atrancendentbeing at gmail.com>
Delta File
+87 -15 libcxx/src/atomic.cpp
+87 -15, 1 file

LLVM/project e8a022a libc/include/llvm-libc-macros/linux sys-ioctl-macros.h

[libc] Include linux headers to get ioctl macros (#204555)

Linux has many existing ioctls and keeps adding them, so a
hand-maintained list would always be out of date. Additionally, some
ioctls have architecture-specific numbers (some in a very subtle way:
the number depends on the size of a structure).

asm/ioctls.h and linux/sockios.h are pretty clean, and are already
included by glibc, so we can just do the same to get the latest
definitions.
Delta File
+2 -8 libc/include/llvm-libc-macros/linux/sys-ioctl-macros.h
+2 -8, 1 file

LLVM/project 23abfa0 mlir/include/mlir/Dialect/Tosa/IR TosaTypesBase.td, mlir/test/Conversion/TosaToArith tosa-to-arith.mlir

[mlir][tosa] Allow rank-0 vector operands in tosa.apply_scale (#199924)

I was facing a bug that can be reproduced this way:

```mlir
 // RUN:  mlir-opt --transform-interpreter tosa_apply_scale_rank0_repro.mlir
  #map        = affine_map<(d0) -> (d0)>
  #map_scalar = affine_map<(d0) -> ()>

  func.func @repro(%input: tensor<64xi32>, %scalar_t: tensor<i32>,
                   %out_init: tensor<64xi8>) -> tensor<64xi8> {
    %c31_i8     = arith.constant 31 : i8
    %cScale_i32 = arith.constant -1010580540 : i32

    %tile_out = linalg.generic
      { indexing_maps = [#map, #map_scalar, #map],
        iterator_types = ["parallel"] }
      ins(%input, %scalar_t : tensor<64xi32>, tensor<i32>)
      outs(%out_init : tensor<64xi8>) {
```

    [52 lines not shown]
Delta File
+10 -0 mlir/test/Conversion/TosaToArith/tosa-to-arith.mlir
+1 -1 mlir/include/mlir/Dialect/Tosa/IR/TosaTypesBase.td
+11 -1, 2 files

LLVM/project d77f3bf llvm/utils/gn/secondary/clang/tools/clang-ssaf-format BUILD.gn

[gn] port 53dabae40fb3a8514 more (#204578)
Delta File
+1 -0 llvm/utils/gn/secondary/clang/tools/clang-ssaf-format/BUILD.gn
+1 -0, 1 file

LLVM/project 1343b64 compiler-rt/cmake builtin-config-ix.cmake

[compiler-rt] Fix default builtins target _Float16 detection on x86_64/i386 (#204474)
Delta File
+6 -0 compiler-rt/cmake/builtin-config-ix.cmake
+6 -0, 1 file

LLVM/project 7fda520 llvm/utils/gn/secondary/llvm/unittests/Transforms/Vectorize BUILD.gn

[gn build] Port 4d812375c174 (#204575)
Delta File
+0 -1 llvm/utils/gn/secondary/llvm/unittests/Transforms/Vectorize/BUILD.gn
+0 -1, 1 file

LLVM/project 8b0462f llvm/utils/gn/secondary/clang/lib/Driver BUILD.gn, llvm/utils/gn/secondary/clang/lib/FrontendTool BUILD.gn

[gn] port 53dabae40fb3a8514 (ssaf/SourceTransformation) (#204574)
Delta File
+12 -0 llvm/utils/gn/secondary/clang/lib/ScalableStaticAnalysisFramework/SourceTransformation/BUILD.gn
+4 -0 llvm/utils/gn/secondary/clang/unittests/ScalableStaticAnalysisFramework/BUILD.gn
+1 -0 llvm/utils/gn/secondary/clang/tools/clang-ssaf-analyzer/BUILD.gn
+1 -0 llvm/utils/gn/secondary/clang/lib/Driver/BUILD.gn
+1 -0 llvm/utils/gn/secondary/clang/lib/FrontendTool/BUILD.gn
+1 -0 llvm/utils/gn/secondary/clang/tools/clang-ssaf-linker/BUILD.gn
+20 -0, 6 files

LLVM/project 5fe9132 llvm/utils/gn/secondary/clang/lib/ScalableStaticAnalysisFramework/Core BUILD.gn

[gn build] Port 6e21a04a5a96 (#204576)
Delta File
+1 -0 llvm/utils/gn/secondary/clang/lib/ScalableStaticAnalysisFramework/Core/BUILD.gn
+1 -0, 1 file

LLVM/project df06afb llvm/lib/Target/AMDGPU SIWholeQuadMode.cpp, llvm/test/CodeGen/AMDGPU wqm.mir licm-wwm.mir

[AMDGPU] Mark all instructions in WWM region as convergent

Mark instructions between ENTER_STRICT_WWM and EXIT_STRICT_WWM as
convergent, so they don't get moved out of the whole wave mode region
(see the licm-wwm.mir test). This doesn't automagically fix all our
woes, since things can still be moved out of the region before we even
run si-wqm, but there are rumours about moving WWM formation earlier
anyway.

This is not a substitute for proper WWM support - in particular, this
would inhibit most optimizations inside WWM regions with complex control
flow. Right now most WWM is relatively limited in size and complexity,
so I think this is acceptable until we get a more principled solution.

I haven't thought too much about whether or not we need this for WQM as
well.

Assisted by: Claude Sonnet

commit-id:9204c7e2
Delta File
+17 -17 llvm/test/CodeGen/AMDGPU/wqm.mir
+24 -1 llvm/test/CodeGen/AMDGPU/licm-wwm.mir
+5 -5 llvm/test/CodeGen/AMDGPU/wqm-debug-instr.mir
+8 -0 llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
+2 -2 llvm/test/CodeGen/AMDGPU/si-init-whole-wave.mir
+56 -25, 5 files

LLVM/project 28ac70e llvm/docs AMDGPUExecutionSynchronization.rst

[AMDGPU][doc] Refactor Barrier Execution Model

Remove everything that has to do with named barriers and put it in a series of model extensions specific to /sbarrier/named-barriers.

I had to change a few things to make it fit, in summary:

Base Model:

* Stylistic changes that make it easier to refer to specific rules. Each rule is in a rubric instead of a bullet point.
* (-) No longer defines `barrier-mutually-exclusive`
* (-) No longer defines barrier `join` and any associated rule.

New named barrier extensions:

* Define "named barrier" as a sub-type of barrier objects. This makes barrier-mutually-exclusive redundant.
* Define barrier join as an op that can exclusively be done on `named barrier objects`.
* Define rules relating to join and its ordering with other barrier operations

Following these changes, the target tables changed a bit as well.


    [2 lines not shown]
Delta File
+200 -154 llvm/docs/AMDGPUExecutionSynchronization.rst
+200 -154, 1 file

LLVM/project 60416cf openmp/runtime/src kmp_traits.cpp kmp_traits.h, openmp/runtime/src/i18n en_US.txt

[libomp] Parse OMP_DEFAULT_DEVICE with new device trait parser

... but do not yet expose the new functionality to the user. This is a
backward-compatible update that will be followed by the step to the
OpenMP 6.0 semantics as defined in 4.3.8.
Delta File
+105 -0 openmp/runtime/unittests/Traits/TestOMPTraitParser.cpp
+24 -0 openmp/runtime/src/kmp_traits.cpp
+8 -0 openmp/runtime/src/kmp_traits.h
+3 -2 openmp/runtime/src/kmp_settings.cpp
+3 -0 openmp/runtime/src/i18n/en_US.txt
+143 -2, 5 files