LLVM/project 9313342 llvm/include/llvm/Frontend/OpenMP OMPIRBuilder.h, llvm/lib/Frontend/OpenMP OMPIRBuilder.cpp

[OpenMP][flang] Support GPU team reductions on allocatables

Extends the work started in #165714 by supporting team reductions.
Similar to what was done in #165714, this PR introduces proper
allocations, loads, and stores for by-ref reductions in teams-related
callbacks:
* `_omp_reduction_list_to_global_copy_func`,
* `_omp_reduction_list_to_global_reduce_func`,
* `_omp_reduction_global_to_list_copy_func`, and
* `_omp_reduction_global_to_list_reduce_func`.
Delta  File
+102 -35  llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+121 -0  mlir/test/Target/LLVMIR/allocatable_gpu_reduction_teams.mlir
+8 -4  llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+2 -0  mlir/test/Target/LLVMIR/allocatable_gpu_reduction.mlir
+233 -39  4 files

LLVM/project 497a973 llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

Address reviewer feedback: fix getWaitCountMax and reduce code duplication

- Fix getWaitCountMax() to use correct bitmasks based on architecture:
  - Pre-GFX12: Use getVmcntBitMask/getLgkmcntBitMask for LOAD_CNT/DS_CNT
  - GFX12+: Use getLoadcntBitMask/getDscntBitMask for LOAD_CNT/DS_CNT
- Refactor repetitive if-blocks for LOAD_CNT, DS_CNT, EXP_CNT into
  a single loop using getCounterRef helper function
- Fix X_CNT to return proper getXcntBitMask(IV) instead of 0
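
The mask selection and the loop refactor listed above can be sketched as follows. This is a standalone illustration, not the code in SIInsertWaitcnts.cpp: the `Masks` and `Waitcnt` structs, the `Counter` enum, the `clampToMax` helper, and every numeric mask value are placeholders invented for the example. Only the pre-GFX12 versus GFX12+ split, the non-zero X_CNT result, and the idea of a `getCounterRef` helper driving one loop come from the commit message. What the loop body does in the real pass is not stated there, so clamping is used purely for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <initializer_list>

// Counter kinds, named after the commit message.
enum Counter { LOAD_CNT, DS_CNT, EXP_CNT, X_CNT };

// Placeholder masks standing in for the per-architecture bitmask getters.
// The values are made up and do not describe any real hardware encoding.
struct Masks {
  unsigned Vmcnt = 0x3F, Lgkmcnt = 0xF;  // pre-GFX12 (hypothetical values)
  unsigned Loadcnt = 0x7F, Dscnt = 0x3F; // GFX12+ (hypothetical values)
  unsigned Expcnt = 0x7, Xcnt = 0x3F;
};

// Pick the maximum for a counter based on the architecture generation.
unsigned getWaitCountMax(const Masks &M, bool IsGFX12Plus, Counter C) {
  switch (C) {
  case LOAD_CNT: return IsGFX12Plus ? M.Loadcnt : M.Vmcnt;
  case DS_CNT:   return IsGFX12Plus ? M.Dscnt : M.Lgkmcnt;
  case EXP_CNT:  return M.Expcnt;
  case X_CNT:    return M.Xcnt; // a real mask rather than 0
  }
  assert(false && "unhandled counter");
  return 0;
}

// Hypothetical per-counter wait state.
struct Waitcnt {
  unsigned LoadCnt = ~0u, DsCnt = ~0u, ExpCnt = ~0u, XCnt = ~0u;
};

// Map a counter kind to its field so callers can loop over kinds.
unsigned &getCounterRef(Waitcnt &W, Counter C) {
  switch (C) {
  case LOAD_CNT: return W.LoadCnt;
  case DS_CNT:   return W.DsCnt;
  case EXP_CNT:  return W.ExpCnt;
  case X_CNT:    return W.XCnt;
  }
  assert(false && "unhandled counter");
  return W.LoadCnt;
}

// One loop replaces three near-identical if-blocks.
void clampToMax(Waitcnt &W, const Masks &M, bool IsGFX12Plus) {
  for (Counter C : {LOAD_CNT, DS_CNT, EXP_CNT}) {
    unsigned &Ref = getCounterRef(W, C);
    Ref = std::min(Ref, getWaitCountMax(M, IsGFX12Plus, C));
  }
}
```

Routing every counter through one accessor keeps the per-counter logic in a single place, so a fix to the loop body applies to all three counters at once.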
Delta  File
+18 -32  llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+18 -32  1 file

LLVM/project 650e2c8 clang/include/clang/Basic BuiltinsAMDGPU.def, clang/test/SemaHIP amdgpu-gfx950-load-to-lds.hip

[HIP][AMDGPU] Remove 't' from all __builtin_*_load_lds builtins

Allows for type checking depending on the builtin signature.

stack-info: PR: https://github.com/llvm/llvm-project/pull/165389, branch: users/jmmartinez/fix/load_lds_typesignature/3
Delta  File
+15 -15  clang/test/SemaHIP/amdgpu-gfx950-load-to-lds.hip
+3 -3  clang/include/clang/Basic/BuiltinsAMDGPU.def
+18 -18  2 files

LLVM/project 13ee271 clang/test/SemaHIP amdgpu-gfx950-load-to-lds.hip

[NFC][HIP] Add __builtin_*_load_lds type check test cases

These tests show how type checking is performed for
__builtin_amdgcn_load_to_lds, but not for
__builtin_amdgcn_raw_ptr_buffer_load_lds,
__builtin_amdgcn_struct_ptr_buffer_load_lds and
__builtin_amdgcn_global_load_lds since they are declared with the 't'
attribute.

stack-info: PR: https://github.com/llvm/llvm-project/pull/165388, branch: users/jmmartinez/fix/load_lds_typesignature/2
Delta  File
+26 -1  clang/test/SemaHIP/amdgpu-gfx950-load-to-lds.hip
+26 -1  1 file

LLVM/project ce8ed7d llvm/lib/Target/AArch64 MachineSMEABIPass.cpp

Tweak comment

Change-Id: I497d0384e28e5b7531efce5ad6f1a52e6d186261
Delta  File
+2 -2  llvm/lib/Target/AArch64/MachineSMEABIPass.cpp
+2 -2  1 file

LLVM/project e3de8ff llvm/utils/gn/secondary/clang/lib/Driver BUILD.gn, llvm/utils/gn/secondary/clang/lib/Frontend BUILD.gn

[gn build] Port d090311aa7df
Delta  File
+2 -0  llvm/utils/gn/secondary/clang/lib/Driver/BUILD.gn
+1 -1  llvm/utils/gn/secondary/clang/lib/Frontend/BUILD.gn
+3 -1  2 files

LLVM/project 0a35f44 clang/lib/Sema SemaExpr.cpp, clang/test/SemaHIP amdgpu-gfx950-load-to-lds.hip

[HIP] Perform implicit pointer cast when compiling HIP, not when -fcuda-is-device (#165387)

When compiling HIP device code, we add implicit casts for the pointer arguments passed to built-in calls.

When compiling for the host, apply the same casts, since the device side of the source (device functions and kernels) should still pass type checks.
Delta  File
+13 -13  clang/test/SemaHIP/amdgpu-gfx950-load-to-lds.hip
+7 -8  clang/lib/Sema/SemaExpr.cpp
+20 -21  2 files

LLVM/project f743676 bolt/lib/Passes PointerAuthCFIAnalyzer.cpp, bolt/test/runtime/AArch64 pacret-synchronous-unwind.cpp

[BOLT][PAC] Warn about synchronous unwind tables

BOLT currently ignores functions with synchronous PAuth DWARF info.
When more than 10% of functions are ignored because of inconsistencies,
we should emit a warning advising users to use only asynchronous unwind
tables.

See also: #165215
Delta  File
+33 -0  bolt/test/runtime/AArch64/pacret-synchronous-unwind.cpp
+8 -1  bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
+41 -1  2 files

LLVM/project 28b3151 bolt/lib/Passes PointerAuthCFIAnalyzer.cpp

[BOLT] PointerAuthCFIAnalyzer: return early if there is no work

- makes sure we do not divide by zero when calculating the percentage
  of ignored functions.
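
A minimal sketch of the guard described here, assuming the percentage is computed from two counts. The function names, the use of `std::optional`, and the constant name are hypothetical and are not taken from PointerAuthCFIAnalyzer.cpp. The 10% figure is the threshold quoted in the related "Warn about synchronous unwind tables" commit.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Warning threshold in percent, as quoted in the related commit message.
constexpr double IgnoredThresholdPercent = 10.0;

// Returns std::nullopt when there are no functions to analyze, so the
// division below is never evaluated with a zero denominator.
std::optional<double> ignoredPercent(uint64_t Ignored, uint64_t Total) {
  if (Total == 0)
    return std::nullopt; // early return: no work to do
  assert(Ignored <= Total && "cannot ignore more functions than exist");
  return 100.0 * static_cast<double>(Ignored) / static_cast<double>(Total);
}

// True only when strictly more than the threshold was ignored.
bool shouldWarn(uint64_t Ignored, uint64_t Total) {
  std::optional<double> P = ignoredPercent(Ignored, Total);
  return P.has_value() && *P > IgnoredThresholdPercent;
}
```

Returning early on an empty input also means no warning is reported for a binary that has nothing to analyze.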
Delta  File
+3 -0  bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
+3 -0  1 file

LLVM/project 1c94b83 bolt/lib/Passes PointerAuthCFIAnalyzer.cpp

[BOLT] Add comment about the chosen threshold
Delta  File
+10 -0  bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
+10 -0  1 file

LLVM/project 5e36fc0 bolt/lib/Passes PointerAuthCFIAnalyzer.cpp, bolt/test/AArch64 pacret-cfi-incorrect.s

[BOLT] Use opts::Verbosity in PointerAuthCFIAnalyzer
Delta  File
+17 -10  bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
+1 -1  bolt/test/AArch64/pacret-cfi-incorrect.s
+18 -11  2 files

LLVM/project 2713157 bolt/docs PointerAuthDesign.md

[BOLT][doc] Add two resolutions for CFI
Delta  File
+4 -0  bolt/docs/PointerAuthDesign.md
+4 -0  1 file

LLVM/project d930440 bolt/docs PacRetDesign.md

Update bolt/docs/PacRetDesign.md

Co-authored-by: Paschalis Mpeis <paschalis.mpeis at arm.com>
Delta  File
+1 -1  bolt/docs/PacRetDesign.md
+1 -1  1 file

LLVM/project 0cbc530 bolt/docs PointerAuthDesign.md PacRetDesign.md, bolt/test/AArch64 negate-ra-state-incorrect.s pacret-cfi-incorrect.s

[NFC] Rename PAuth tests to have a common prefix
Delta  File
+236 -0  bolt/docs/PointerAuthDesign.md
+0 -236  bolt/docs/PacRetDesign.md
+0 -78  bolt/test/AArch64/negate-ra-state-incorrect.s
+78 -0  bolt/test/AArch64/pacret-cfi-incorrect.s
+76 -0  bolt/test/AArch64/pacret-cfi.s
+0 -76  bolt/test/AArch64/negate-ra-state.s
+390 -390  8 files not shown
+556 -556  14 files

LLVM/project 45f82c1 bolt/include/bolt/Passes PointerAuthCFIFixup.h PointerAuthCFIAnalyzer.h, bolt/lib/Rewrite BinaryPassManager.cpp

[BOLT] Add --print flags for PointerAuthCFI* passes
Delta  File
+13 -2  bolt/lib/Rewrite/BinaryPassManager.cpp
+7 -1  bolt/unittests/Passes/PointerAuthCFIFixup.cpp
+2 -1  bolt/include/bolt/Passes/PointerAuthCFIFixup.h
+2 -1  bolt/include/bolt/Passes/PointerAuthCFIAnalyzer.h
+24 -5  4 files

LLVM/project e70890f bolt/docs PacRetDesign.md

Update PacRetDesign.md
Delta  File
+1 -1  bolt/docs/PacRetDesign.md
+1 -1  1 file

LLVM/project 6ca491e bolt/lib/Passes InsertNegateRAStatePass.cpp PointerAuthCFIFixup.cpp, bolt/unittests/Passes InsertNegateRAState.cpp PointerAuthCFIFixup.cpp

[BOLT][NFC] Rename Pointer Auth DWARF rewriter passes

Original names were "working titles". After initial patches are merged,
I'd like to rename these passes to names that reflect their intent
better and show their relationship to each other:

InsertNegateRAStatePass renamed to PointerAuthCFIFixup,
MarkRAStates renamed to PointerAuthCFIAnalyzer.
Delta  File
+0 -288  bolt/unittests/Passes/InsertNegateRAState.cpp
+288 -0  bolt/unittests/Passes/PointerAuthCFIFixup.cpp
+0 -271  bolt/lib/Passes/InsertNegateRAStatePass.cpp
+271 -0  bolt/lib/Passes/PointerAuthCFIFixup.cpp
+145 -0  bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
+0 -145  bolt/lib/Passes/MarkRAStates.cpp
+704 -704  13 files not shown
+841 -840  19 files

LLVM/project 7623847 bolt/include/bolt/Passes InsertNegateRAStatePass.h

[BOLT] Update InsertNegateRAStatePass.h

- remove unused declarations
- update comments
Delta  File
+6 -17  bolt/include/bolt/Passes/InsertNegateRAStatePass.h
+6 -17  1 file

LLVM/project 23f9030 mlir/lib/Dialect/GPU/Pipelines GPUToXeVMPipeline.cpp

Reland: [GPUToXeVMPipeline][Pipeline] Modify pipeline to add `convert-vector-to-llvm`. (#169573)

`convert-vector-to-llvm` pass applies a set of vector transformation
patterns that are not included in the standard `convert-to-llvm` pass
interface. These additional transformations are required to properly
lower MLIR vector operations. Since not all vector ops have direct
`llvm` dialect lowering, many of them must first be progressively
rewritten into simpler or more canonical vector ops, which are then
lowered to `llvm`. Therefore, running `convert-vector-to-llvm` is
necessary to ensure a complete and correct lowering of vector operations
to the `llvm` dialect.

Original PR: https://github.com/llvm/llvm-project/pull/166204 +
post-commit failure fixes.
Delta  File
+3 -0  mlir/lib/Dialect/GPU/Pipelines/GPUToXeVMPipeline.cpp
+3 -0  1 file

LLVM/project 2b8d363 llvm/include/llvm/Analysis Delinearization.h DependenceAnalysis.h, llvm/lib/Analysis Delinearization.cpp

[Delinearization] Remove tryDelinearizeFixedSizeImpl (#169046)

`tryDelinearizeFixedSizeImpl` is a heuristic function that relies on GEP
type information. Using this information to drive an optimization
heuristic is not allowed, so the function should be removed. Since
#161822 and #164798 have eliminated all calls to it, this patch removes
the function itself.
Delta  File
+0 -38  llvm/lib/Analysis/Delinearization.cpp
+2 -14  llvm/include/llvm/Analysis/Delinearization.h
+2 -2  llvm/include/llvm/Analysis/DependenceAnalysis.h
+4 -54  3 files

LLVM/project b92a4b6 llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp, llvm/test/CodeGen/AMDGPU waitcnt-func-global-inv.mir

skip the loadcnt wait at function boundaries when no VGPRs have pending loads
Delta  File
+115 -0  llvm/test/CodeGen/AMDGPU/waitcnt-func-global-inv.mir
+26 -1  llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+141 -1  2 files

LLVM/project bbbe511 clang/include/clang/Options Options.td, clang/lib/Driver Driver.cpp

Reland "[clang][Driver] Support for the SPIR-V backend when compiling HIP" (#169637)

This relands "[clang][Driver] Support for the SPIR-V backend when compiling HIP" #167543. The only new change is a small fix for the multicall driver.

For HIP, the SPIR-V backend can optionally be enabled with the -use-spirv-backend flag, which uses the SPIR-V backend instead of the SPIR-V translator. These changes also ensure that -use-spirv-backend does not require external dependencies such as spirv-as and spirv-link.
Delta  File
+80 -0  clang/test/Driver/hip-spirv-backend-phases.c
+61 -0  clang/test/Driver/hip-spirv-backend-opt.c
+57 -0  clang/test/Driver/hip-spirv-backend-bindings.c
+36 -13  clang/lib/Driver/ToolChains/HIPAMD.cpp
+41 -3  clang/lib/Driver/Driver.cpp
+10 -0  clang/include/clang/Options/Options.td
+285 -16  1 file not shown
+294 -17  7 files

LLVM/project c7e7f0c llvm/test/CodeGen/AMDGPU global-atomicrmw-fadd.ll flat-atomicrmw-fadd.ll

Pre-commit: affected test checks after patch
Delta  File
+0 -224  llvm/test/CodeGen/AMDGPU/global-atomicrmw-fadd.ll
+0 -100  llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fadd.ll
+0 -90  llvm/test/CodeGen/AMDGPU/global-atomicrmw-fmin.ll
+0 -90  llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fmax.ll
+0 -90  llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fmin.ll
+0 -90  llvm/test/CodeGen/AMDGPU/global-atomicrmw-fmax.ll
+0 -684  24 files not shown
+14 -1,348  30 files

LLVM/project 661687b bolt/lib/Passes PointerAuthCFIAnalyzer.cpp, bolt/test/AArch64 pacret-cfi-incorrect.s

[BOLT] Use opts::Verbosity in PointerAuthCFIAnalyzer
Delta  File
+17 -10  bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
+1 -1  bolt/test/AArch64/pacret-cfi-incorrect.s
+18 -11  2 files

LLVM/project a191dec bolt/lib/Passes PointerAuthCFIAnalyzer.cpp, bolt/test/runtime/AArch64 pacret-synchronous-unwind.cpp

[BOLT][PAC] Warn about synchronous unwind tables

BOLT currently ignores functions with synchronous PAuth DWARF info.
When more than 10% of functions are ignored because of inconsistencies,
we should emit a warning advising users to use only asynchronous unwind
tables.

See also: #165215
Delta  File
+33 -0  bolt/test/runtime/AArch64/pacret-synchronous-unwind.cpp
+8 -1  bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
+41 -1  2 files

LLVM/project fff7be1 bolt/lib/Passes PointerAuthCFIAnalyzer.cpp

[BOLT] PointerAuthCFIAnalyzer: return early if there is no work

- makes sure we do not divide by zero when calculating the percentage
  of ignored functions.
Delta  File
+3 -0  bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
+3 -0  1 file

LLVM/project a1e9e33 bolt/lib/Passes PointerAuthCFIAnalyzer.cpp

[BOLT] Add comment about the chosen threshold
Delta  File
+10 -0  bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
+10 -0  1 file

LLVM/project 6c3f064 bolt/docs PointerAuthDesign.md

[BOLT][doc] Add two resolutions for CFI
Delta  File
+4 -0  bolt/docs/PointerAuthDesign.md
+4 -0  1 file

LLVM/project feeeb52 bolt/docs PointerAuthDesign.md PacRetDesign.md, bolt/test/AArch64 pacret-cfi-incorrect.s negate-ra-state-incorrect.s

[NFC] Rename PAuth tests to have a common prefix
Delta  File
+236 -0  bolt/docs/PointerAuthDesign.md
+0 -236  bolt/docs/PacRetDesign.md
+78 -0  bolt/test/AArch64/pacret-cfi-incorrect.s
+0 -78  bolt/test/AArch64/negate-ra-state-incorrect.s
+0 -76  bolt/test/AArch64/negate-ra-state.s
+76 -0  bolt/test/AArch64/pacret-cfi.s
+390 -390  8 files not shown
+556 -556  14 files

LLVM/project 34576e6 bolt/docs PacRetDesign.md

Update PacRetDesign.md
Delta  File
+1 -1  bolt/docs/PacRetDesign.md
+1 -1  1 file