LLVM/project fd0ad60 libclc/opencl/lib/generic/atomic atomic_fetch_add.cl atomic_fetch_sub.cl

libclc: Fix missing overloads for atomic_fetch_add/sub

Follow-up to #185263, which missed the overloads that take a memory
order.
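
As a rough illustration of what an overload taking a memory order adds over the default form, here is a sketch in C++ using `std::atomic`. It is an analogy only: the helper names are hypothetical, and this is not the OpenCL C code added to libclc.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: fetch-and-add with the default (sequentially
// consistent) ordering. Returns the value held before the addition.
inline int fetchAddDefault(std::atomic<int> &counter, int value) {
  return counter.fetch_add(value);
}

// Hypothetical helper: the same operation, but the caller picks the ordering.
inline int fetchAddWithOrder(std::atomic<int> &counter, int value,
                             std::memory_order order) {
  return counter.fetch_add(value, order);
}

// Hypothetical helper: fetch-and-subtract with a caller-chosen ordering.
inline int fetchSubWithOrder(std::atomic<int> &counter, int value,
                             std::memory_order order) {
  return counter.fetch_sub(value, order);
}
```

Callers that only need atomicity, not ordering, could pass `std::memory_order_relaxed` to the explicit forms.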
Delta     File
+19 -0    libclc/opencl/lib/generic/atomic/atomic_fetch_add.cl
+19 -0    libclc/opencl/lib/generic/atomic/atomic_fetch_sub.cl
+38 -0    2 files

LLVM/project 49ecc68 llvm/tools/gold CMakeLists.txt

Uncomment LLVMgold library definition in CMakeLists
Delta     File
+3 -3     llvm/tools/gold/CMakeLists.txt
+3 -3     1 file

LLVM/project bec0aee llvm/lib/Target/PowerPC PPCInstrInfo.td, llvm/test/CodeGen/PowerPC wide-scalar-shift-by-byte-multiple-legalization.ll aix-vec_insert_elt.ll

[PowerPC] Use add_like pattern for ADDI/ADDIS add-immediate matching (#187326)

Allow or_disjoint nodes with sext-immediates to use the ADD instructions instead of OR (which uses zext-immediates), potentially allowing further folding.
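
The reasoning behind treating a disjoint OR as an add can be sketched in C++: when two operands share no set bits, `a | b` equals `a + b`, so an immediate that only fits a sign-extended field can still be encoded through the add form. The 16-bit field widths and the helper names below are assumptions made for illustration; they are not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// True when the operands share no set bits, i.e. the OR is "disjoint".
constexpr bool isDisjoint(uint32_t a, uint32_t b) { return (a & b) == 0; }

// True when imm fits a 16-bit sign-extended immediate field (assumed width).
constexpr bool fitsSExt16(int32_t imm) {
  return imm >= -32768 && imm <= 32767;
}

// True when imm fits a 16-bit zero-extended immediate field (assumed width).
constexpr bool fitsZExt16(int32_t imm) { return imm >= 0 && imm <= 65535; }

// For disjoint operands, OR and ADD give the same result (no carries occur).
constexpr bool orEqualsAdd(uint32_t a, uint32_t b) {
  return (a | b) == a + b;
}
```

For example, `0x00001234 | 0xFFFF8000` is disjoint, and the immediate is -32768 when read as a signed value, so it fits the sign-extended field but not the zero-extended one.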
Delta     File
+14 -15   llvm/test/CodeGen/PowerPC/wide-scalar-shift-by-byte-multiple-legalization.ll
+3 -3     llvm/lib/Target/PowerPC/PPCInstrInfo.td
+2 -2     llvm/test/CodeGen/PowerPC/aix-vec_insert_elt.ll
+2 -2     llvm/test/CodeGen/PowerPC/aix32-p8-scalar_vector_conversions.ll
+1 -1     llvm/test/CodeGen/PowerPC/fp128-bitcast-after-operation.ll
+1 -1     llvm/test/CodeGen/PowerPC/signbit-shift.ll
+23 -24   1 file not shown
+24 -25   7 files

LLVM/project 201c5c1 clang/lib/Basic/Targets AMDGPU.h, clang/test/Misc amdgcn.languageOptsOpenCL.cl

clang: Report subgroup ext types for AMDGPU with llvm env (#188472)

Report cl_khr_subgroup_extended_types for AMDGPU when targeting
the llvm environment.
Delta     File
+14 -0    clang/test/Misc/amdgcn.languageOptsOpenCL.cl
+4 -0     clang/lib/Basic/Targets/AMDGPU.h
+18 -0    2 files

LLVM/project 27ba9e2 libc/include CMakeLists.txt stdlib.yaml

[libc] Define Annex K's errno_t in specified headers (#187700)

- Change `errno.h.def` to include a placeholder where hdrgen emits the
public API, which contains the `errno_t` definition.
- Make headers `stdio.h`, `stdlib.h`, `string.h` and `time.h` also
define `errno_t` as specified in the standard.
Delta     File
+4 -0     libc/include/CMakeLists.txt
+2 -1     libc/include/stdlib.yaml
+2 -0     libc/include/errno.h.def
+1 -0     libc/include/stdio.yaml
+1 -0     libc/include/string.yaml
+1 -0     libc/include/time.yaml
+11 -1    6 files

LLVM/project 5e89d60 llvm/lib/Target/AArch64 AArch64InstrInfo.td

[AArch64] Remove add_and_or_is_add pattern matcher and use generic add_like equivalent (#188284)
Delta     File
+7 -20    llvm/lib/Target/AArch64/AArch64InstrInfo.td
+7 -20    1 file

LLVM/project 27c1615 llvm/include/llvm/ADT GenericSSAContext.h GenericUniformityImpl.h, llvm/lib/CodeGen MachineSSAContext.cpp

review: rename isNeverDivergent to isAlwaysUniform
Delta     File
+1 -1     llvm/include/llvm/ADT/GenericSSAContext.h
+1 -1     llvm/include/llvm/ADT/GenericUniformityImpl.h
+1 -1     llvm/lib/CodeGen/MachineSSAContext.cpp
+1 -1     llvm/lib/IR/SSAContext.cpp
+4 -4     4 files

LLVM/project a31ac42 clang/lib/Basic/Targets AMDGPU.h, clang/test/Misc amdgcn.languageOptsOpenCL.cl

clang: Report subgroup ext types for AMDGPU with llvm env

Report cl_khr_subgroup_extended_types for AMDGPU when targeting
the llvm environment.
Delta     File
+14 -0    clang/test/Misc/amdgcn.languageOptsOpenCL.cl
+4 -0     clang/lib/Basic/Targets/AMDGPU.h
+18 -0    2 files

LLVM/project ef8c94e llvm/lib/CodeGen MachineSink.cpp, llvm/test/CodeGen/AArch64 sink-and-fold-unique-operand.mir

[MachineSink] Prevent attempts to sink-and-fold into the same instruction more than once (#188048)

When sinking an instruction, we check if the destination instruction can
fold the source instruction into its address mode. If the destination
instruction contains more than one use of the register being sunk, we
won't be able to remove the original instruction, so we should not
attempt to sink.

This also prevents a compiler crash when the destination instruction is
deleted after the first sink-and-fold, and we attempt to sink-and-fold
into it again.

Fixes https://github.com/llvm/llvm-project/issues/187785
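
The check described above can be sketched in C++ with a toy instruction type. `ToyInstr` and both helpers are hypothetical stand-ins for illustration; they are not the `MachineInstr` API or the code added to MachineSink.cpp.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified instruction: just the registers it reads.
struct ToyInstr {
  std::vector<unsigned> uses;
};

// Count how many operands of `instr` read register `reg`.
inline unsigned countUses(const ToyInstr &instr, unsigned reg) {
  unsigned n = 0;
  for (unsigned r : instr.uses)
    if (r == reg)
      ++n;
  return n;
}

// Attempt sink-and-fold only when the destination reads `reg` exactly once.
// With more than one use, folding a single operand would leave another use
// behind, so the original instruction could not be removed.
inline bool mayAttemptSinkAndFold(const ToyInstr &dest, unsigned reg) {
  return countUses(dest, reg) == 1;
}
```

Rejecting the multi-use case up front also means the same destination is never visited twice for the same register, which is the situation the crash report describes.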
Delta     File
+52 -0    llvm/test/CodeGen/AArch64/sink-and-fold-unique-operand.mir
+7 -0     llvm/lib/CodeGen/MachineSink.cpp
+59 -0    2 files

LLVM/project 6ac9ab8 mlir/lib/Conversion/ArithToLLVM ArithToLLVM.cpp

[mlir][arith] Fix variable only used by assert (#188461)
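
For context, a variable read only inside `assert` becomes unused when `NDEBUG` is defined, which triggers unused-variable warnings in release builds. The sketch below shows one common remedy in C++17, the `[[maybe_unused]]` attribute. The commit title does not say which remedy the patch applies, and the function names here are hypothetical.

```cpp
#include <cassert>

// Hypothetical query whose result is only ever checked in an assertion.
inline bool isSupportedWidth(unsigned width) {
  return width == 32 || width == 64;
}

inline unsigned doubleWidth(unsigned width) {
  // Without the attribute, `ok` would be flagged as unused once NDEBUG
  // compiles the assert away.
  [[maybe_unused]] bool ok = isSupportedWidth(width);
  assert(ok && "unsupported width");
  return width * 2;
}
```

An alternative with the same effect is to move the call directly into the `assert` expression, provided it has no side effects.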
Delta     File
+2 -2     mlir/lib/Conversion/ArithToLLVM/ArithToLLVM.cpp
+2 -2     1 file

LLVM/project 3439872 mlir/lib/Dialect/OpenMP/IR OpenMPDialect.cpp

Fix format
Delta     File
+4 -3     mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+4 -3     1 file

LLVM/project 37023ae mlir/lib/Dialect/OpenMP/Utils Utils.cpp

simplify private clause check
Delta     File
+10 -13   mlir/lib/Dialect/OpenMP/Utils/Utils.cpp
+10 -13   1 file

LLVM/project 7d66342 mlir/include/mlir/Dialect/OpenMP/Utils Utils.h, mlir/lib/Dialect/OpenMP CMakeLists.txt

[MLIR][OpenMP] Unify device shared memory logic

This patch creates a utils library for the OpenMP dialect with functions
used by MLIR to LLVM IR translation as well as the stack-to-shared pass
to determine which allocations must use local stack memory or device
shared memory.
Delta       File
+104 -0     mlir/lib/Dialect/OpenMP/Utils/Utils.cpp
+10 -93     mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+13 -85     mlir/lib/Dialect/OpenMP/Transforms/StackToShared.cpp
+53 -0      mlir/include/mlir/Dialect/OpenMP/Utils/Utils.h
+13 -0      mlir/lib/Dialect/OpenMP/Utils/CMakeLists.txt
+1 -0       mlir/lib/Dialect/OpenMP/CMakeLists.txt
+194 -178   2 files not shown
+196 -178   8 files

LLVM/project a4c41e5 flang/test/Integration/OpenMP threadprivate-target-device.f90

fix test after rebase
Delta     File
+0 -3     flang/test/Integration/OpenMP/threadprivate-target-device.f90
+0 -3     1 file

LLVM/project 97d267f mlir/test/Target/LLVMIR openmp-target-private-shared-mem.mlir

update after rebase
Delta     File
+2 -2     mlir/test/Target/LLVMIR/openmp-target-private-shared-mem.mlir
+2 -2     1 file

LLVM/project 8c7e1ad llvm/include/llvm/Frontend/OpenMP OMPIRBuilder.h, llvm/lib/Frontend/OpenMP OMPIRBuilder.cpp

[MLIR][OpenMP][OMPIRBuilder] Improve shared memory checks

This patch refines checks to decide whether to use device shared memory or
regular stack allocations. In particular, it adds support for parallel regions
residing on standalone target device functions.

The changes are:
- Shared memory is introduced for `omp.target` implicit allocations, such as
those related to privatization and mapping, as long as they are shared across
threads in a nested parallel region.
- Standalone target device functions are interpreted as being part of a Generic
kernel, since the fact that they are present in the module after filtering
means they must be reachable from a target region.
- Prevent allocations whose only shared uses inside of an `omp.parallel` region
are as part of a `private` clause from being moved to device shared memory.
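
The rules listed above can be read as a small decision function. The C++ sketch below is a loose paraphrase under stated assumptions: `AllocUse`, `MemKind` and `chooseMemory` are hypothetical names, and the real checks operate on MLIR operations rather than on two booleans.

```cpp
#include <cassert>

// Hypothetical summary of how an allocation is used inside a target region.
struct AllocUse {
  // The allocation is used by threads of a nested parallel region.
  bool sharedAcrossParallelThreads;
  // Its only uses inside the parallel region are `private` clause operands.
  bool onlyViaPrivateClause;
};

enum class MemKind { Stack, DeviceShared };

// Pick the memory space for an allocation following the rules in the text.
inline MemKind chooseMemory(const AllocUse &use) {
  if (!use.sharedAcrossParallelThreads)
    return MemKind::Stack; // never shared across threads: stack is enough
  if (use.onlyViaPrivateClause)
    return MemKind::Stack; // privatized uses do not require sharing
  return MemKind::DeviceShared;
}
```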
Delta      File
+84 -38    mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+109 -0    offload/test/offloading/fortran/target-generic-outlined-loops.f90
+20 -15    llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
+4 -4      mlir/test/Target/LLVMIR/omptarget-parallel-llvm.mlir
+3 -2      llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+2 -2      llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+222 -61   6 files

LLVM/project 7dca66e flang/lib/Optimizer/OpenMP FunctionFiltering.cpp, flang/test/Lower/OpenMP declare-target-func-and-subr.f90 function-filtering-2.f90

add internal linkage to target device functions
Delta      File
+23 -23    flang/test/Lower/OpenMP/declare-target-func-and-subr.f90
+26 -19    flang/test/Lower/OpenMP/function-filtering-2.f90
+22 -22    flang/test/Lower/OpenMP/declare-target-implicit-func-and-subr-cap.f90
+20 -20    flang/test/Lower/OpenMP/declare-target-implicit-func-and-subr-cap-enter.f90
+7 -7      flang/test/Lower/OpenMP/declare-target-implicit-tarop-cap.f90
+6 -0      flang/lib/Optimizer/OpenMP/FunctionFiltering.cpp
+104 -91   6 files

LLVM/project 976afc8 mlir/lib/Target/LLVMIR/Dialect/OpenMP OpenMPToLLVMIRTranslation.cpp

support other map-like clauses
Delta     File
+13 -3    mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+13 -3    1 file

LLVM/project 424853d flang/test/Integration/OpenMP target-use-device-nested.f90 threadprivate-target-device.f90, mlir/lib/Target/LLVMIR/Dialect/OpenMP OpenMPToLLVMIRTranslation.cpp

add missing check
Delta      File
+76 -0     mlir/test/Target/LLVMIR/openmp-target-private-shared-mem.mlir
+12 -13    flang/test/Integration/OpenMP/target-use-device-nested.f90
+6 -5      flang/test/Integration/OpenMP/threadprivate-target-device.f90
+6 -1      mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+1 -1      mlir/test/Target/LLVMIR/omptarget-constant-alloca-raise.mlir
+101 -20   5 files

LLVM/project 7bfd1d9 flang/lib/Optimizer/OpenMP StackToShared.cpp, flang/test/Transforms/OpenMP stack-to-shared.mlir

move stack-to-shared pass to the omp dialect
Delta       File
+0 -215     flang/test/Transforms/OpenMP/stack-to-shared.mlir
+188 -0     mlir/lib/Dialect/OpenMP/Transforms/StackToShared.cpp
+0 -162     flang/lib/Optimizer/OpenMP/StackToShared.cpp
+149 -0     mlir/test/Dialect/OpenMP/stack-to-shared.mlir
+1 -19      mlir/lib/Dialect/OpenMP/CMakeLists.txt
+18 -0      mlir/lib/Dialect/OpenMP/IR/CMakeLists.txt
+356 -396   9 files not shown
+405 -418   15 files

LLVM/project 64c7113 mlir/include/mlir/Dialect/OpenMP/Transforms Passes.td, mlir/lib/Dialect/OpenMP/Transforms StackToShared.cpp

update after rebase and address review comments
Delta     File
+15 -7    mlir/lib/Dialect/OpenMP/Transforms/StackToShared.cpp
+7 -7     mlir/test/Dialect/OpenMP/stack-to-shared.mlir
+5 -5     mlir/include/mlir/Dialect/OpenMP/Transforms/Passes.td
+27 -19   3 files

LLVM/project 699c14f flang/include/flang/Optimizer/OpenMP Passes.td, flang/lib/Optimizer/OpenMP StackToShared.cpp CMakeLists.txt

[Flang][OpenMP] Add pass to replace allocas with device shared memory

This patch introduces a new Flang OpenMP MLIR pass, run only for target device
modules, that identifies `fir.alloca` operations that should use device shared
memory and replaces them with pairs of `omp.alloc_shared_mem` and
`omp.free_shared_mem` operations.

This works in conjunction with the MLIR to LLVM IR translation pass's handling of
privatization, mapping and reductions in the OpenMP dialect to properly select
the right memory space for allocations based on where they are made and where
they are used.

This pass, in particular, handles explicit stack allocations in MLIR, whereas
the aforementioned translation pass takes care of implicit ones represented by
entry block arguments.
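
The alloca-to-shared-memory replacement can be pictured with a toy rewrite over a flat list of operation names. This C++ sketch is illustrative only: `ToyOp` and `rewriteAllocas` are hypothetical, the real pass works on MLIR operations, and placing the frees immediately before the `return` is an assumption rather than something the commit message states.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy operation: a name plus a flag saying whether an earlier
// analysis decided this allocation must live in device shared memory.
struct ToyOp {
  std::string name;
  bool needsShared;
};

// Replace flagged "alloca" ops with "alloc_shared_mem" and emit one matching
// "free_shared_mem" per replacement just before the block's "return".
inline std::vector<std::string> rewriteAllocas(const std::vector<ToyOp> &block) {
  std::vector<std::string> out;
  unsigned pendingFrees = 0;
  for (const ToyOp &op : block) {
    if (op.name == "alloca" && op.needsShared) {
      out.push_back("alloc_shared_mem");
      ++pendingFrees;
    } else if (op.name == "return") {
      for (; pendingFrees > 0; --pendingFrees)
        out.push_back("free_shared_mem");
      out.push_back("return");
    } else {
      out.push_back(op.name);
    }
  }
  return out;
}
```

Allocations not flagged as shared are left untouched, which mirrors the split between stack and device shared memory described in the text.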
Delta     File
+215 -0   flang/test/Transforms/OpenMP/stack-to-shared.mlir
+162 -0   flang/lib/Optimizer/OpenMP/StackToShared.cpp
+17 -0    flang/include/flang/Optimizer/OpenMP/Passes.td
+3 -1     flang/lib/Optimizer/Passes/Pipelines.cpp
+1 -0     flang/lib/Optimizer/OpenMP/CMakeLists.txt
+398 -1   5 files

LLVM/project 6185164 flang/lib/Optimizer/CodeGen CodeGenOpenMP.cpp, mlir/include/mlir/Dialect/OpenMP OpenMPOps.td

simplify omp.alloc_shared_mem
Delta      File
+30 -14    mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+42 -0     mlir/test/Target/LLVMIR/omptarget-device-shared-mem.mlir
+15 -27    flang/lib/Optimizer/CodeGen/CodeGenOpenMP.cpp
+28 -9     mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+13 -16    mlir/test/Dialect/OpenMP/ops.mlir
+16 -7     mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+144 -73   1 file not shown
+157 -79   7 files

LLVM/project 6386271 mlir/include/mlir/Dialect/OpenMP OpenMPOps.td OpenMPClauses.td, mlir/lib/Target/LLVMIR/Dialect/OpenMP OpenMPToLLVMIRTranslation.cpp

address review comments: make omp.free_shared_mem self-contained, update alignment handling for shared memory allocations
Delta      File
+25 -26    mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+20 -30    mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+37 -0     mlir/include/mlir/Dialect/OpenMP/OpenMPClauses.td
+15 -8     mlir/test/Dialect/OpenMP/invalid.mlir
+8 -4      mlir/test/Dialect/OpenMP/ops.mlir
+5 -5      mlir/test/Target/LLVMIR/omptarget-device-shared-mem.mlir
+110 -73   1 file not shown
+112 -78   7 files

LLVM/project 1567365 flang/lib/Optimizer/CodeGen CodeGenOpenMP.cpp, llvm/include/llvm/Frontend/OpenMP OMPIRBuilder.h

[Flang][MLIR][OpenMP] Add explicit shared memory (de-)allocation ops

This patch introduces the `omp.alloc_shared_mem` and `omp.free_shared_mem`
operations to represent explicit allocations and deallocations of shared memory
across threads in a team, mirroring the existing `omp.target_allocmem` and
`omp.target_freemem`.

The `omp.alloc_shared_mem` op goes through the same Flang-specific
transformations as `omp.target_allocmem`, so that the size of the buffer can be
properly calculated when translating to LLVM IR.

The corresponding runtime functions produced for these new operations are
`__kmpc_alloc_shared` and `__kmpc_free_shared`, which previously could only be
created for implicit allocations (e.g. privatized and reduction variables).
Delta      File
+56 -8     mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+61 -0     mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+27 -15    flang/lib/Optimizer/CodeGen/CodeGenOpenMP.cpp
+25 -12    llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+29 -2     mlir/test/Dialect/OpenMP/ops.mlir
+23 -0     llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+221 -37   2 files not shown
+253 -37   8 files

LLVM/project 13dadac mlir/include/mlir/Dialect/OpenMP OpenMPOps.td OpenMPClauses.td, mlir/lib/Dialect/OpenMP/IR OpenMPDialect.cpp

[MLIR][OpenMP] Refactor omp.target_allocmem to allow reuse, NFC

This patch moves tablegen definitions that could be used for all kinds of heap
allocations out of `omp.target_allocmem` and into a new
`OpenMP_HeapAllocClause` that can be reused.

Descriptions are updated to follow the format of most other operations and the
custom verifier for `omp.target_allocmem` is removed as it only made a
redundant check on its result type.
Delta       File
+52 -101    mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+30 -44     mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+53 -0      mlir/include/mlir/Dialect/OpenMP/OpenMPClauses.td
+24 -0      mlir/test/Dialect/OpenMP/ops.mlir
+14 -0      mlir/test/Dialect/OpenMP/invalid.mlir
+173 -145   5 files

LLVM/project c2b7cfa mlir/test/Target/LLVMIR openmp-taskloop-bounds-cast.mlir

Fix new test a third time
Delta     File
+1 -1     mlir/test/Target/LLVMIR/openmp-taskloop-bounds-cast.mlir
+1 -1     1 file

LLVM/project 120f10d llvm/include/llvm/ADT GenericUniformityImpl.h, llvm/lib/Analysis UniformityAnalysis.cpp

review: address suggestions
Delta     File
+21 -18   llvm/include/llvm/ADT/GenericUniformityImpl.h
+3 -3     llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/always-uniform-gmir.mir
+3 -3     llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/always-uniform.mir
+2 -1     llvm/lib/Analysis/UniformityAnalysis.cpp
+1 -1     llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/never-uniform.mir
+30 -26   5 files

LLVM/project faec719 flang/test/Lower/OpenMP taskloop.f90, mlir/include/mlir/Dialect/OpenMP OpenMPOps.td

[mlir][OpenMP] Rename TaskloopOp/omp.taskloop to TaskloopWrapperOp/omp.taskloop.wrapper

Rename the loop wrapper operation to better distinguish it from the
context op (omp.taskloop.context), which handles outlining and runtime calls.
The new name makes the role of each operation clearer at a glance.

RFC: https://discourse.llvm.org/t/rfc-openmp-alloca-placement-for-openmp-loop-wrappers/89512/7

Patch 3/3

Assisted-by: Copilot, Claude Sonnet 4.6
Delta       File
+37 -37     mlir/test/Dialect/OpenMP/ops.mlir
+21 -21     mlir/test/Dialect/OpenMP/invalid.mlir
+15 -14     mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+13 -12     mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+10 -10     flang/test/Lower/OpenMP/taskloop.f90
+9 -9       mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+105 -103   31 files not shown
+176 -174   37 files

LLVM/project b90e09d flang/lib/Lower/OpenMP OpenMP.cpp, mlir/lib/Dialect/OpenMP/IR OpenMPDialect.cpp

[mlir][OpenMP] Rename taskLoopOp/taskloopOp variables to taskLoopWrapperOp/taskloopWrapperOp

Rename local variables for clarity to better reflect the type they hold.

Assisted-by: Copilot, Claude Sonnet 4.6
Delta     File
+2 -2     flang/lib/Lower/OpenMP/OpenMP.cpp
+2 -2     mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+4 -4     2 files