[PowerPC] Use add_like pattern for ADDI/ADDIS add-immediate matching (#187326)
Allow `or_disjoint` nodes with sign-extended immediates to be selected as ADDI/ADDIS instead of ORI/ORIS (which take zero-extended immediates), potentially enabling further folding.
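A minimal Python sketch of why treating a disjoint OR as an add is sound, and why it can help with immediates. The helper names (`fits_simm16`, `fits_uimm16`, `fits_simm16_hi`, `fits_uimm16_hi`, `or_equals_add`) are hypothetical. They model the 16-bit immediate fields of ADDI/ADDIS (sign-extended) and ORI/ORIS (zero-extended) on 64-bit values, not the actual TableGen patterns.

```python
MASK64 = (1 << 64) - 1  # model 64-bit registers

def fits_simm16(imm: int) -> bool:
    """ADDI's SI field: a 16-bit immediate that is sign-extended."""
    return -(1 << 15) <= imm < (1 << 15)

def fits_uimm16(imm: int) -> bool:
    """ORI's UI field: a 16-bit immediate that is zero-extended."""
    return 0 <= imm < (1 << 16)

def fits_simm16_hi(imm: int) -> bool:
    """ADDIS: SI << 16, sign-extended (the low 16 bits must be zero)."""
    return (imm & 0xFFFF) == 0 and fits_simm16(imm >> 16)

def fits_uimm16_hi(imm: int) -> bool:
    """ORIS: UI << 16, zero-extended (the low 16 bits must be zero)."""
    return imm >= 0 and (imm & 0xFFFF) == 0 and fits_uimm16(imm >> 16)

def or_equals_add(a: int, b: int) -> bool:
    """True iff a | b == a + b on 64-bit values, which holds exactly when
    a and b share no set bits (so the addition produces no carries)."""
    a, b = a & MASK64, b & MASK64
    return (a | b) == ((a + b) & MASK64)

if __name__ == "__main__":
    x, imm = 0x0F, -16  # -16 is 0xFFFF_FFFF_FFFF_FFF0 as a 64-bit value
    print("disjoint:", (x & imm & MASK64) == 0)
    print("or == add:", or_equals_add(x, imm))
    print("ORI can encode:", fits_uimm16(imm), "ADDI can encode:", fits_simm16(imm))
```

With `x = 0x0F` and `imm = -16` the operands share no bits, so the OR equals the add; `-16` fits ADDI's signed field but not ORI's unsigned one, which is the case the `add_like` match opens up.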
clang: Report subgroup ext types for AMDGPU with llvm env (#188472)
Report cl_khr_subgroup_extended_types for AMDGPU when targeting
the llvm environment.
[libc] Define Annex K's errno_t in specified headers (#187700)
- Change `errno.h.def` to include a placeholder where hdrgen emits the
public API, which contains the `errno_t` definition.
- Make headers `stdio.h`, `stdlib.h`, `string.h` and `time.h` also
define `errno_t` as specified in the standard.
[MachineSink] Prevent attempts to sink-and-fold into the same instruction more than once (#188048)
When sinking an instruction, we check if the destination instruction can
fold the source instruction into its address mode. If the destination
instruction contains more than one use of the register being sunk, we
won't be able to remove the original instruction, so we should not
attempt to sink.
This also prevents a compiler crash when the destination instruction is
deleted after the first sink-and-fold, and we attempt to sink-and-fold
into it again.
Fixes https://github.com/llvm/llvm-project/issues/187785
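A toy model of the new check, assuming a simplified instruction record. `Instr`, `count_uses` and `can_sink_and_fold` are hypothetical names, not MachineSink's API; the sketch only captures the rule that folding is attempted when the destination reads the sunk register exactly once.

```python
from dataclasses import dataclass, field

@dataclass
class Instr:
    """Hypothetical stand-in for a machine instruction: opcode plus virtual registers."""
    opcode: str
    uses: list[str] = field(default_factory=list)
    defs: list[str] = field(default_factory=list)

def count_uses(mi: Instr, reg: str) -> int:
    """Number of operands of `mi` that read `reg`."""
    return sum(1 for r in mi.uses if r == reg)

def can_sink_and_fold(src: Instr, dst: Instr) -> bool:
    """Allow folding `src` into `dst`'s address mode only if `dst` reads src's result once.

    With two or more uses, folding one operand leaves another still reading
    src's result, so src could not be erased, and a second attempt could
    target `dst` after the first fold has already deleted it.
    """
    (reg,) = src.defs  # assumption: src defines exactly one register
    return count_uses(dst, reg) == 1
```

For example, a store whose address and value operands are both the sunk register is rejected, while a load that uses it once as its address is accepted.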
[MLIR][OpenMP] Unify device shared memory logic
This patch creates a utils library for the OpenMP dialect containing the
functions that both the MLIR to LLVM IR translation and the stack-to-shared
pass use to determine which allocations must use local stack memory and which
must use device shared memory.
[MLIR][OpenMP][OMPIRBuilder] Improve shared memory checks
This patch refines checks to decide whether to use device shared memory or
regular stack allocations. In particular, it adds support for parallel regions
residing on standalone target device functions.
The changes are:
- Shared memory is introduced for `omp.target` implicit allocations, such as
those related to privatization and mapping, as long as they are shared across
threads in a nested parallel region.
- Standalone target device functions are interpreted as being part of a Generic
kernel, since the fact that they are present in the module after filtering
means they must be reachable from a target region.
- Allocations whose only shared uses inside an `omp.parallel` region are as part
of a `private` clause are no longer moved to device shared memory.
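A simplified sketch of the `private`-clause rule from the last bullet, assuming each use of an allocation can be summarized by two flags. `Use` and `needs_shared_memory` are hypothetical names; the real checks in the OpenMP utils library likely consider more context than this.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Use:
    """Hypothetical summary of a single use of an allocation."""
    in_nested_parallel: bool          # the use is inside an omp.parallel region
    via_private_clause: bool = False  # the use is an operand of that region's `private` clause

def needs_shared_memory(uses: list[Use]) -> bool:
    """Simplified rule: device shared memory is only needed if some use inside a
    nested parallel region actually shares the value across threads. A `private`
    clause gives each thread its own copy, so such uses do not count."""
    return any(u.in_nested_parallel and not u.via_private_clause for u in uses)
```

Under this model, a variable that is only privatized inside the parallel region stays on the stack, while one that the region's threads access directly is moved to shared memory.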
[Flang][OpenMP] Add pass to replace allocas with device shared memory
This patch introduces a new Flang OpenMP MLIR pass, run only for target device
modules, that identifies `fir.alloca` operations that should use device shared
memory and replaces them with pairs of `omp.alloc_shared_mem` and
`omp.free_shared_mem` operations.
This works in conjunction with the MLIR to LLVM IR translation pass's handling of
privatization, mapping and reductions in the OpenMP dialect to select the right
memory space for allocations based on where they are made and where they are
used.
This pass, in particular, handles explicit stack allocations in MLIR, whereas
the aforementioned translation pass takes care of implicit ones represented by
entry block arguments.
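A toy version of the rewrite on a straight-line list of ops. `Op` and `replace_allocas` are hypothetical, size and type operands are omitted, and placing the frees in reverse allocation order just before the block terminator is an assumption made for illustration, not necessarily what the pass does.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Op:
    """Hypothetical op record: name, optional result value, operand values."""
    name: str
    result: Optional[str] = None
    operands: Tuple[str, ...] = ()

def replace_allocas(ops: List[Op], needs_shared: Callable[[Op], bool]) -> List[Op]:
    """Turn each selected fir.alloca into omp.alloc_shared_mem and emit a
    matching omp.free_shared_mem before the terminator (assumed to be the last op)."""
    body, terminator = ops[:-1], ops[-1]
    out: List[Op] = []
    frees: List[Op] = []
    for op in body:
        if op.name == "fir.alloca" and needs_shared(op):
            out.append(Op("omp.alloc_shared_mem", op.result, op.operands))
            frees.append(Op("omp.free_shared_mem", None, (op.result,)))
        else:
            out.append(op)
    # Free in reverse allocation order (stack-like), just before the terminator.
    return out + frees[::-1] + [terminator]
```

Allocas that the selection predicate rejects are left untouched, so only the explicit stack allocations that must be shared across threads change memory space.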
[Flang][MLIR][OpenMP] Add explicit shared memory (de-)allocation ops
This patch introduces the `omp.alloc_shared_mem` and `omp.free_shared_mem`
operations to represent explicit allocations and deallocations of shared memory
across threads in a team, mirroring the existing `omp.target_allocmem` and
`omp.target_freemem`.
The `omp.alloc_shared_mem` op goes through the same Flang-specific
transformations as `omp.target_allocmem`, so that the size of the buffer can be
properly calculated when translating to LLVM IR.
The corresponding runtime functions produced for these new operations are
`__kmpc_alloc_shared` and `__kmpc_free_shared`, which previously could only be
created for implicit allocations (e.g. privatized and reduction variables).
[MLIR][OpenMP] Refactor omp.target_allocmem to allow reuse, NFC
This patch moves tablegen definitions that could be used for all kinds of heap
allocations out of `omp.target_allocmem` and into a new
`OpenMP_HeapAllocClause` that can be reused.
Descriptions are updated to follow the format of most other operations and the
custom verifier for `omp.target_allocmem` is removed as it only made a
redundant check on its result type.
[mlir][OpenMP] Rename TaskloopOp/omp.taskloop to TaskloopWrapperOp/omp.taskloop.wrapper
Rename the loop wrapper operation to better distinguish it from the
context op (omp.taskloop.context), which handles outlining and runtime calls.
The new name makes the role of each operation clearer at a glance.
RFC: https://discourse.llvm.org/t/rfc-openmp-alloca-placement-for-openmp-loop-wrappers/89512/7
Patch 3/3
Assisted-by: Copilot, Claude Sonnet 4.6
[mlir][OpenMP] Rename taskLoopOp/taskloopOp variables to taskLoopWrapperOp/taskloopWrapperOp
Rename local variables for clarity to better reflect the type they hold.
Assisted-by: Copilot, Claude Sonnet 4.6