[Clang] Add multilib support for GPU targets (#192285)
Summary:
This PR extends the new, generic multilib support added in
https://github.com/llvm/llvm-project/pull/188584
to also work for GPU targets. This will allow toolchains to easily
provide variants of these GPU libraries (for example, debug or asan). In
practice, this will look something like this:
```console
-DRUNTIMES_amdgcn-amd-amdhsa+debug_CMAKE_BUILD_TYPE=Debug \
-DRUNTIMES_amdgcn-amd-amdhsa+debug_LIBOMPTARGET_ENABLE_DEBUG=ON \
-DRUNTIMES_amdgcn-amd-amdhsa+debug_LLVM_ENABLE_RUNTIMES=openmp \
-DLLVM_RUNTIME_MULTILIBS=debug \
-DLLVM_RUNTIME_MULTILIB_debug_TARGETS="amdgcn-amd-amdhsa" \
```
This will then install it into the tree like this:
```
[7 lines not shown]
[flang][OpenMP] Get final label from nested constructs
Non-block DO loops can share termination statements. When parsing
a non-block DO loop, account for labels on terminating statements
from recursively parsed ExecutionPartConstructs.
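As a rough illustration of the idea (a toy model, not flang's parser; the tuple encoding and the names `final_label` and `parse_do` are hypothetical), the sketch below parses a flat statement list in which two nested non-block DO loops share one labeled terminator. The outer loop closes by asking the construct it just parsed for its final label, even though the nested loop already consumed that statement.

```python
def final_label(node):
    """Label of the last statement in node, looking through nested DO loops."""
    if node[0] == "stmt":
        return node[1]
    body = node[2]
    return final_label(body[-1]) if body else None


def parse_do(stmts, pos):
    """Parse the non-block DO at stmts[pos]; return (tree, next position).

    Flat input entries are ("do", target_label) or ("stmt", label_or_None).
    Parsed loops are returned as ("do", target_label, body).
    """
    _, target = stmts[pos]
    pos += 1
    body = []
    while pos < len(stmts):
        if stmts[pos][0] == "do":
            node, pos = parse_do(stmts, pos)
        else:
            node, pos = stmts[pos], pos + 1
        body.append(node)
        # Close this loop when the construct just parsed ends on our label,
        # even if the labeled statement was consumed by a nested loop.
        if final_label(node) == target:
            break
    return ("do", target, body), pos
```

For an input equivalent to `DO 10` / `DO 10` / assignment / `10 CONTINUE`, both loops end at the statement labeled 10 and parsing resumes at the statement after it.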
Fixes https://github.com/llvm/llvm-project/issues/188892
[flang] Inline scalar-to-scalar TRANSFER for same-size trivial types (#191589)
Inline the TRANSFER intrinsic for scalar-to-scalar cases where the
result is a trivial type (integer, real, etc.) and source and result
have the same storage size. Instead of calling _FortranATransfer, the
lowering now emits a fir.convert on the source address followed by a
fir.load, effectively performing a reinterpret cast.
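The effect of such a reinterpret cast can be sketched outside the compiler. The Python snippet below is only an analogy, assuming a 32-bit real source and a 32-bit integer result; the function name is hypothetical and `struct` stands in for the address conversion followed by a load.

```python
import struct


def transfer_real32_to_int32(x: float) -> int:
    """Reinterpret the 4 bytes of a 32-bit real as a 32-bit signed integer."""
    raw = struct.pack("<f", x)          # the source's storage, unchanged
    return struct.unpack("<i", raw)[0]  # the same bytes read as the result type
```

No value conversion takes place: `transfer_real32_to_int32(1.0)` yields the IEEE 754 bit pattern `0x3F800000`, not the integer 1.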
[MCP] Never eliminate frame-setup/destroy instructions (#186237)
Presumably targets only insert frame instructions which are significant,
and there may be effects MCP doesn't model. Similar to reserved
registers, this is probably overly conservative, but as it causes no
codegen change in any lit test, I think it is benign.
The motivation is just to clean up #183149 for AMDGPU, as we can spill
to physical registers, and currently have to spill the EXEC mask purely
to enable debug-info.
Change-Id: I9ea4a09b34464c43322edd2900361bf635efd9f7
[OpenMP][Device] Fix __llvm_omp_indirect_call_lookup function pointer types (#192502)
`__llvm_omp_indirect_call_lookup` takes in and returns a function
pointer, so make sure the types are correct, which includes the correct
address space.
The FE was recently changed to generate the correct code
[here](https://github.com/llvm/llvm-project/pull/192470).
With this change, three function pointer tests start passing.
Signed-off-by: Nick Sarnie <nick.sarnie at intel.com>
[CIR] Implement EH handling for field initializers (#192360)
This implements the handling to call the dtor for any previously
initialized fields of destructed type if an exception is thrown later in
the initialization of the containing class.
The basic infrastructure to handle this was already in place. We just
needed a function to push an EH-only destroy cleanup on the EH stack and
a call to that function.
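The behavior being implemented can be modeled in a few lines. This is a sketch of the C++ semantics only, not of the CIR code; the `Field` class, the `construct` function, and the `cleanups` list (standing in for EH-only cleanups on the EH stack) are all hypothetical.

```python
class Field:
    def __init__(self, name, log, fail=False):
        if fail:
            raise RuntimeError(f"init of {name} failed")
        self.name, self.log = name, log
        log.append(f"ctor {name}")

    def destroy(self):
        self.log.append(f"dtor {self.name}")


def construct(field_specs, log):
    """Initialize fields in order; on failure destroy the ones already built."""
    cleanups = []  # one EH-only cleanup per successfully initialized field
    try:
        for name, fail in field_specs:
            field = Field(name, log, fail)
            cleanups.append(field.destroy)
    except RuntimeError:
        # Only the exceptional path runs the cleanups, in reverse order.
        for cleanup in reversed(cleanups):
            cleanup()
        raise
```

If the third of three fields throws, the first two are destroyed in reverse order of initialization; when no initializer throws, the cleanups never run.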
[CIR] Fix FlattenCFG pattern rewriter contract violations (#192359)
Fix patterns in CIRFlattenCFGPass that modify IR but return failure(),
violating the MLIR greedy pattern rewriter contract. The contract
requires that if a pattern modifies IR, it must return success().
- CIRCleanupScopeOpFlattening: always return success() since IR is
modified (blocks split, regions inlined) before error paths
- Ternary op flattening: return success() instead of falling through
after emitError, since splitBlock/createBlock already modified IR
- Use rewriter.moveOpBefore() instead of direct defOp->moveBefore() to
properly notify the rewriter of IR mutations
Found by MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS=ON.
Test: flatten-cleanup-scope-nyi.cir (a silly one since it's testing an
error, but the point is still valid)
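The contract described above can be illustrated with a toy checker. This is a simplified sketch, not MLIR's implementation: the IR is a plain list, and the pattern and helper names are hypothetical. It compares the IR before and after a pattern runs and rejects a pattern that reports failure after mutating it.

```python
SUCCESS, FAILURE = True, False


def split_then_fail(ir):
    ir.append("split_block")  # mutates the IR ...
    return FAILURE            # ... but reports failure: contract violation


def split_then_succeed(ir):
    ir.append("split_block")
    return SUCCESS


def no_match(ir):
    return FAILURE            # fine: nothing was changed


def apply_checked(pattern, ir):
    """Apply pattern and verify that failure implies the IR is unchanged."""
    before = list(ir)
    result = pattern(ir)
    if result == FAILURE and ir != before:
        raise AssertionError(
            f"{pattern.__name__} modified IR but returned failure")
    return result
```

Under this check, `split_then_fail` is rejected, while `split_then_succeed` and `no_match` are both accepted.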
[AMDGPU] InstCombine: fold invalid calls to amdgcn intrinsics into poison values (#191904)
Replace a call to an amdgcn intrinsic with a poison value when the call
is invalid because of an "amdgpu-no-<xyz>" attribute on the caller function.
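The shape of the fold can be sketched as follows. This is a toy model rather than the InstCombine code: the mapping table is illustrative only (which intrinsics the change actually covers is not stated here), and `fold_call` is a hypothetical helper.

```python
# Illustrative mapping from an intrinsic to the caller attribute that makes
# calling it invalid. The real set of intrinsics handled is an assumption.
FORBIDDING_ATTR = {
    "llvm.amdgcn.workitem.id.x": "amdgpu-no-workitem-id-x",
    "llvm.amdgcn.workgroup.id.x": "amdgpu-no-workgroup-id-x",
}


def fold_call(callee, caller_attrs):
    """Return 'poison' if the caller's attributes forbid calling callee."""
    attr = FORBIDDING_ATTR.get(callee)
    if attr is not None and attr in caller_attrs:
        return "poison"
    return f"call {callee}"
```

A caller marked `amdgpu-no-workitem-id-x` that still calls `llvm.amdgcn.workitem.id.x` has the call replaced by poison; without the attribute the call is left alone.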
Follow-up to the review at
https://github.com/llvm/llvm-project/pull/186925#pullrequestreview-3983414064
Assisted by claude-4.6-sonnet-medium through CURSOR.
[CIR] Fix InlineAsmOp roundtrip parse crash on cir.asm (#186588)
Fix InlineAsmOp parser/printer roundtrip for cir.asm and avoid null
operand_attrs entries that crash alias printing during
--verify-roundtrip.
- Parse attr-dict before optional result arrow to match print order.
- Use non-null sentinel attributes for non-maybe_memory operands and
check UnitAttr explicitly.
- Keep lowering semantics by treating only UnitAttr as maybe_memory
marker.
- Update inline-asm CIR IR test to run with --verify-roundtrip and add
an attr+result coverage case.
Fixes https://github.com/llvm/llvm-project/issues/161441
CodeGen: Fix double counting bundles in inst size verification
The AMDGPU implementation of getInstSizeInBytes handles bundles by summing
the member instructions. The verification was starting with the size of the
bundle instruction, then re-adding the sizes of all of the same instructions.
This loop is over the iterator, not the instr_iterator, so it should
not be looking through the bundled instructions. Most of the other
uses of getInstSizeInBytes are also on the iterator, not the
instr_iterator, so the convention seems to be that targets need to handle
BUNDLE correctly themselves.
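The double counting is easy to reproduce in a small model. The sketch below is not the CodeGen verifier: instructions are hypothetical `(opcode, size, members)` tuples, and `inst_size` plays the role of a target hook that already sums a bundle's members.

```python
def inst_size(inst):
    """Target hook: a BUNDLE's size is the sum of its members' sizes."""
    opcode, size, members = inst
    if opcode == "BUNDLE":
        return sum(inst_size(m) for m in members)
    return size


def block_size_double_counted(block):
    """Buggy variant: adds the bundle's size, then its members again."""
    total = 0
    for inst in block:                # top-level iteration
        total += inst_size(inst)      # already includes bundle members
        for member in inst[2]:        # bug: members are counted a second time
            total += inst_size(member)
    return total


def block_size(block):
    """Fixed variant: iterate top-level instructions only."""
    return sum(inst_size(inst) for inst in block)
```

With one 4-byte instruction followed by a bundle of a 4-byte and an 8-byte instruction, the fixed sum is 16 bytes, while the buggy variant reports 28.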
[CIR] Add noundef to __cxx_global_array_dtor parameter (#191529)
The synthetic __cxx_global_array_dtor helper created by
LoweringPrepare was missing noundef on its ptr parameter,
causing a mismatch with classic codegen.
[CIR][NFC] Remove redundant global-var-simple.cpp test (#192354)
This early smoke test is fully covered by
`clang/test/CIR/CodeGen/globals.cpp` and is no longer needed.
Per @andykaylor's feedback on #191521.
Made with [Cursor](https://cursor.com)
[mlir][spirv][nfc] Move GroupNonUniformBallotBitCount tests to `non-uniform-ops.mlir` (#192115)
The tests were placed in `group-ops.mlir`, but the op is defined in
`SPIRVNonUniformOps.td`, so they belong in `non-uniform-ops.mlir`.