[CIR] Fix InlineAsmOp roundtrip parse crash on cir.asm (#186588)
Fix InlineAsmOp parser/printer roundtrip for cir.asm and avoid null
operand_attrs entries that crash alias printing during
--verify-roundtrip.
- Parse attr-dict before optional result arrow to match print order.
- Use non-null sentinel attributes for non-maybe_memory operands and
check UnitAttr explicitly.
- Keep lowering semantics by treating only UnitAttr as maybe_memory
marker.
- Update inline-asm CIR IR test to run with --verify-roundtrip and add
an attr+result coverage case.
Fix https://github.com/llvm/llvm-project/issues/161441
[CIR] Add noundef to __cxx_global_array_dtor parameter (#191529)
The synthetic __cxx_global_array_dtor helper created by
LoweringPrepare was missing noundef on its ptr parameter,
causing a mismatch with classic codegen.
[CIR][NFC] Remove redundant global-var-simple.cpp test (#192354)
This early smoke test is fully covered by
`clang/test/CIR/CodeGen/globals.cpp` and is no longer needed.
Per @andykaylor's feedback on #191521.
Made with [Cursor](https://cursor.com)
[mlir][spirv][nfc] Move GroupNonUniformBallotBitCount tests to `non-uniform-ops.mlir` (#192115)
Tests were incorrectly placed in `group-ops.mlir` since the op is
defined in `SPIRVNonUniformOps.td`.
AMDGPU: Add NextUseAnalysis Pass (#178873)
Based on
- https://github.com/llvm/llvm-project/pull/156079 and
- https://github.com/llvm/llvm-project/pull/171520
See those PRs for background.
Provides a compatibility mode option
`--amdgpu-next-use-analysis-compatibility-mode` that produces results
that match either PR #156079 (`compute`) or PR #171520 (`graphics`).
Co-authored-by: alex-t <atimofee at amd.com>
Co-authored-by: Konstantina Mitropoulou <KonstantinaMitropoulou at amd.com>
---------
Co-authored-by: Konstantina Mitropoulou <KonstantinaMitropoulou at amd.com>
[SLP] Normalize copyable operand order via majority voting
When building operands for entries with copyable elements, non-copyable
lanes of commutative ops may have inconsistent operand order (e.g. some
lanes have load,add while others have add,load). This prevents
VLOperands::reorder() from grouping consecutive loads on one side,
degrading downstream vectorization.
Add majority-voting normalization during buildOperands: track the
(ValueID, ValueID) pair frequency across non-copyable lanes and swap
any lane whose operand types are the exact inverse of the most common
pattern. This makes operand order consistent, enabling better load
grouping.
This is part 1 of #189181.
Reviewers: RKSimon, hiraditya
Pull Request: https://github.com/llvm/llvm-project/pull/191631
[clang][OpenMP] Fix __llvm_omp_indirect_call_lookup signature for targets with non-default program AS (#192470)
The argument and return value for `__llvm_omp_indirect_call_lookup` are
function pointers so make sure they are in the correct address space.
Signed-off-by: Nick Sarnie <nick.sarnie at intel.com>
[NFC][SPIRV] Move `SPIRVStripConvergenceIntrinsics` to Utils (#188537)
The `SPIRVStripConvergenceIntrinsic` pass was written as a spirv pass as
it is the currently the only target that emits convergence tokens during
codegen. There is nothing target specific to the pass, and, we plan to
emit convergence tokens when targeting DirectX (and all targets in
general), so move the pass to a common place.
The previous pass used temporary `Undef`s, as part of moving the pass we
can simply reverse the traverse order to remove the use of `Undef` as it
is deprecated.
Enables the pass for targeting DirectX and is a pre-req for:
https://github.com/llvm/llvm-project/pull/188792.
Assisted by: Github Copilot
[MCP][NFC] Opinionated refactoring
There are a few minor inconsistencies across the pass which I found mildly
distracting:
* The use of `Def`/`Dest`/`Dst` to refer to the same thing
* Inconsistent declaration order of `Dst`/`Src` vs `Src`/`Dst`
* Lots of `->getReg()->asMCReg()`, and uses of `Register` when the pass
is always running after RA anyway.
* Some places explicitly `assert(isCopyInstr)` while others just deref
the `optional`.
Standardize on `Dst`/`Src` to match the metaphor and ordering of
`DestSourcePair`.
Assume `std::optional::operator*` will assert in any reasonable
implementation, even though this may technically be undefined behavior.
When asserts are disabled it would be anyway.
[11 lines not shown]
[MCP] Never eliminate frame-setup/destroy instructions
Presumably targets only insert frame instructions which are significant,
and there may be effects MCP doesn't model. Similar to reserved registers this
is probably overly conservative, but as this causes no codegen change in
any lit test I think it is benign.
The motivation is just to clean up #183149 for AMDGPU, as we can spill
to physical registers, and currently have to spill the EXEC mask purely
to enable debug-info.
Change-Id: I9ea4a09b34464c43322edd2900361bf635efd9f7
[MCP][NFC] Cleanup and prepare to preserve frame-setup/destroy
This mixes renames, removing redundant code, avoiding
`else`-after-`return`, etc. with factoring out the `isNeverRedundant`
concept.
Change-Id: I43a62a9415019cdd63c68fd3b915ebb7505d317a
[MCP][NFC] Opinionated refactoring (#186239)
There are a few minor inconsistencies across the pass which I found mildly distracting:
* The use of `Def`/`Dest`/`Dst` to refer to the same thing
* Inconsistent declaration order of `Dst`/`Src` vs `Src`/`Dst`
* Lots of `->getReg()->asMCReg()`, and uses of `Register` when the pass
is always running after RA anyway.
* Some places explicitly `assert(isCopyInstr)` while others just deref
the `optional`.
Standardize on `Dst`/`Src` to match the metaphor and ordering of
`DestSourcePair`.
Assume `std::optional::operator*` will assert in any reasonable
implementation, even though this may technically be undefined behavior.
When asserts are disabled it would be anyway.
The refactor uses structured bindings for a couple reasons:
[9 lines not shown]
[X86] Use unsigned comparison for stack clash probing loop (#192355)
The stack clash probing loop generated in `EmitLoweredProbedAlloca` used
a signed comparison (`X86::COND_GE`) to determine when the allocation
target had been reached.
In 32-bit mode, memory addresses above `0x80000000` have the sign bit
set. If the stack pointer lands in this region, treating the addresses
as signed integers causes the comparison logic to fail. This leads to
incorrect loop execution, resulting in an infinite loop and a crash
(segmentation fault) when setting up custom stacks for pthreads mapped
above `0x80000000` in a 32b process.
This patch changes the condition code to `X86::COND_AE` (Above or
Equal), which generates an unsigned comparison. This ensures that
addresses are treated correctly as unsigned quantities on all targets.
On 64-bit systems, this change has no practical effect on valid
user-space addresses because they do not use the sign bit (being
[4 lines not shown]
[CIR][AArch64] Lower NEON vrsra_n intrinsics (#191129)
### Summary
Implement CIR lowering for all intrinsics in
https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#vector-rounding-shift-right-and-accumulate
This PR references the implementation from the ClangIR incubator:
https://github.com/llvm/clangir/blob/main/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp#L4854
AArch64 does not provide a dedicated "rounding shift right by immediate"
instruction. Instead, the `SRSHL` / `URSHL` intrinsics take a signed
per-lane shift amount where a negative value means right shift, so an
immediate right shift by `n` is encoded as a signed vector splat of
`-n`. The three infrastructure changes below exist to support this
encoding at the call site:
- extends `emitNeonShiftVector` with a `neg` parameter so the
right-shift-as-negative-left-shift encoding is handled inside the
helper;
[8 lines not shown]
[mlir][acc] Add canonicalization patterns for compute_region (#192376)
This PR improves the APIs for navigating through acc.compute_region
block arguments and also adds canonicalization patterns for those
arguments to remove unused ones and merge duplicates.
[NFC][AMDGPU] Rename AMDGPUUnifyDivergentExitNodes to AMDGPUUnifyDivergentExitNodesLegacy (#192399)
### Summary
This NFC patch renames the legacy pass wrapper class for
`AMDGPUUnifyDivergentExitNodes` to
`AMDGPUUnifyDivergentExitNodesLegacy`. This makes the old pass manager
wrapper explicit and avoids ambiguity. No behavior change is intended.
[SPIRV] Conditionally define `__AMDGCN_UNSAFE_FP_ATOMICS__` for AMDGCN flavoured SPIR-V (#192136)
Client apps rely on the `__AMDGCN_UNSAFE_FP_ATOMICS__` macro to guide
optimised execution pathways. We were not defining it for AMDGCN
flavoured SPIR-V, which led to pessimisation.
[AArch64][PAC] Rework the expansion of AUT/AUTPAC pseudos
Refactor `AArch64AsmPrinter::emitPtrauthAuthResign` to improve
readability and fix the conditions when `emitPtrauthDiscriminator` is
allowed to clobber AddrDisc.
* do not clobber `AUTAddrDisc` when computing `AUTDiscReg` on resigning
if `AUTAddrDisc == PACAddrDisc`, as it would prevent passing raw,
64-bit value as the new discriminator
* move the code computing `ShouldCheck` and `ShouldTrap` conditions to a
separate function
[VPlan] Remove ComputeAnyOfResult, use ComputeReductionResult. (#190039)
ComputeAnyOfResult is simply a boolean OR reduction. Remove the
dedicated opcode and model directly via ComputeReductionResult.
This simplifies and unifies the code, as well as enabling trivial
constant folding.
PR: https://github.com/llvm/llvm-project/pull/190039