[RISCV] Enhance RISCVMoveMerger for GPRPair Moves on RV32 #180831 (#182416)
Extends RISCVMoveMerger to identify adjacent 32-bit moves that can be
combined into a single 64-bit move instruction. In particular, this
patch adds support for the Zdinx (`fmv.d`) and P (`padd.dw`) extensions.
Fixes #180831
[MC] Remove redundant setting of AllowDollarAtStartOfIdentifier. NFC (#183339)
This setting defaults to false so there is no need to set it unless we
want it to be true.
This makes it easy to see at a glance which backends support this, and
matches the existing behaviour of other fields such as
`AllowAtAtStartOfIdentifier`, `AllowQuestionAtStartOfIdentifier`,
`UseAssignmentForEHBegin` and `AllowAtInName`. These are all only ever
set to true in subclasses, never false.
[lldb] Fix logic issue in TestDAP_stopped_events.py (#183382)
The subset should actually be the expected data because the real thread
data may have additional information.
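The direction of the check can be sketched as follows. This is a minimal illustration, not the test's actual code; the helper `is_subset` and the sample thread dictionaries are hypothetical.

```python
def is_subset(expected, actual):
    """Return True if every key/value pair in `expected` also appears in `actual`."""
    return all(key in actual and actual[key] == value
               for key, value in expected.items())


# Hypothetical data: the real thread entry carries an extra field.
expected_thread = {"id": 1, "name": "main"}
actual_thread = {"id": 1, "name": "main", "state": "stopped"}

# Correct direction: the expected data is the subset of the real data.
assert is_subset(expected_thread, actual_thread)
# The reversed check fails as soon as the real data has extra fields.
assert not is_subset(actual_thread, expected_thread)
```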
[AMDGPU] Adding FoldMemRefOpsIntoTransposeLoadOp pattern (#183330)
Before this change, a trivial expand_shape was not folded into the index
computation. This later forced the expand_shape to materialize into an
extract_strided_metadata and a reinterpret_cast unnecessarily. The
example below shows source IR that cannot be folded today.
```mlir
%expanded = memref.expand_shape %buf [[0, 1], [2, 3]]
: memref<32x128xf16, strided<[128, 1], offset: ?>, #gpu.address_space<workgroup>>
into memref<1x32x8x16xf16, strided<..., offset: ?>, #gpu.address_space<workgroup>>
amdgpu.transpose_load %expanded[%i, %j, %k, %l]
: memref<1x32x8x16xf16, ...> -> vector<4xf16>
```
With this pattern, which matches the more generic
`FoldMemRefAliasOpsPass`, the expand_shape can now fold into the
transpose_load op, as it does for other loads and stores.
[4 lines not shown]
[CodeGen] Expand power-of-2 div/rem at IR level in ExpandIRInsts. (#180654)
Previously, power-of-2 div/rem operations wider than
MaxLegalDivRemBitWidth were excluded from IR expansion and left for
backend peephole optimizations. Some backends can fail to process such
instructions when DAGCombiner is switched off.
Now ExpandIRInsts expands them into shift/mask sequences:
- udiv X, 2^C -> lshr X, C
- urem X, 2^C -> and X, (2^C - 1)
- sdiv X, 2^C -> bias adjustment + ashr X, C
- srem X, 2^C -> X - (((X + Bias) >> C) << C)
Special cases handled:
- Division/remainder by 1 or -1 (identity, negation, or zero)
- Exact division (sdiv exact skips bias, produces ashr exact)
- Negative power-of-2 divisors (result is negated)
- INT_MIN divisor (correct via countr_zero on bit pattern)
[2 lines not shown]
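The four basic rewrites listed above can be checked with a small sketch. It uses Python integers to stand in for wide fixed-width values, assuming the input fits in a signed `bits`-wide integer; the function names and the `bits` parameter are illustrative and are not taken from ExpandIRInsts. Negative divisors, `exact` flags, and the INT_MIN divisor case are not modelled here.

```python
def trunc_div(x, d):
    """Reference signed division that truncates toward zero, like LLVM's sdiv."""
    q = abs(x) // abs(d)
    return q if (x < 0) == (d < 0) else -q


def udiv_pow2(x, c):
    """udiv x, 2^c as a logical shift right (x is assumed non-negative)."""
    return x >> c


def urem_pow2(x, c):
    """urem x, 2^c as a mask with 2^c - 1 (x is assumed non-negative)."""
    return x & ((1 << c) - 1)


def sdiv_pow2(x, c, bits=128):
    """sdiv x, 2^c via bias adjustment followed by an arithmetic shift."""
    sign = x >> (bits - 1)        # -1 if x is negative, else 0
    bias = sign & ((1 << c) - 1)  # 2^c - 1 for negative x, else 0
    return (x + bias) >> c        # >> on Python ints is an arithmetic shift


def srem_pow2(x, c, bits=128):
    """srem x, 2^c as x - (((x + bias) >> c) << c)."""
    return x - (sdiv_pow2(x, c, bits) << c)


# Compare against the reference semantics on a few values.
for x in (-9, -8, -7, -1, 0, 1, 7, 8, 9):
    for c in (0, 1, 3):
        d = 1 << c
        assert sdiv_pow2(x, c) == trunc_div(x, d)
        assert srem_pow2(x, c) == x - trunc_div(x, d) * d
```

The bias is what makes the arithmetic shift round toward zero instead of toward negative infinity for negative dividends.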
[AArch64] Decompose FADD reductions with known zero elements
FADDV is matched into FADDPv4f32 + FADDPv2f32p but this can be relaxed
when one or more elements (usually the 4th) are known to be zero.
Before:
movi d1, #0000000000000000
mov v0.s[3], v1.s[0]
faddp v0.4s, v0.4s, v0.4s
faddp s0, v0.2s
After:
mov s1, v0.s[2]
faddp s0, v0.2s
fadd s0, s0, s1
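The arithmetic behind the two sequences can be sketched as below. This is only an illustration of the reduction order: Python floats are doubles rather than f32, the function names are made up for this sketch, and signed-zero behaviour (for example -0.0 + 0.0) is not modelled.

```python
def faddv_pairwise(v):
    """Full 4-lane reduction as two pairwise steps: (v0 + v1) + (v2 + v3)."""
    return (v[0] + v[1]) + (v[2] + v[3])


def faddv_lane3_zero(v):
    """Reduction when lane 3 is known to be zero: (v0 + v1) + v2."""
    return (v[0] + v[1]) + v[2]


# With lane 3 equal to 0.0, both orders add the same operands.
lanes = [1.5, 2.25, -0.75, 0.0]
assert faddv_pairwise(lanes) == faddv_lane3_zero(lanes)
```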
[CIR] Update cir::ResumeOp to require an EH token (#183192)
This updates the cir::ResumeOp operation to require an EH token operand.
We already had the token available at both locations where the operation
was being created. Adding this operand makes finding the token more
robust during CFG flattening.
This change was entirely AI generated, but I have reviewed it closely.
[flang][openmp] Add support for ordered regions in SIMD directives (#… (#183379)
Add support for ordered regions within SIMD directives (!$omp simd
ordered and !$omp do simd ordered). This initial implementation matches
Clang's behavior.
In SIMD directives, loop induction variables have an implicit linear
clause with deferred store semantics (storing to .linear_result). To
properly support ordered regions, the LinearClauseProcessor rewrites
variable references to use .linear_result in:
- omp.ordered.region: Code inside ordered blocks
- omp_region.finalize: Code after ordered blocks
Note: The vectorizer cannot currently vectorize loops with ordered
regions. A future enhancement would require generating lane loops or
unrolling ordered regions across SIMD lanes while maintaining ordering
semantics.
This PR is a reland of https://github.com/llvm/llvm-project/pull/181012
and fixes the regression caused by the syntax change in the IR for the
linear clause.
[CodeGen] Add tests for ShadowStackGCLowering IR pass (#183167)
Add llvm/test/CodeGen/Generic/shadow-stack-gc-lowering.ll testing the
opt-level behavior of the shadow-stack-gc-lowering module pass,
covering:
- Single root: frame push/pop at entry and return
- Two roots: multi-slot frame, NumRoots=2/NumMeta=0 in the frame map
- Root with non-null metadata: NumMeta=1, metadata array in gc_map
- Mixed metadata: CollectRoots ordering (metadata roots sorted first)
- No roots: pass must leave the function unchanged
- Invoke: EscapeEnumerator inserts pop on both normal and unwind exits
Requested in https://github.com/llvm/llvm-project/pull/178436, since
the only existing tests (in llvm/test/CodeGen/X86/GC) seem to check only
that llc doesn't crash.
Co-authored-by: Claude Sonnet 4.6 <noreply at anthropic.com>
[AMDGPU] Implement -amdgpu-spill-cfi-saved-regs
These spills need special CFI anyway, so implementing them directly
where CFI is emitted avoids the need to invent a mechanism to track them
from ISel.
Change-Id: If4f34abb3a8e0e46b859a7c74ade21eff58c4047
Co-authored-by: Scott Linder scott.linder at amd.com
Co-authored-by: Venkata Ramanaiah Nalamothu VenkataRamanaiah.Nalamothu at amd.com