[AMDGPU][GlobalISel] Add RegBankLegalize rules for amdgcn_perm intrinsic (#187798)
Add uniform and divergent register bank legalization rules for the amdgcn_perm intrinsic (v_perm_b32). Since this is a VALU-only instruction, the uniform case maps the destination to UniInVgprB32 and all source operands to VgprB32.
[mlir] Use Repeated<T> in more places to avoid temporary vectors. NFC. (#188846)
Replace `SmallVector<Type/Value>(n, x)` with
`Repeated<Type/Value>(n, x)`. This avoids heap allocations for repeated values.
Also change `ExtractAddressComputations` rebuild callbacks from
`ArrayRef<Value>` to `ValueRange` to enable `Repeated<Value>`
passthrough.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply at anthropic.com>
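To illustrate the idea behind a repeated range (one stored value plus a count, rather than n materialized copies), here is a minimal Python sketch. It is only an analogy: the class below is hypothetical and does not mirror the actual MLIR `Repeated<T>` interface.

```python
from collections.abc import Sequence


class Repeated(Sequence):
    """Read-only sequence of `count` copies of `value`, stored as a single item."""

    def __init__(self, count, value):
        if count < 0:
            raise ValueError("count must be non-negative")
        self._count = count
        self._value = value

    def __len__(self):
        return self._count

    def __getitem__(self, index):
        if isinstance(index, slice):
            # A slice of a repeated range is just a shorter repeated range.
            new_len = len(range(*index.indices(self._count)))
            return Repeated(new_len, self._value)
        if not -self._count <= index < self._count:
            raise IndexError("Repeated index out of range")
        return self._value


# Hypothetical consumer: it only needs something sequence-like,
# so no list of n elements is ever built.
result_types = Repeated(3, "i32")
```

Iterating over `result_types` yields the same value three times, while memory use stays constant regardless of the count.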
[AMDGPU][GlobalISel] Add RegBankLegalize rules for amdgcn_permlane64 (#187840)
Add register bank legalization rules for the amdgcn_permlane64 intrinsic
in the new RegBankLegalize framework.
After GISel legalization, permlane64 always operates on S32 — sub-32-bit
types are anyext'd to S32 and types wider than 32 bits are split into
S32 parts by legalizeLaneOp. Add rules for B32 type.
Also enable -new-reg-bank-select in the permlane64 lit test and update
affected check lines.
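As a rough illustration of the type handling described above (not the legalizeLaneOp implementation), the following Python sketch computes which 32-bit pieces a value of a given width turns into. The function name `s32_pieces` is hypothetical, and the sketch assumes that widths above 32 bits are multiples of 32.

```python
def s32_pieces(bit_width):
    """Return the widths of the S32 pieces a value of `bit_width` bits becomes."""
    if bit_width <= 0:
        raise ValueError("bit width must be positive")
    if bit_width <= 32:
        # Sub-32-bit values are any-extended to 32 bits; 32-bit values stay as-is.
        return [32]
    if bit_width % 32 != 0:
        # Assumption of this sketch only: wider types are multiples of 32 bits.
        raise ValueError("sketch assumes a multiple of 32 bits")
    # Wider values are split into 32-bit parts.
    return [32] * (bit_width // 32)
```

In every case the lane operation itself only ever sees 32-bit operands, which is why rules for B32 are sufficient.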
This new flag depends on both the compiler version AND the linker version (#188864)
Clang will say it supports the flag even if the linker can't use its
output.
The compiler actually has code to check whether the linker version is
suitable, and makes the flag the default when it is.
So just test whether the default method for the compiler and linker
works.
[lld-macho] Make safe ICF conservative without __llvm_addrsig (#188400)
MachO --icf=safe and --icf=safe_thunks used to keep folding code from
object files that did not contain __llvm_addrsig, which was inconsistent
with the conservative ELF/COFF behavior. Mark all symbols in such
objects as address-significant instead, and add regression coverage for
both safe ICF modes with and without addrsig.
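The conservative rule can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not lld's code: `ObjectFile`, `address_significant`, and `fold_candidates` are hypothetical names, and `addrsig=None` stands for an object without an __llvm_addrsig section.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class ObjectFile:
    symbols: List[str]
    # None models an object file that has no __llvm_addrsig section.
    addrsig: Optional[Set[str]] = None


def address_significant(obj):
    """Symbols whose address must be treated as significant."""
    if obj.addrsig is None:
        # Conservative fallback: without addrsig, every symbol is significant.
        return set(obj.symbols)
    return set(obj.addrsig) & set(obj.symbols)


def fold_candidates(obj):
    """Symbols that remain eligible for plain folding in this model."""
    significant = address_significant(obj)
    return [s for s in obj.symbols if s not in significant]
```

With this model, an object lacking addrsig contributes no fold candidates, while an object with addrsig only excludes the symbols it lists.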
Revert "[VPlan] Explicitly unroll replicate-regions without live-outs by VF." (#188868)
Reverts llvm/llvm-project#170212
This appears to cause a failure with expensive checks:
https://lab.llvm.org/buildbot/#/builders/187/builds/18306
[AArch64][llvm] Separate TLBI-only feature gating from TLBIP aliases
Refactor the TLBI system operand definitions so that TLBI and TLBIP
records are emitted through separate helper multiclasses, whilst keeping
the table layout readable.
The feature-scoped wrappers now apply FeatureTLB_RMI, FeatureRME, and
FeatureTLBIW only to TLBI records (they were previously also applied,
incorrectly, to TLBIP instructions), while TLBIP aliases remain gated
only by FeatureD128, including their nXS forms.
Update testcases accordingly.
[AArch64][llvm] Rewrite the TLBI multiclass to be much clearer (NFC)
The `tlbi` multiclass is really doing four jobs at once: base TLBI,
synthesized nXS, optional TLBIP, and synthesized TLBIP nXS. Also,
`needsreg` and `optreg` are really just a 3-state operand policy in
disguise. The PLBI multiclass has the same issue.
Change `needsreg` and `optreg` into a combined fake enum, so it's
clearer whether the instruction takes no register operand, a required
register operand or an optional register operand.
This improves on my original change 66e8270e8.
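The move from two flags to one three-state policy can be illustrated with a short Python sketch. The enum and helper names are hypothetical, and the mapping from the old `needsreg`/`optreg` pair is an assumption made for illustration (here `optreg` wins when set), not taken from the TableGen sources.

```python
from enum import Enum


class RegOperand(Enum):
    NONE = "none"          # instruction takes no register operand
    REQUIRED = "required"  # register operand must be present
    OPTIONAL = "optional"  # register operand may be omitted


def from_flags(needsreg, optreg):
    """Collapse the two old flags into one policy (assumed mapping)."""
    if optreg:
        return RegOperand.OPTIONAL
    if needsreg:
        return RegOperand.REQUIRED
    return RegOperand.NONE


def accepts(policy, has_register):
    """Whether an operand list with or without a register fits the policy."""
    if policy is RegOperand.NONE:
        return not has_register
    if policy is RegOperand.REQUIRED:
        return has_register
    return True
```

A single enum value makes the three cases explicit, whereas two booleans allow four combinations, one of which has no obvious meaning.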
[UnsafeBufferUsage] Check for uninstantiated default arguments to prevent crash. (#188817)
Fix a crash introduced by
https://github.com/llvm/llvm-project/pull/184899
The -Wunsafe-buffer-usage analysis was crashing when it encountered a
template function with a default argument that hadn't been instantiated
yet. This occurred in populateStmtsForFindingGadgets when it attempted
to access the default argument of each parameter.
This fix adds a check to ensure the default argument is instantiated
before attempting to access it.
Assisted-by: Gemini
[lldb] Enable caching for BytecodeSyntheticChildren::FrontEnd::Update (#181199)
Update `BytecodeSyntheticChildren` to support `ChildCacheState` return
values from `@update` implementations.
[lldb][bytecode] Change compiler to require update return type decl (#188637)
To better ensure that bytecode `@update` implementations return a 0/1
value (see https://github.com/llvm/llvm-project/pull/181199), this
changes the Python -> formatter bytecode compiler to require that Python
`update` methods be declared to return `bool`.
A declaration like this will be a compiler error:
```py
def update(self):
    # implementation...
```
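By contrast, a declaration such as `def update(self) -> bool:` carries the required return type. The sketch below shows one way such a requirement could be checked with Python's standard `ast` module. It is an assumption-laden illustration, not the formatter bytecode compiler's code, and `declares_bool_update` is a hypothetical name.

```python
import ast


def declares_bool_update(source):
    """Return False if any `update` function lacks a `-> bool` annotation."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == "update":
            ret = node.returns
            # Only a plain `bool` name counts as the required declaration.
            if not (isinstance(ret, ast.Name) and ret.id == "bool"):
                return False
    return True


rejected = "def update(self):\n    pass\n"
accepted = "def update(self) -> bool:\n    return True\n"
```

Here `declares_bool_update(rejected)` is false and `declares_bool_update(accepted)` is true.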
[AMDGPU] Do not overlap dst with srcs for v_cvt_scalef32_2xpk16_fp6/bf6_f32 (#188809)
v_cvt_scalef32_2xpk16_fp6_f32 and v_cvt_scalef32_2xpk16_bf6_f32 are multipass instructions,
so the destination operand must not overlap with any of the source operands.
This change applies Constraints = "@earlyclobber $vdst" to these two instructions.
Fixes: LCCOMPILER-561
[VPlan] Explicitly unroll replicate-regions without live-outs by VF. (#170212)
This patch adds a new replicateReplicateRegionsByVF transform to
unroll replicate-regions by VF, dissolving them. The transform creates
VF copies of the replicate-region's content, connects them and converts
recipes to single-scalar variants for the corresponding lanes.
The initial version skips regions with live-outs (VPPredInstPHIRecipe);
support for those will be added in follow-up patches.
Depends on https://github.com/llvm/llvm-project/pull/170053
PR: https://github.com/llvm/llvm-project/pull/170212