[VPlan] Fix masked_cond expansion.
masked_cond combines early-exit conditions with masks from predication.
The early-exit condition should only be evaluated if the mask is true.
Emit the mask first to avoid incorrect poison propagation.
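Why operand order matters can be sketched with a toy model. This is not the VPlan code: it assumes the combined condition behaves like a logical and (`select i1 A, i1 B, i1 false`), uses `std::nullopt` as a stand-in for poison, and the names `I1`, `bitwiseAnd` and `logicalAnd` are made up for illustration.

```cpp
#include <cassert>
#include <optional>

// Toy model of an i1 value; std::nullopt stands in for poison.
using I1 = std::optional<bool>;

// Bitwise `and`: poison in either operand poisons the result.
I1 bitwiseAnd(I1 A, I1 B) {
  if (!A || !B)
    return std::nullopt;
  return *A && *B;
}

// Logical and, modelled on `select i1 A, i1 B, i1 false`:
// B is only observed when A is true, so poison in B is blocked when A is false.
I1 logicalAnd(I1 A, I1 B) {
  if (!A)
    return std::nullopt; // a poison select condition yields poison
  return *A ? B : I1(false);
}
```

In this model, a false mask in the first position hides a poison early-exit condition, while the swapped order lets the poison through.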
Fixes https://github.com/llvm/llvm-project/issues/187061.
[MLIR][Python] Add optional emit reset to exportSMTLIB (#187366)
Previously, the MLIR Python binding `smt.export_smtlib(...)` always
emitted `(reset)` at the end of the SMT-LIB string as a solver terminator.
This PR adds an option to suppress the trailing `(reset)`, as downstream
users such as the Python z3 module don't need it.
[RISCV] Fix IDiv/IRem scheduling data for RV32 cores that use the SiFive7 model (#187331)
The integer division and remainder instructions on a 32-bit core that
uses the SiFive7 scheduling model should have the same latency and
throughput as their word counterparts on a 64-bit SiFive7 core.
This patch fixes those scheduling entries by adding a new SchedPred that
predicates on `Feature64Bit` to toggle the SchedVariant attached to the
affected integer division / remainder instructions.
[AMDGPU] Regenerate codegen tests to check extra stuff at end of line (#187325)
Regenerate checks after two recent commits that added extra text at the
end of assembly lines without causing the existing checks to fail.
- #179414 added "nv" to loads and stores on GFX1250.
- #185774 added "msbs" comments on setreg instructions.
[LSR] skip ephemeral IV users when collecting IV chains (#187282)
IVUsers records ephemeral values used only by `llvm.assume` as IV
operands in the Processed set. As a result, `CollectChains` picks them
up and builds unnecessary increment chains. Fix this by checking
`IVUsers::isEphemeral` before collecting the chains.
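The shape of the fix can be sketched as a filter in front of chain collection. This is a toy model, not the LSR code: `ToyIVUsers` and `collectChainCandidates` are hypothetical stand-ins, users are plain strings, and only the `isEphemeral` name is taken from the description above.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for IVUsers: knows which users are ephemeral,
// i.e. only feed llvm.assume.
struct ToyIVUsers {
  std::unordered_set<std::string> Ephemeral;
  bool isEphemeral(const std::string &User) const {
    return Ephemeral.count(User) != 0;
  }
};

// Returns the users that should be considered for increment chains,
// skipping ephemeral ones before any chain is built.
std::vector<std::string>
collectChainCandidates(const std::vector<std::string> &Users,
                       const ToyIVUsers &IU) {
  std::vector<std::string> Candidates;
  for (const std::string &U : Users) {
    if (IU.isEphemeral(U))
      continue; // would otherwise start an unnecessary chain
    Candidates.push_back(U);
  }
  return Candidates;
}
```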
Fixes #187270
[CIR][NFC] Remove NYI checks in ternary with cleanup (#186870)
We added those checks to emit an error message when CleanupScopeOp is
used in this edge case, until it was fixed. It is fixed now, so the NYI
checks are no longer needed.
[AMDGPU] Updated getMaxNumAGPRs to use getMaxNumVectorRegs.
Removed other variants of getMaxNumAGPRs. So with this patch, there
is only one way to get the maximum number of AGPRs. If the client
provides a target occupancy, that value will be used. Otherwise,
the function-level attributes for waves-per-eu are used. In both
cases, the utility uses getMaxNumVectorRegs.
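The occupancy selection described above amounts to a simple fallback. A minimal sketch, assuming the client's value is optional and the attribute value is already parsed; `pickWavesPerEU` and its parameters are hypothetical names, not the AMDGPU API.

```cpp
#include <cassert>
#include <optional>

// Hypothetical helper: prefer the client-provided target occupancy,
// otherwise fall back to the function-level waves-per-eu attribute value.
unsigned pickWavesPerEU(std::optional<unsigned> TargetOccupancy,
                        unsigned AttrWavesPerEU) {
  return TargetOccupancy.value_or(AttrWavesPerEU);
}
```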
[AMDGPU] Adds AGPR pressure during candidate init in GCN scheduler.
Scheduling heuristics now consider AGPR pressure automatically.
AGPRExcessLimit and AGPRCriticalLimit are added. Some of the VGPR
bias and error limits are reused. The added helpers mostly mirror the
existing VGPR logic. A ConsiderAGPR boolean controls whether AGPRs
should be factored in at all during candidate initialization, e.g.
on targets with allocatable AGPRs.
Verified that updated LIT tests use AGPRs.
Originally Authored-by: Nicholas Baron
(https://github.com/llvm/llvm-project/pull/150288)
Modified-by: Dhruva Chakrabarti
Assisted-by: Cursor
[AsmPrinter][MTE] Support memtag-globals for all AArch64 targets (#187065)
This change ensures that all AArch64 targets can use memtag globals, not
only Android.
[AArch64] Fix register scavenger crash when merging MTE stack tags (#186934)
When `-fsanitize=memtag-stack` is enabled, `TagStoreEdit::emitLoop`
optimizes contiguous ST2Gi instructions into an STGloop. Because this
runs during PEI (post-register allocation), it spawns two new virtual
registers: BaseReg and SizeReg.
Under high register pressure (e.g., Swift async continuation thunks
where almost all registers are kept live), the Register Scavenger must
rely on emergency spill slots to assign physical registers to BaseReg
and SizeReg.
Previously, the compiler assumed at most one emergency spill slot was
needed. If PEI found an unused Callee-Saved Register (`ExtraCSSpill`),
it bypassed allocating an emergency slot entirely. If no CSRs were free,
it allocated exactly one slot. Because STGloop requires TWO scratch
locations, the scavenger would crash trying to fulfill the second
allocation.
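The shortfall in the previous policy can be sketched as simple counting. This is a toy model of the behavior described above, not the PEI code: the function names are hypothetical, and it assumes an unused callee-saved register counts as exactly one scratch location.

```cpp
#include <cassert>

// Emergency spill slots reserved under the previous policy:
// none if an unused CSR (ExtraCSSpill) was found, otherwise exactly one.
unsigned oldEmergencySlots(bool HasExtraCSSpill) {
  return HasExtraCSSpill ? 0u : 1u;
}

// Scratch locations the scavenger can fall back on under that policy
// (assumption: a free CSR provides one location).
unsigned scratchLocations(bool HasExtraCSSpill) {
  return (HasExtraCSSpill ? 1u : 0u) + oldEmergencySlots(HasExtraCSSpill);
}

// STGloop needs scratch locations for both BaseReg and SizeReg.
constexpr unsigned STGLoopScratchNeeded = 2;

bool oldPolicyCoversSTGLoop(bool HasExtraCSSpill) {
  return scratchLocations(HasExtraCSSpill) >= STGLoopScratchNeeded;
}
```

Either way the model yields one scratch location, which is one short of what STGloop needs.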
[11 lines not shown]
[MCP] Never eliminate frame-setup/destroy instructions
Presumably targets only insert frame instructions which are significant,
and there may be effects MCP doesn't model. As with reserved registers,
this is probably overly conservative, but since it causes no codegen
change in any lit test, I think it is benign.
The motivation is just to clean up #183149 for AMDGPU, as we can spill
to physical registers, and currently have to spill the EXEC mask purely
to enable debug-info.
Change-Id: I9ea4a09b34464c43322edd2900361bf635efd9f7
[MCP][NFC] Opinionated refactoring
There are a few minor inconsistencies across the pass which I found mildly
distracting:
* The use of `Def`/`Dest`/`Dst` to refer to the same thing
* Inconsistent declaration order of `Dst`/`Src` vs `Src`/`Dst`
* Lots of `->getReg().asMCReg()`, and uses of `Register` when the pass
is always running after RA anyway.
* Some places explicitly `assert(isCopyInstr)` while others just deref
the `optional`.
Standardize on `Dst`/`Src` to match the metaphor and ordering of
`DestSourcePair`.
Assume `std::optional::operator*` will assert in any reasonable
implementation, even though this may technically be undefined behavior.
With asserts disabled, it would be undefined behavior anyway.
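The two styles being unified look roughly like this. A toy sketch, not the MCP code: `ToyCopy`, `asCopy`, `dstExplicit` and `dstImplicit` are hypothetical stand-ins for `DestSourcePair` and an `isCopyInstr`-style query. Whether `operator*` actually checks for an empty optional depends on the standard library and its build mode.

```cpp
#include <cassert>
#include <optional>

// Toy stand-in for a destination/source pair, with Dst declared first.
struct ToyCopy {
  unsigned Dst;
  unsigned Src;
};

// Toy stand-in for an isCopyInstr-style query.
std::optional<ToyCopy> asCopy(bool IsCopy, unsigned Dst, unsigned Src) {
  if (!IsCopy)
    return std::nullopt;
  return ToyCopy{Dst, Src};
}

// Style A: explicit assert before dereferencing.
unsigned dstExplicit(const std::optional<ToyCopy> &C) {
  assert(C && "expected a copy instruction");
  return C->Dst;
}

// Style B: dereference directly and rely on the library's own check, if any.
unsigned dstImplicit(const std::optional<ToyCopy> &C) {
  return (*C).Dst;
}
```

For an engaged optional both styles return the same value. They differ only in who reports the error when the optional is empty.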
[11 lines not shown]