[CIR] Implement handling of cleanups with active flag (#187389)
This implements handling of cleanup scopes in cases where a flag is
needed to indicate whether or not the cleanup is active. This happens in
cases where a cleanup is no longer required, but it isn't at the top of
the cleanup stack so it can't be popped. A temporary variable is used to
set the cleanup to an inactive state when it is no longer needed.
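A minimal sketch of the idea, assuming a toy cleanup stack: a cleanup at the top can simply be popped, while a buried one is marked inactive through a flag and skipped at unwind time. `CleanupStack`, `Cleanup`, `deactivate`, and `unwind` are hypothetical names for illustration, not the CIR or Clang API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a cleanup entry; `active` stands in for the
// temporary flag variable described in the commit message.
struct Cleanup {
  std::string name;
  bool active = true;
};

class CleanupStack {
  std::vector<Cleanup> stack;

public:
  // Push a cleanup and return its position so it can be deactivated later.
  std::size_t push(std::string name) {
    stack.push_back(Cleanup{std::move(name)});
    return stack.size() - 1;
  }

  // A cleanup at the top can be popped outright. One that is buried
  // cannot be popped, so its flag is cleared instead.
  void deactivate(std::size_t idx) {
    assert(idx < stack.size() && "cleanup index out of range");
    if (idx + 1 == stack.size())
      stack.pop_back();
    else
      stack[idx].active = false;
  }

  // Run cleanups in LIFO order, skipping any whose flag was cleared.
  // Returns the names of the cleanups that actually ran.
  std::vector<std::string> unwind() {
    std::vector<std::string> ran;
    while (!stack.empty()) {
      if (stack.back().active)
        ran.push_back(stack.back().name);
      stack.pop_back();
    }
    return ran;
  }
};
```

In generated code the flag would be checked at run time when the cleanup scope exits; the sketch folds that check into `unwind` for brevity.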
Assisted-by: Cursor / claude-4.6-opus-high (implementation)
Assisted-by: Cursor / gpt-5.3-codex (tests)
[MLIR][Affine] Add vector support to affine.linearize_index and affine.delinearize_index (#188369)
Allow `affine.delinearize_index` and `affine.linearize_index` to operate
on `vector<...x index>` types in addition to scalar indices.
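As a rough illustration of the semantics, the sketch below models linearization and delinearization over a basis in plain C++ and then applies delinearization to each lane of a vector of indices. It assumes non-negative indices, a row-major basis with the last element varying fastest, and lane-by-lane (elementwise) behavior for vectors; the function names are hypothetical and this is not the MLIR implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Index = std::int64_t;

// Fold a multi-index into one linear index (last basis entry varies fastest).
Index linearize(const std::vector<Index> &multi,
                const std::vector<Index> &basis) {
  assert(multi.size() == basis.size() && "rank mismatch");
  Index linear = 0;
  for (std::size_t i = 0; i < multi.size(); ++i)
    linear = linear * basis[i] + multi[i];
  return linear;
}

// Split a linear index back into a multi-index over the same basis.
std::vector<Index> delinearize(Index linear, const std::vector<Index> &basis) {
  std::vector<Index> multi(basis.size());
  for (std::size_t i = basis.size(); i-- > 0;) {
    multi[i] = linear % basis[i];
    linear /= basis[i];
  }
  return multi;
}

// Assumed vector behavior: every lane is delinearized independently.
std::vector<std::vector<Index>>
delinearizeLanes(const std::vector<Index> &lanes,
                 const std::vector<Index> &basis) {
  std::vector<std::vector<Index>> out;
  out.reserve(lanes.size());
  for (Index lane : lanes)
    out.push_back(delinearize(lane, basis));
  return out;
}
```

With basis `(2, 3, 5)`, the multi-index `(1, 2, 3)` linearizes to `1*15 + 2*5 + 3`, and delinearizing that value recovers the original components.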
---------
Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha at gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply at anthropic.com>
[AMDGPU][Scheduler] Use MIR-level rematerializer in rematerialization stage
This makes the scheduler's rematerialization stage use the
target-independent rematerializer. Previously duplicated logic is
deleted, and restrictions are put in place in the stage so that the
same constraints as before apply on rematerializable registers (as the
rematerializer is able to expose many more rematerialization
opportunities than what the stage can track at the moment).
Consequently it is not expected that this change improves performance
overall, but it is a first step toward being able to use the
rematerializer's more advanced capabilities during scheduling.
This is *not* an NFC change, for two reasons.
- Score ties between two rematerialization candidates with otherwise
equivalent scores are broken by their corresponding register's index
handle in the rematerializer (previously by the value of the pointer
to their state object). This is determined by the
rematerializer's register collection order, which is different from
[10 lines not shown]
[mlir][amdgpu] implement amdgpu.global_load_async_to_lds for gfx1250 (#189279)
This patch introduces an amdgpu wrapper for
`rocdl.global.load.async.to.lds.bN` intrinsics, which were introduced in
gfx1250.
Assisted-by: Claude
---------
Signed-off-by: Eric Feng <Eric.Feng at amd.com>
Track and propagate STANDBY ALUA state explicitly
Drive the STANDBY target group through OFFLINE -> TRANSITIONING ->
NONOPTIMIZED across the failover and standby_after_start lifecycle,
updating both nodes at each transition so RTPG responses are accurate
on whichever node an initiator queries. On reload, the current state
is consulted rather than hardcoding nonoptimized.
[MLIR] [XeGPU] Add distribution patterns for vector transpose, bitcast & mask ops in sg to wi pass (#187392)
This PR adds patterns for the following vector ops in the new sg-to-wi pass:
1. Transpose
2. BitCast
3. CreateMask
4. ConstantMask
[AMDGPU][Scheduler] Prepare remat. stage for rematerializer integration (NFC)
This NFC prepares the scheduler's rematerialization stage for
integration with the target-independent rematerializer. It brings
various small design changes and optimizations to the stage's internal
state to make the not-exactly-NFC rematerializer integration as small as
possible.
The main changes are, in no particular order:
- Sort and pick useful rematerialization candidates by their index in
the vector of candidates instead of directly sorting objects within
the candidate vector. This reduces the amount of data movement and
simplifies the candidate selection logic.
- Move some data members from `PreRARematStage::RematReg` to
`PreRARematStage::ScoredRemat`. This makes the former a simplified
version of the rematerializer's own internal register representation
(`Rematerializer::Reg`), which can be cleanly deleted during
integration.
[8 lines not shown]
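The first bullet above (sorting candidates by index rather than moving the objects themselves) can be sketched as follows. `Candidate`, its `score` field, and the descending order are assumptions made for illustration; the actual `PreRARematStage` data structures are not shown in this message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical candidate record; real candidates carry far more state,
// which is what makes moving them around during a sort costly.
struct Candidate {
  int score;
};

// Return candidate indices ordered from best to worst score.
// The candidate vector itself is never reordered.
std::vector<std::size_t>
sortedCandidateOrder(const std::vector<Candidate> &cands) {
  std::vector<std::size_t> order(cands.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  // stable_sort keeps the original relative order of equal scores.
  std::stable_sort(order.begin(), order.end(),
                   [&cands](std::size_t a, std::size_t b) {
                     return cands[a].score > cands[b].score;
                   });
  return order;
}
```

Only small integers are swapped during the sort, and indices into the candidate vector remain valid afterwards, which is one plausible reason this layout simplifies selection.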
[NFC][CodeGen] Prepare for expansion of InlineAsmPrepare (#189469)
Move some functions around so that the CallBrInst processing is
contained. The 'static' functions don't need to be declared at the top;
just place them before the calls. Fix the naming to use lower-case for
the first letter of function names.
[CIR] Allow replacement of a structor declaration with an alias (#188320)
We had an errorNYI diagnostic that triggered when we generated an alias for
a ctor or dtor that had an existing declaration. Because functions are
used via flat symbol references, all that is needed is to erase the old
declaration. This change does that.
[CIR] Handle throwing calls inside EH cleanup (#188341)
This implements handling for throwing calls inside an EH cleanup
handler. When such a call occurs, the CFG flattening pass replaces it
with a cir.try_call op that unwinds to a terminate block.
A new CIR operation, cir.eh.terminate, is added to facilitate this
handling, and the design document is updated to describe the new
behavior.
Assisted-by: Cursor / claude-4.6-opus-high