[SandboxVec][VecUtils] Lane Enumerator (#188355)
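This patch introduces an iterator that helps us iterate over (lane, value)
pairs in a range, where a vector value occupies one lane per element. For
example, given a container `(i32 %v0, <2 x i32> %v1, i32 %v2)` we get the
following (`%v1` covers lanes 1 and 2, so `%v2` starts at lane 3):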
```
Lane Value
0 %v0
1 %v1
3 %v2
```
We use this iterator to replace the lane counting logic in
BottomUpVec.cpp.
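A minimal C++ sketch of the idea, assuming a toy `ToyValue` type (a name plus a lane count) in place of real IR values. `ToyValue` and `ToyLaneEnumerator` are hypothetical and are not the SandboxVec API. Each step advances the running lane by the number of lanes of the value just visited:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an IR value: a name plus the number of lanes it
// occupies (1 for a scalar, N for an <N x T> vector).
struct ToyValue {
  std::string Name;
  unsigned NumLanes;
};

// Hypothetical enumerator over a range of values. Dereferencing yields the
// first lane of the current value together with the value itself.
class ToyLaneEnumerator {
  const std::vector<ToyValue> &Vals;

public:
  explicit ToyLaneEnumerator(const std::vector<ToyValue> &Vals) : Vals(Vals) {}

  class iterator {
    const std::vector<ToyValue> *Vals;
    std::size_t Idx;
    unsigned Lane;

  public:
    iterator(const std::vector<ToyValue> *Vals, std::size_t Idx, unsigned Lane)
        : Vals(Vals), Idx(Idx), Lane(Lane) {}
    std::pair<unsigned, const ToyValue &> operator*() const {
      return {Lane, (*Vals)[Idx]};
    }
    iterator &operator++() {
      assert(Idx < Vals->size() && "incrementing past the end");
      Lane += (*Vals)[Idx].NumLanes; // skip every lane of the current value
      ++Idx;
      return *this;
    }
    bool operator!=(const iterator &Other) const { return Idx != Other.Idx; }
  };

  iterator begin() const { return iterator(&Vals, 0, 0); }
  iterator end() const { return iterator(&Vals, Vals.size(), 0); }
};
```

Iterating over `%v0` (1 lane), `%v1` (2 lanes) and `%v2` (1 lane) visits lanes 0, 1 and 3, matching the table above.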
[SPIRV][AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (#134016)
This change adds two builtins for AMDGPU:
- `__builtin_amdgcn_processor_is`, which is similar in observable
behaviour to `__builtin_cpu_is`, except that it is never "evaluated"
at run time;
- `__builtin_amdgcn_is_invocable`, which is behaviourally similar to
`__has_builtin`, except that it is not a macro (i.e. not evaluated at
preprocessing time).
Neither of these is `constexpr`, even though when compiling for
concrete (i.e. `gfxXXX` / `gfxXXX-generic`) targets they get evaluated
in Clang, so they shouldn't tear the AST too badly / at all for
multi-pass compilation cases like HIP. They can only be used in specific
contexts (as args to control structures).
The motivation for adding these is two-fold:
[18 lines not shown]
[lldb][macOS] Recognize new layouts for DeviceSupport directories (#188646)
When debugging a remote Darwin device (iOS, macOS, etc.), lldb needs to
find a local copy of all the system libraries (the system's shared
cache) so we don't need to read them over the gdb-remote serial protocol
at the start of every debug session.
Xcode, etc., normally creates these expanded shared caches in
~/Library/Developer/Xcode/<OS> DeviceSupport/<OS VER> (<OS BUILD>)/Symbols
So when lldb sees a file like /usr/lib/libSystem.B.dylib, it may find a
copy in
~/L/D/Xcode/iOS DeviceSupport/26.2 (23B87)/Symbols/usr/lib/libSystem.B.dylib
There may be multiple expanded shared caches in these DeviceSupport
directories, so we try to parse the "os version" and "os build" out of
the file path, and look in a directory that matches the target
[23 lines not shown]
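As an illustration of parsing the version and build out of a directory name, here is a minimal sketch using `std::regex`, assuming the `<OS VER> (<OS BUILD>)` layout shown above. `parseDeviceSupportDirName` and `OSVersionAndBuild` are hypothetical names; this is not lldb's implementation and does not cover the new layouts this patch recognizes.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Hypothetical result of parsing a DeviceSupport directory name.
struct OSVersionAndBuild {
  std::string Version; // e.g. "26.2"
  std::string Build;   // e.g. "23B87"
};

// Sketch only: parse the last path component of a directory such as
// "~/L/D/Xcode/iOS DeviceSupport/26.2 (23B87)", assuming the classic
// "<OS VER> (<OS BUILD>)" layout.
std::optional<OSVersionAndBuild>
parseDeviceSupportDirName(const std::string &DirName) {
  // Version: dotted digits. Build: alphanumeric token inside parentheses.
  static const std::regex Pattern(R"(^(\d+(?:\.\d+)*) \(([A-Za-z0-9]+)\)$)");
  std::smatch M;
  if (!std::regex_match(DirName, M, Pattern))
    return std::nullopt;
  return OSVersionAndBuild{M[1].str(), M[2].str()};
}
```

With this sketch, `"26.2 (23B87)"` yields version `26.2` and build `23B87`, while a name that does not fit the layout, such as `Symbols`, yields `std::nullopt`.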
[DAG] isKnownNeverNaN - fallback to computeKnownFPClass check (#189476)
Remove ConstantFPSDNode handling from isKnownNeverNaN and fall back to
using computeKnownFPClass if there are no opcode matches in
isKnownNeverNaN.
The test check changes are due to isKnownNeverNaN not handling
UNDEF/POISON, whereas computeKnownFPClass does (POISON in particular now
returns isKnownNeverNaN == true, preventing an ISD::FCANONICALIZE call in
expandFMINNUM_FMAXNUM).
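The shape of the change (opcode-specific cases first, then a general analysis instead of an immediate "unknown") can be sketched with toy types. `ToyOpcode`, `ToyNode`, `toyIsKnownNeverNaN` and `toyComputeKnownNeverNaN` are hypothetical simplifications, not the SelectionDAG API; the poison case mirrors the behaviour described above.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical toy node kinds; not SelectionDAG's real opcodes.
enum class ToyOpcode { ConstantFP, SIToFP, Poison, Other };

struct ToyNode {
  ToyOpcode Op;
  double ConstVal = 0.0; // meaningful only for ConstantFP
};

// Stand-in for the general FP-class analysis (computeKnownFPClass in the
// patch). In this toy it only understands constants and poison.
bool toyComputeKnownNeverNaN(const ToyNode &N) {
  switch (N.Op) {
  case ToyOpcode::ConstantFP:
    return !std::isnan(N.ConstVal);
  case ToyOpcode::Poison:
    return true; // poison may be assumed to be any value, e.g. a non-NaN one
  default:
    return false; // unknown: conservatively assume NaN is possible
  }
}

// Opcode-specific fast path first; anything it does not match falls back to
// the general analysis. Constants are no longer special-cased here.
bool toyIsKnownNeverNaN(const ToyNode &N) {
  switch (N.Op) {
  case ToyOpcode::SIToFP:
    return true; // integer-to-FP conversion never produces NaN
  default:
    return toyComputeKnownNeverNaN(N);
  }
}
```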
[LTO][LLD] Prevent invalid LTO libfunc transforms (#164916)
In LTO, part of LLVM's middle-end runs after linking has finished. LTO's
semantics depend on the complete set of extracted bitcode files being
known at this time. If the middle-end inserts new calls to library
functions (libfuncs) that are implemented in bitcode, this could extract
new bitcode object files into the link. These cannot be compiled at that
point, leading to undefined symbol references.
Additionally, the middle-end in LTO may reason that such library
functions have no references, and it may internalize them, then
manipulate their API or even delete them. Afterwards, it may emit a call
to them, again producing undefined symbol references.
This patch resolves the former issue by ensuring that the middle-end
emits no new references to symbols defined in bitcode, and it resolves
the latter issue by ensuring that extracted bitcode for libfuncs is
considered external, since new calls may be emitted to them at any time.
[8 lines not shown]
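A hypothetical sketch of the first fix: before a transform introduces a new libfunc call, check whether the callee is defined in bitcode that takes part in the LTO link, and skip the transform if so. `mayEmitLibCall`, `chooseCallee` and the symbol set are illustrative assumptions; the patch's actual mechanism is not shown here and may differ.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical guard: a libfunc implemented in bitcode may not have been
// extracted, or may already have been internalized or deleted, so emitting
// a new call to it risks an undefined symbol reference.
bool mayEmitLibCall(const std::string &Callee,
                    const std::set<std::string> &BitcodeDefinedSymbols) {
  return BitcodeDefinedSymbols.count(Callee) == 0;
}

// Toy transform in the spirit of printf("hello\n") -> puts("hello"):
// rewrite only when the replacement callee is safe to reference.
std::string chooseCallee(const std::set<std::string> &BitcodeDefinedSymbols) {
  return mayEmitLibCall("puts", BitcodeDefinedSymbols) ? "puts" : "printf";
}
```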
[AMDGPU][Scheduler] Use MIR-level rematerializer in rematerialization stage
This makes the scheduler's rematerialization stage use the
target-independent rematerializer. Previously duplicated logic is
deleted, and restrictions are put in place in the stage so that the
same constraints as before apply to rematerializable registers (as the
rematerializer is able to expose many more rematerialization
opportunities than the stage can track at the moment).
Consequently, this change is not expected to improve performance
overall, but it is a first step toward being able to use the
rematerializer's more advanced capabilities during scheduling.
This is *not* NFC, for two reasons:
- Ties between two rematerialization candidates with
equal scores are broken by their corresponding
register's index handle in the rematerializer (previously the pointer
to their state object's value). This is determined by the
rematerializer's register collection order, which is different from
[10 lines not shown]
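To illustrate the first bullet, here is a hypothetical comparator that orders candidates best-first and breaks score ties by a register index rather than by an object address. `RematCandidate`, its fields, and the choice that a lower index wins a tie are assumptions for the sketch, not the stage's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical candidate record.
struct RematCandidate {
  unsigned RegIdx; // index handle assigned in register collection order
  int Score;       // higher is better
};

// Sort best-first. Equal scores fall back to the register index, so the
// order depends on collection order rather than on where objects live.
void sortCandidates(std::vector<RematCandidate> &Cands) {
  std::sort(Cands.begin(), Cands.end(),
            [](const RematCandidate &A, const RematCandidate &B) {
              if (A.Score != B.Score)
                return A.Score > B.Score;
              return A.RegIdx < B.RegIdx;
            });
}
```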
[CIR] Implement handling of cleanups with active flag (#187389)
This implements handling of cleanup scopes in cases where a flag is
needed to indicate whether or not the cleanup is active. This happens in
cases where a cleanup is no longer required, but it isn't at the top of
the cleanup stack, so it can't be popped. A temporary variable is used to
set the cleanup to an inactive state when it is no longer needed.
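Conceptually, the flag turns a cleanup that cannot be popped into a guarded one. The hand-written C++ below models only that idea; it is not the CIR that gets emitted, and `Resource`, `destroy`, `CleanupLog` and `scopeWithDeactivatedCleanup` are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records which cleanups actually ran, for illustration.
std::vector<std::string> CleanupLog;

struct Resource {
  std::string Name;
};

void destroy(const Resource &R) { CleanupLog.push_back("destroy " + R.Name); }

// A's cleanup cannot be popped once B's cleanup sits above it on the stack,
// so "deactivating" it just clears a flag that is tested at scope exit.
void scopeWithDeactivatedCleanup(bool ReleaseA) {
  Resource A{"A"};
  bool ACleanupActive = true; // the temporary flag variable
  Resource B{"B"};
  if (ReleaseA)
    ACleanupActive = false; // A's cleanup is no longer required
  // Scope exit: cleanups run in reverse order of being pushed.
  destroy(B);
  if (ACleanupActive)
    destroy(A);
}
```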
Assisted-by: Cursor / claude-4.6-opus-high (implementation)
Assisted-by: Cursor / gpt-5.3-codex (tests)
[MLIR][Affine] Add vector support to affine.linearize_index and affine.delinearize_index (#188369)
Allow `affine.delinearize_index` and `affine.linearize_index` to operate
on `vector<...x index>` types in addition to scalar indices.
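Assuming the vector forms apply the scalar semantics independently to each lane (our reading, not spelled out above), the arithmetic can be sketched as row-major linearization over a basis. The helper names are hypothetical, and the sketch assumes a full basis with positive entries and in-range inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Index = std::int64_t;

// Row-major linearization, e.g. basis (2, 3, 5): i0 * 15 + i1 * 5 + i2.
Index linearize(const std::vector<Index> &Idx, const std::vector<Index> &Basis) {
  assert(Idx.size() == Basis.size() && "one index per basis entry");
  Index Linear = 0;
  for (std::size_t D = 0; D < Basis.size(); ++D)
    Linear = Linear * Basis[D] + Idx[D];
  return Linear;
}

// Inverse: peel off the innermost dimension with repeated div/mod.
std::vector<Index> delinearize(Index Linear, const std::vector<Index> &Basis) {
  std::vector<Index> Idx(Basis.size());
  for (std::size_t D = Basis.size(); D-- > 0;) {
    assert(Basis[D] > 0 && "basis entries must be positive");
    Idx[D] = Linear % Basis[D];
    Linear /= Basis[D];
  }
  return Idx;
}

// Vector form (assumed elementwise): Results[D] holds dimension D for every lane.
std::vector<std::vector<Index>>
delinearizeVector(const std::vector<Index> &Lanes, const std::vector<Index> &Basis) {
  std::vector<std::vector<Index>> Results(Basis.size(),
                                          std::vector<Index>(Lanes.size()));
  for (std::size_t L = 0; L < Lanes.size(); ++L) {
    std::vector<Index> Idx = delinearize(Lanes[L], Basis);
    for (std::size_t D = 0; D < Basis.size(); ++D)
      Results[D][L] = Idx[D];
  }
  return Results;
}
```

For basis `(2, 3, 5)`, `linearize({1, 2, 3}, ...)` gives 28, and delinearizing the lanes `{28, 0, 29}` yields the per-dimension vectors `{1, 0, 1}`, `{2, 0, 2}` and `{3, 0, 4}`.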
---------
Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha at gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply at anthropic.com>
[AMDGPU][Scheduler] Use MIR-level rematerializer in rematerialization stage
This makes the scheduler's rematerialization stage use the
target-independent rematerializer. Previously duplicated logic is
deleted, and restrictions are put in place in the stage so that the
same constraints as before apply to rematerializable registers (as the
rematerializer is able to expose many more rematerialization
opportunities than the stage can track at the moment).
Consequently, this change is not expected to improve performance
overall, but it is a first step toward being able to use the
rematerializer's more advanced capabilities during scheduling.
This is *not* NFC, for two reasons:
- Ties between two rematerialization candidates with
equal scores are broken by their corresponding
register's index handle in the rematerializer (previously the pointer
to their state object's value). This is determined by the
rematerializer's register collection order, which is different from
[10 lines not shown]
[mlir][amdgpu] implement amdgpu.global_load_async_to_lds for gfx1250 (#189279)
This patch introduces an amdgpu wrapper for
`rocdl.global.load.async.to.lds.bN` intrinsics, which were introduced in
gfx1250.
Assisted-by: Claude
---------
Signed-off-by: Eric Feng <Eric.Feng at amd.com>
[MLIR] [XeGPU] Add distribution patterns for vector transpose, bitcast & mask ops in sg to wi pass (#187392)
This PR adds patterns for the following vector ops in the new sg-to-wi pass:
1. Transpose
2. BitCast
3. CreateMask
4. ConstantMask
[AMDGPU][Scheduler] Prepare remat. stage for rematerializer integration (NFC)
This NFC change prepares the scheduler's rematerialization stage for
integration with the target-independent rematerializer. It brings
various small design changes and optimizations to the stage's internal
state to make the not-exactly-NFC rematerializer integration as small as
possible.
The main changes are, in no particular order:
- Sort and pick useful rematerialization candidates by their index in
the vector of candidates instead of directly sorting objects within
the candidate vector. This reduces the amount of data movement and
simplifies the candidate selection logic.
- Move some data members from `PreRARematStage::RematReg` to
`PreRARematStage::ScoredRemat`. This makes the former a simplified
version of the rematerializer's own internal register representation
(`Rematerializer::Reg`), which can be cleanly deleted during
integration.
[8 lines not shown]
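The first bullet (ranking candidates by their index rather than moving candidate objects around) can be illustrated with a generic argsort. This is a hypothetical sketch: `Candidate` and `rankByIndex` are made up, and using `std::stable_sort` so equal scores keep index order is our choice, not necessarily the stage's.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical candidate; a real one would carry much more state, which is
// what makes sorting the objects themselves costly.
struct Candidate {
  int Score;
};

// Return candidate indices ordered best-first, leaving Cands untouched.
std::vector<unsigned> rankByIndex(const std::vector<Candidate> &Cands) {
  std::vector<unsigned> Order(Cands.size());
  std::iota(Order.begin(), Order.end(), 0u);
  // Only the small index vector is permuted; equal scores keep index order.
  std::stable_sort(Order.begin(), Order.end(), [&](unsigned A, unsigned B) {
    return Cands[A].Score > Cands[B].Score;
  });
  return Order;
}
```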
[NFC][CodeGen] Prepare for expansion of InlineAsmPrepare (#189469)
Move some functions around so that the CallBrInst processing is
contained. The 'static' functions don't need to be declared at the top;
just place them before the calls. Fix the naming to use lower-case for
the first letter of function names.
[CIR] Allow replacement of a structor declaration with an alias (#188320)
We had an errorNYI diagnostic that triggered when we generated an alias for
a ctor or dtor that had an existing declaration. Because functions are
used via flat symbol references, all that is needed is to erase the old
declaration. This change does that.
[CIR] Handle throwing calls inside EH cleanup (#188341)
This implements handling for throwing calls inside an EH cleanup
handler. When such a call occurs, the CFG flattening pass replaces it
with a cir.try_call op that unwinds to a terminate block.
A new CIR operation, cir.eh.terminate, is added to facilitate this
handling, and the design document is updated to describe the new
behavior.
Assisted-by: Cursor / claude-4.6-opus-high
[MLIR][Mem2Reg] Extract shared utilities for PromotableRegionOpInterface (#188514)
The `PromotableRegionOpInterface` implementations use two helpers that
are likely useful for other dialects implementing this interface as
well:
- `updateTerminator`: Appends the reaching definition as an operand to a
block's terminator, falling back to a default when the block has no
entry (e.g. dead code).
- `replaceWithNewResults`: Clones an operation with additional result
types while preserving its regions, then replaces the original.
This PR extracts them into a common utility header so that downstream
dialects can reuse them directly.
I'm open to discussion about the location of these utilities.
[SLP] Prefer to trim equal-cost alternate-shuffle subtrees
If the trimming candidate subtree is rooted at an alternate-shuffle node
with binary ops, and this subtree has the same cost as the buildvector
node, it is better to stick with the buildvector node to avoid runtime
perf regressions from shuffle/extra-operation overhead that the cost model
may underestimate. Skip trimming if the subtree contains ExtractElement
nodes, since those operate on already-materialized vectors, which may
reduce vector-to-scalar code movement and give better perf.
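The equal-cost rule can be summarized as a small decision function. Everything here is hypothetical (`NodeKind`, `Subtree`, `shouldTrim`), and the strict-inequality branches are an assumed baseline for the sketch rather than behaviour stated in this message.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified view of an SLP subtree.
enum class NodeKind { Vectorize, AltShuffleBinOp, ExtractElement };

struct Subtree {
  NodeKind Root;
  std::vector<NodeKind> Nodes; // all nodes in the subtree, root included
  int Cost;                    // estimated cost of keeping it vectorized
};

bool containsExtractElement(const Subtree &S) {
  for (NodeKind K : S.Nodes)
    if (K == NodeKind::ExtractElement)
      return true;
  return false;
}

// Trimming means replacing the subtree with a buildvector node.
bool shouldTrim(const Subtree &S, int BuildVectorCost) {
  if (S.Cost < BuildVectorCost)
    return false; // assumed baseline: vectorized subtree is strictly cheaper
  if (S.Cost > BuildVectorCost)
    return true; // assumed baseline: buildvector is strictly cheaper
  // Equal cost: prefer the buildvector for alternate-shuffle roots, unless
  // the subtree works on already-materialized vectors via ExtractElement.
  return S.Root == NodeKind::AltShuffleBinOp && !containsExtractElement(S);
}
```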
Reviewers: hiraditya, bababuck, fhahn, RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/188272
[DA] Require `nsw` for AddRecs in the WeakCrossing SIV test (#185041)
Before running the weak-crossing SIV test algorithm, we need to
ensure that both AddRecs are `nsw`.
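The message does not say why `nsw` is needed. A plausible reading (ours, not the patch's) is that the weak-crossing test reasons about each subscript as an exact integer linear function of the iteration, which only holds if the recurrence cannot wrap. The sketch below models an AddRec `{Start,+,Step}` in 8-bit arithmetic; `wrappingAddRec` and `mathematicalAddRec` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Value of {Start,+,Step} at iteration I when the recurrence may wrap:
// compute in int, then truncate to 8 bits (two's-complement wrap).
std::int8_t wrappingAddRec(std::int8_t Start, std::int8_t Step, int I) {
  return static_cast<std::int8_t>(static_cast<std::uint8_t>(Start + Step * I));
}

// The same recurrence over mathematical integers, i.e. the linear model the
// dependence test's algebra assumes.
int mathematicalAddRec(int Start, int Step, int I) { return Start + Step * I; }
```

The two agree while no overflow occurs (iteration 2 gives 120 in both) and diverge once the 8-bit value wraps (iteration 3 gives -126 versus 130), so conclusions drawn from the linear model would not apply to the wrapping subscript.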