[MLIR][XeGPU][VectorToXeGPU] Fixed lowering of transfer_read/write for rank > 2 (#193308)
For ranks > 2, gather loads and scatter stores are used.
Increased the maximum value type rank to 8.
[AMDGPU][NFC] Refactor TryGetMCExprValue into evaluateMCExprs helper (#193859)
Replace the duplicated `TryGetMCExprValue` lambda in
`evaluateExtraSGPRs`, `evaluateTotalNumVGPR`, `evaluateAlignTo`, and
`evaluateOccupancy` with a shared static helper `evaluateMCExprs` that
takes an `initializer_list` of `uint64_t` references, enabling callers
to write:
```cpp
uint64_t VCCUsed, FlatScrUsed, XNACKUsed;
if (!evaluateMCExprs(Args, Asm, {VCCUsed, FlatScrUsed, XNACKUsed}))
return false;
```
Split out from #192306 per reviewer feedback.
This PR was created with the help of GitHub Copilot (Claude Opus).
---------
Co-authored-by: Copilot <copilot@github.com>
[AMDGPU] Enabled GCN trackers (amdgpu-use-amdgpu-trackers) by default.
The LIT tests have generally been updated in one of the following ways:
(1) If the option was not present and the test was auto-generated, the
test has been regenerated.
(2) If the option was not present and the test was not auto-generated,
the option -amdgpu-use-amdgpu-trackers=0 was added to preserve the
specific behavior the test was already checking.
(3) If the option was present in a test, its value has been updated to
reflect the change in the default.
Currently, there are 4 tests in category (2). They are:
CodeGen/AMDGPU/
addrspacecast.ll
schedule-regpressure-limit.ll
schedule-regpressure-limit2.ll
sema-v-unsched-bundle.ll
There are 8 tests in category (3). They are:
[15 lines not shown]
[DirectX] Denote `dx.resource.getpointer` with `IntrInaccessibleMemOnly` and `IntrReadMem` (#193593)
`IntrConvergent` was originally added to `dx.resource.getpointer` to
prevent optimization passes (`SimplifyCFG`, `GVN`) from sinking the
intrinsic out of control flow branches, which would create phi nodes on
the returned pointer.
Using `IntrInaccessibleMemOnly` and `IntrReadMem` semantics still
prevents passes from merging or sinking identical calls across branches,
while allowing the call to be moved within a single control flow path.
Updates relevant tests and adds a new test demonstrating an optimization
that is now legal.
This was discovered when
https://github.com/llvm/llvm-project/pull/188792 caused the following
failure:
https://github.com/llvm/llvm-project/actions/runs/24577221310/job/71865579618.
[5 lines not shown]
[SLP] Skip FMulAdd conversion for alt-shuffle FAdd/FSub nodes (#193960)
isAddSubLikeOp() admits alt-shuffle nodes that mix FAdd and FSub, so
transformNodes() was marking them with CombinedOp = FMulAdd. The cost
model then priced the node as a single llvm.fmuladd vector intrinsic,
but emission for an alt shuffle still goes through the ShuffleVector
path and produces fmul + fadd + fsub + shufflevector, which the backend
cannot fuse into a single fmuladd. The resulting under-count made SLP
choose the vector form over the scalar form even when the scalar form
lowers to real FMAs (e.g. fmadd + fmsub on AArch64).