AMDGPU/GlobalISel: Regbanklegalize rules for INTRIN_IMAGE
Regbanklegalize rules for INTRIN_IMAGE loads and stores.
Because of the very large number of different type signatures, the rule
specifies only the lowering function (waterfall lowering of the RsrcIdx
operand if needed), and this function also applies register banks.
[llvm-reduce] Add a pass to replace unconditional branches with returns (#180993)
Unconditional branches could end up forming infinite loops in the reduced
code, even though the code could have been reduced further.
This patch implements a simple pass that replaces unconditional branches
with returns.
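As a rough illustration of the transformation, here is a minimal sketch on a toy CFG model. It does not use the LLVM API: `Block`, `TermKind`, and `replaceUncondBranchesWithReturns` are hypothetical names invented for this example, and the sketch rewrites every candidate at once, whereas a reduction pass would normally try candidates under the control of the interestingness test.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy model of a function's CFG (not the LLVM IR classes).
enum class TermKind { Ret, Br, CondBr };

struct Block {
  TermKind Term;                  // kind of terminator ending the block
  std::vector<std::size_t> Succs; // indices of successor blocks
};

// Replace every unconditional branch with a return and drop its CFG edge.
// Returns the number of blocks that were rewritten.
std::size_t replaceUncondBranchesWithReturns(std::vector<Block> &Fn) {
  std::size_t Changed = 0;
  for (Block &B : Fn) {
    if (B.Term != TermKind::Br)
      continue; // conditional branches and returns are left alone
    B.Term = TermKind::Ret;
    B.Succs.clear(); // a return has no successors, so a self-loop disappears
    ++Changed;
  }
  return Changed;
}
```

In this model a block that branches unconditionally to itself (an infinite loop) becomes a plain return, which later reductions can simplify further.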
[AArch64] Improve post-inc stores of SIMD/FP values
Add patterns to match post-increment truncating stores from lane 0 of
wide integer vectors (v4i32/v2i64) to narrower types (i8/i16/i32).
This avoids transferring the value through a GPR when storing.
Also remove the pre-legalization early-exit in combineStoreValueFPToInt
as it prevented the optimization from applying in some cases.
[Tablegen] Patch RegUnitIntervals Initialization (#181173)
A few places were missing the code generation needed to properly initialize
RegUnitIntervals when it is enabled, and the sentinel value was also missing.
[LoopInterchange] Update UTC version (NFC) (#181988)
This is a follow-up PR to #181804. While working on the stacked PRs, I
encountered some noisy diffs in the CHECK lines that don't change the
meaning of the tests. To avoid such changes and make the review easier,
this patch updates the UTC version. It also renames some BBs to suppress
warnings emitted by UTC.
[AMDGPU] Fix opcode comparison logic for G_INTRINSIC (#156008)
The check `(Opc < TargetOpcode::GENERIC_OP_END)` incorrectly
includes `G_INTRINSIC` (129), which is less than
`GENERIC_OP_END` (313), leading to logically dead code.
This patch reorders the conditionals to first check for `G_INTRINSIC`,
ensuring correct handling of the `amdgcn_fdot2` intrinsic.
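The ordering problem can be reproduced in isolation. The sketch below is not the AMDGPU code: it only reuses the two opcode values quoted above, and `Handler`, `classifyBuggy`, `classifyFixed`, and `checkOrdering` are hypothetical names made up for illustration.

```cpp
#include <cassert>

// Stand-ins for the two values quoted in the commit message; the real
// definitions live in LLVM's TargetOpcode enum.
enum Opcode : unsigned {
  G_INTRINSIC = 129,
  GENERIC_OP_END = 313,
};

// Hypothetical labels for which code path handles an opcode.
enum class Handler { Generic, Intrinsic, Target };

// Original ordering: the range check runs first and also matches
// G_INTRINSIC, so the intrinsic branch below can never be taken.
Handler classifyBuggy(unsigned Opc) {
  if (Opc < GENERIC_OP_END)
    return Handler::Generic;
  if (Opc == G_INTRINSIC) // logically dead: 129 < 313 was handled above
    return Handler::Intrinsic;
  return Handler::Target;
}

// Reordered: test for the specific opcode before the range check.
Handler classifyFixed(unsigned Opc) {
  if (Opc == G_INTRINSIC)
    return Handler::Intrinsic;
  if (Opc < GENERIC_OP_END)
    return Handler::Generic;
  return Handler::Target;
}

// Small self-check showing the difference between the two orderings.
void checkOrdering() {
  assert(classifyBuggy(G_INTRINSIC) == Handler::Generic);
  assert(classifyFixed(G_INTRINSIC) == Handler::Intrinsic);
}
```

Only the result for `G_INTRINSIC` differs between the two functions; every other opcode is classified the same way in both.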