[CodeGen] Simplify ExpandPostRA::LowerSubregToReg. NFC. (#179634)
SUBREG_TO_REG always has a non-zero subreg index so DstSubReg can never
be the same as DstReg.
[AArch64][llvm] Remove `+xs` gating for `tlbip *nxs` instructions
A recent specification update has removed FEAT_XS gating for `tlbip *nxs`
instructions. The FEAT_XS gating remains in place for `tlbi *nxs` instructions.
[AArch64][llvm] Gate some `tlbip` insns with +tlbid or +d128
Change the gating of `tlbip` instructions containing `*E1IS*`, `*E1OS*`,
`*E2IS*` or `*E2OS*` to be used with `+tlbid` or `+d128`. This is because
the 2025 Armv9.7-A MemSys specification says:
```
All TLBIP *E1IS*, TLBIP*E1OS*, TLBIP*E2IS* and TLBIP*E2OS* instructions
that are currently dependent on FEAT_D128 are updated to be dependent
on FEAT_D128 or FEAT_TLBID
```
[AArch64][llvm] Remove `+d128` gating on `sysp`, `msrr` and `mrrs` instructions
Remove `+d128` gating on `sysp`, `msrr` and `mrrs` instructions.
We removed gating for `sys`, `msr` and `mrs` instructions previously,
on the basis that it doesn't add value, as it doesn't indicate that
any particular system registers or system instructions are available.
Therefore, remove `+d128` gating for these too.
(In an upcoming change, some `tlbip` instructions, which are `sysp` aliases,
will be allowed with either `+d128` or `+tlbid`. If we don't remove
this gating, supporting the relaxation mandated by the 2025 MemSys
specification would require some ugly work-arounds in the code.
This change retains `+d128` gating for all `tlbip` instructions, which
will then be loosened to either `+d128` or `+tlbid` in a subsequent change.)
[X86] Fold EXPAND(X,Y,M) -> SELECT(M,X,Y) when M is a lowest bit mask (#179630)
If an EXPAND node mask is just the lowest bits, then we can replace it
with a more general SELECT node, which can be cheaper and potentially
allow predication.
Fixes #179008
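The equivalence behind this fold can be checked with a small scalar model. The sketch below is not the DAG combine itself: `expandModel`, `selectModel`, `isLowBitMask` and the 8-lane `Vec` type are hypothetical names, and the expand semantics (active lanes take consecutive elements of X starting at X[0], inactive lanes take Y) are an assumption about how the node behaves.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kLanes = 8;            // hypothetical lane count for the model
using Vec = std::array<int, kLanes>;

// Assumed expand semantics: each active lane takes the next unread element
// of x (starting at x[0]); each inactive lane takes the same lane of y.
Vec expandModel(const Vec &x, const Vec &y, uint8_t mask) {
  Vec out{};
  int next = 0;
  for (int i = 0; i < kLanes; ++i) {
    if (mask & (1u << i))
      out[i] = x[next++];
    else
      out[i] = y[i];
  }
  return out;
}

// Lane-wise select: active lanes take x[i], inactive lanes take y[i].
Vec selectModel(uint8_t mask, const Vec &x, const Vec &y) {
  Vec out{};
  for (int i = 0; i < kLanes; ++i)
    out[i] = (mask & (1u << i)) ? x[i] : y[i];
  return out;
}

// True for masks of the form 0b0..01..1 (a run of ones starting at bit 0),
// including the empty mask and the all-ones mask.
bool isLowBitMask(uint8_t mask) {
  return (mask & (mask + 1)) == 0;
}
```

With a lowest-bit mask, the k-th active lane is lane k, so the expand reads x[k] into lane k, which is exactly what the select does. For a mask with gaps, such as 0b0101, the two models disagree.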
[NFC][LLVM] Make `MachineInstrBuilder::constrainAllUses` return `void` (#179632)
This function always returns `true`, so we can transform it to return
`void` and simplify the code.
Follow up of https://github.com/llvm/llvm-project/pull/179501 .
[AMDGPU][SIRegisterInfo] Fix maxoffset calculation in buildSpillLoadStore (#179182)
This PR fixes a bug in the maxoffset calculation in SIRegisterInfo. When
RemSize is non-zero, the maxoffset that needs to be encoded in the offset
field is equal to "Offset + Size".
---------
Co-authored-by: Abhinav Garg <abhigarg at amd.com>
[AMDGPU] Add CmpLG and OrN2 operators to LaneMaskConstants (#179493)
Add CmpLG and OrN2 operators so that LaneMaskConstants can be used in
PhiLoweringHelper from SILowerI1Copies.
[X86] Fold vgf2p8affineqb XOR with splat constant into immediate (#179103)
The vgf2p8affineqb instruction performs an affine transformation on each
byte and then XORs the result with an 8-bit immediate operand. When this
instruction is followed by a standalone XOR with a splatted constant,
LLVM currently generates extra instructions instead of folding the
constant into the instruction's immediate.
This PR adds a DAG combine optimization that detects the pattern
vgf2p8affineqb(x, m, imm8) ^ C where C is a splatted 8-bit constant and
transforms it to vgf2p8affineqb(x, m, imm8 ^ C), eliminating the
unnecessary XOR instruction.
- The optimization runs during the combine phase after type legalization
- Handles XOR with the constant on either side (commutative)
- Only applies when the GFNI instruction has a single use to avoid
de-optimization
- Validates that the XOR operand is a splatted 8-bit constant before
folding
- Includes test coverage for positive cases and negative cases
(multi-use, non-splat constant, variable XOR)
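The fold relies only on XOR being associative, which a scalar model of one byte lane makes visible. The sketch below is illustrative and is not the code from the PR: `parity8`, `affineByte`, `unfolded` and `folded` are hypothetical names, and the matrix row ordering (result bit i uses matrix byte 7 - i) is an assumption based on the instruction's documented pseudocode.

```cpp
#include <cassert>
#include <cstdint>

// XOR of all bits of a byte.
inline uint8_t parity8(uint8_t v) {
  v ^= v >> 4;
  v ^= v >> 2;
  v ^= v >> 1;
  return v & 1u;
}

// Scalar model of one byte lane: multiply x by an 8x8 bit matrix packed
// into a 64-bit value, then XOR the result with the 8-bit immediate.
// Assumption: result bit i is the parity of (matrix byte 7 - i) AND x.
uint8_t affineByte(uint8_t x, uint64_t matrix, uint8_t imm) {
  uint8_t result = 0;
  for (int i = 0; i < 8; ++i) {
    uint8_t row = static_cast<uint8_t>(matrix >> (8 * (7 - i)));
    result |= static_cast<uint8_t>(parity8(row & x) << i);
  }
  return static_cast<uint8_t>(result ^ imm);
}

// Before the fold: affine transform followed by a standalone XOR with c.
uint8_t unfolded(uint8_t x, uint64_t matrix, uint8_t imm, uint8_t c) {
  return static_cast<uint8_t>(affineByte(x, matrix, imm) ^ c);
}

// After the fold: c is merged into the immediate, so no extra XOR is needed.
uint8_t folded(uint8_t x, uint64_t matrix, uint8_t imm, uint8_t c) {
  return affineByte(x, matrix, static_cast<uint8_t>(imm ^ c));
}
```

Because every byte lane sees the same immediate, the rewrite is only valid when C is a splat of a single 8-bit value, which matches the splat check listed above.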
[SelectionDAG] Use promoted types when creating nodes after type legalization (#178617)
When creating new nodes after type legalization, we should try to use
the promoted type to avoid creating nodes with illegal types.
Fixes: https://github.com/llvm/llvm-project/issues/177155
(cherry picked from commit 38e280d8a405bb442d176b8dab18da63d3fc2810)
[libc] Tweak the runtimes cross-build for GPU (#178548)
Summary:
We should likely use `-DLLVM_DEFAULT_TARGET_TRIPLE` as the general
source of truth. Make the handling work with that, since we already use
it for the output directories. Also fix the creation of startup files in
this mode and make sure it can detect the GPU properly.
Fixes: https://github.com/llvm/llvm-project/issues/179375
(cherry picked from commit e07a1182fd58a5b48a2c78bc3ae03872186d4ae0)
[libc++] Simplify the implementation of __{un,re}wrap_range (#178381)
We can use a relatively simple `if constexpr` chain instead of SFINAE
and class template specialization, making the functions much simpler to
understand.
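As a rough illustration of the style this change moves towards, the sketch below dispatches on the iterator type with one `if constexpr` chain in a single function. It is not the libc++ implementation: `WrapIter`, `CountedIter`, the `isWrapIter`/`isCountedIter` traits and `unwrapIter` are hypothetical names, and C++17 is assumed.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical iterator wrappers standing in for real adaptor types.
template <class Iter>
struct WrapIter {
  Iter base;
};

template <class Iter>
struct CountedIter {
  Iter current;
  int remaining;
};

// Small detection traits; the dispatch logic itself lives in one function.
template <class T>
inline constexpr bool isWrapIter = false;
template <class Iter>
inline constexpr bool isWrapIter<WrapIter<Iter>> = true;

template <class T>
inline constexpr bool isCountedIter = false;
template <class Iter>
inline constexpr bool isCountedIter<CountedIter<Iter>> = true;

// One if-constexpr chain replaces a set of enable_if overloads or a
// specialized helper class: each branch peels one wrapper layer and
// recurses, and the final branch returns the iterator unchanged.
template <class Iter>
constexpr auto unwrapIter(Iter it) {
  if constexpr (isWrapIter<Iter>) {
    return unwrapIter(it.base);
  } else if constexpr (isCountedIter<Iter>) {
    return unwrapIter(it.current);
  } else {
    return it;
  }
}
```

Only the branch that matches the type is instantiated, so `it.base` is never looked up on a raw pointer, and all cases can be read top to bottom in one place.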
[AArch64][SME] Limit where SME ABI optimizations apply (#179273)
These were added recently with a fairly complex propagation step.
However, these optimizations can cause regressions in some cases.
This patch limits the cross-block optimizations to the simple case of
picking a state that matches all incoming blocks. If any block doesn't
match, we fall back to using "ACTIVE", the default state.
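The simplified rule can be sketched as below. This is a minimal model under stated assumptions, not the pass's actual code: the `ZAState` enum (apart from the "ACTIVE" name taken from the description above), its other members, and `pickEntryState` are hypothetical, and treating a block with no predecessors as "ACTIVE" is also an assumption.

```cpp
#include <cassert>
#include <vector>

// Hypothetical state set; only ACTIVE is named in the description above.
enum class ZAState { ACTIVE, LOCAL_SAVED, OFF };

// Pick the state a block should start in: if every incoming block ends in
// the same state, reuse it; otherwise fall back to the default, ACTIVE.
ZAState pickEntryState(const std::vector<ZAState> &incoming) {
  if (incoming.empty())
    return ZAState::ACTIVE; // assumption: no predecessors means default
  const ZAState first = incoming.front();
  for (ZAState s : incoming) {
    if (s != first)
      return ZAState::ACTIVE; // any mismatch disables the optimization
  }
  return first;
}
```

The appeal of this form is that it needs no iterative propagation: the answer for a block depends only on its direct predecessors.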