[SelectionDAG] Expand CTTZ_ELTS[_ZERO_POISON] and handle legalization (#188691)
This is a second attempt at "[SelectionDAG] Expand
CTTZ_ELTS[_ZERO_POISON] and handle splitting" (#188220)
That PR had to be reverted in 7d39664a6ae8daaf186b65578492244d96a50bf2
because we had crashes on AMDGPU, since we didn't have scalarization
support, and other crashes on PowerPC because we didn't handle the case
where a vector needed to be widened. Tests for these are added in
AMDGPU/cttz-elts.ll, RISCV/rvv/cttz-elts-scalarize.ll and
PowerPC/cttz-elts.ll.
The first crash has been fixed by adding
DAGTypeLegalizer::ScalarizeVecOp_CTTZ_ELTS.
The second crash has been fixed by reworking
TargetLowering::expandCttzElts. The expansion for CTTZ_ELTS is nearly
identical to VECTOR_FIND_LAST_ACTIVE, except it uses a reverse step
vector and subtracts the result from VF. The easiest way to fix these
[6 lines not shown]
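As a rough illustration of the expansion described above, here is a scalar C++ model. It is not the actual `TargetLowering::expandCttzElts` code, and all names are hypothetical. It assumes the reverse step vector is VF, VF-1, ..., 1. Active lanes keep their reverse-step value and inactive lanes become 0. An unsigned max reduction then picks the value of the lowest active lane. Subtracting that maximum from VF yields the index of the first active lane, or VF when no lane is active.

```cpp
#include <vector>

// Reference semantics: number of trailing (lowest-index) false lanes,
// i.e. the index of the first true lane, or VF if no lane is set.
unsigned cttzEltsReference(const std::vector<bool> &mask) {
  unsigned vf = static_cast<unsigned>(mask.size());
  for (unsigned i = 0; i < vf; ++i)
    if (mask[i])
      return i;
  return vf;
}

// Model of the expansion: select(mask, reverse step vector, 0),
// reduce with unsigned max, then subtract the result from VF.
unsigned cttzEltsExpanded(const std::vector<bool> &mask) {
  unsigned vf = static_cast<unsigned>(mask.size());
  unsigned maxVal = 0;
  for (unsigned i = 0; i < vf; ++i) {
    unsigned reverseStep = vf - i;             // VF, VF-1, ..., 1
    unsigned lane = mask[i] ? reverseStep : 0; // inactive lanes contribute 0
    if (lane > maxVal)
      maxVal = lane;                           // VECREDUCE_UMAX equivalent
  }
  return vf - maxVal; // all-false mask: maxVal == 0, result == VF
}
```

The lowest active lane has the largest reverse-step value, which is why a max reduction suffices. For the `_ZERO_POISON` variant, the name suggests the all-false result is poison; this model simply returns VF.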
AMDGPU: Match fract from compare and select and minimum
Implementing this with any of the minnum variants is overconstraining
for the actual use. Existing patterns use fmin, then have to manually
clamp NaN inputs to get NaN-propagating behavior. It's cleaner to express
this with a NaN-propagating operation to start with.
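To illustrate why a NaN-propagating minimum fits this pattern, here is a hedged C++ sketch. `minimumPropagateNaN` is a hypothetical helper that approximates a NaN-propagating minimum and ignores the -0.0 < +0.0 ordering. The clamp constant, the largest float below 1.0, is an assumption of this sketch. For a NaN input, x - floor(x) is already NaN, so the minimum propagates it without a separate isnan select. Only the infinity case, where inf - inf is NaN but the desired result is 0.0, still needs a select.

```cpp
#include <cmath>

// Hypothetical NaN-propagating minimum: returns NaN if either input is NaN.
// Unlike a full IEEE 754-2019 minimum, this ignores signed-zero ordering.
float minimumPropagateNaN(float a, float b) {
  if (std::isnan(a) || std::isnan(b))
    return std::nanf("");
  return std::fmin(a, b);
}

// Assumed clamp constant: the largest float below 1.0 (0x1.fffffep-1).
const float kFractClampMin = std::nextafter(1.0f, 0.0f);

// fract via a NaN-propagating minimum: NaN inputs fall out of the
// subtraction, so only infinities need an explicit select.
float fractWithMinimum(float x) {
  float r = minimumPropagateNaN(x - std::floor(x), kFractClampMin);
  return std::isinf(x) ? 0.0f : r;
}
```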
AMDGPU: Match fract pattern with swapped edge case check (#189081)
A fract implementation can equivalently be written as
r = fmin(x - floor(x), 0x1.fffffep-1f);
r = isnan(x) ? x : r;
r = isinf(x) ? 0.0 : r;
or:
r = fmin(x - floor(x), 0x1.fffffep-1f);
r = isinf(x) ? 0.0 : r;
r = isnan(x) ? x : r;
Previously, only the first form was matched. Also match
the case where the isinf check is the inner clamp. There are
a few more ways to write this pattern (e.g., move the clamp of
infinity to the input) but I haven't encountered that in the wild.
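Here is a small C++ sketch of the two forms. It assumes the clamp operand of fmin is the largest float below 1.0, which is an assumption of this sketch, and the function names are hypothetical. Because isnan(x) and isinf(x) are never both true, the order of the two selects cannot change the result, so both orderings describe the same fract.

```cpp
#include <cmath>

// Assumed clamp constant: the largest float below 1.0 (0x1.fffffep-1).
const float kFractClamp = std::nextafter(1.0f, 0.0f);

// First form: isnan select first, isinf select outermost.
float fractNanThenInf(float x) {
  float r = std::fmin(x - std::floor(x), kFractClamp);
  r = std::isnan(x) ? x : r;
  r = std::isinf(x) ? 0.0f : r;
  return r;
}

// Second form: isinf select is the inner clamp, isnan select outermost.
float fractInfThenNan(float x) {
  float r = std::fmin(x - std::floor(x), kFractClamp);
  r = std::isinf(x) ? 0.0f : r;
  r = std::isnan(x) ? x : r;
  return r;
}
```

The clamp matters for tiny negative inputs. For example, -1e-9f - floor(-1e-9f) rounds to exactly 1.0f, and fmin brings it back below 1.0.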
The existing code seems to be trying too hard to match noncanonical
variants of the pattern. Only handle the canonical form that InstCombine
produces for all 4 permutations of compare and select.
[libc++][AIX] Fix force_thread_creation_failure by using RLIMIT_THREADS (#188787)
This patch fixes the test `force_thread_creation_failure.cpp` on AIX by
using the platform-specific `RLIMIT_THREADS` to restrict thread
creation, because on AIX `RLIMIT_NPROC` limits the number of processes,
not threads.
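Here is a minimal POSIX-style sketch of the platform selection. It assumes the test lowers a soft limit through `getrlimit`/`setrlimit`. The helper names are hypothetical, and this is not the code in `force_thread_creation_failure.cpp`.

```cpp
#include <sys/resource.h>

// Resource that bounds thread creation on this platform (hypothetical helper).
int threadLimitResource() {
#if defined(_AIX)
  return RLIMIT_THREADS; // on AIX, RLIMIT_NPROC limits processes, not threads
#else
  return RLIMIT_NPROC;   // e.g. on Linux, threads count against RLIMIT_NPROC
#endif
}

// Set the soft limit for that resource, keeping the hard limit unchanged.
// Returns true on success.
bool setThreadSoftLimit(rlim_t limit) {
  rlimit rl;
  if (getrlimit(threadLimitResource(), &rl) != 0)
    return false;
  rl.rlim_cur = limit;
  return setrlimit(threadLimitResource(), &rl) == 0;
}
```

A test along these lines might lower the limit before constructing a `std::thread` and expect the constructor to throw `std::system_error`.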
---------
Co-authored-by: himadhith <himadhith.v at ibm.com>
[RISCV] Fix discarded return value in RISCVOperand::print for FRM (#189530)
The roundingModeToString() return value was not being written to the
output stream, causing FRM operands to print as "<frm: >" with no
rounding mode name in debug output.
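The bug pattern is easy to reproduce in isolation. In this hypothetical C++ sketch, the enum and helpers are stand-ins and not the actual `RISCVOperand` code. The buggy printer calls `roundingModeToString()` as a standalone statement. Its result is therefore discarded, and only the surrounding delimiters reach the stream.

```cpp
#include <ostream>
#include <sstream>

// Hypothetical stand-ins for the RISC-V rounding modes and their names.
enum class RoundingMode { RNE, RTZ, RDN, RUP, RMM, DYN };

const char *roundingModeToString(RoundingMode RM) {
  switch (RM) {
  case RoundingMode::RNE: return "rne";
  case RoundingMode::RTZ: return "rtz";
  case RoundingMode::RDN: return "rdn";
  case RoundingMode::RUP: return "rup";
  case RoundingMode::RMM: return "rmm";
  case RoundingMode::DYN: return "dyn";
  }
  return "unknown";
}

// Buggy: the returned name is computed and then thrown away.
void printFRMBuggy(std::ostream &OS, RoundingMode RM) {
  OS << "<frm: ";
  roundingModeToString(RM); // result discarded, nothing written
  OS << '>';
}

// Fixed: the returned name is written to the stream.
void printFRMFixed(std::ostream &OS, RoundingMode RM) {
  OS << "<frm: " << roundingModeToString(RM) << '>';
}
```

Marking such a helper `[[nodiscard]]` (C++17) is one way to make the compiler warn about this kind of discarded return value.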
Co-authored-by: Claude Opus 4.6 (1M context) <noreply at anthropic.com>
[Hexagon] Fix infinite loop in scheduler for RELOC_NONE instruction (#188690)
The llvm.reloc.none intrinsic (introduced in 5f08fb4d72f6) causes an
infinite loop when compiling for the Hexagon target. RELOC_NONE is a
pseudo-instruction that doesn't correspond to real hardware, but the
Hexagon scheduler's hazard recognizer was treating it as a regular
instruction requiring hardware resource allocation, which sent the
hazard recognizer into an infinite loop.
Mark RELOC_NONE as a meta-instruction and update Hexagon's hazard
recognizer to skip resource checks for meta-instructions, similar to how
it handles zero-cost instructions.
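The failure mode can be sketched with a toy model that uses hypothetical types and is not Hexagon's hazard recognizer. It assumes a pseudo-instruction with no functional unit that could ever be reserved. Without a meta-instruction check, the hazard query never succeeds, so a loop that advances cycles until the instruction can issue never terminates. Here `maxCycles` stands in for that unbounded loop.

```cpp
// Toy instruction: not LLVM's MachineInstr.
struct ToyInstr {
  bool isMeta;         // e.g. RELOC_NONE once marked as a meta-instruction
  bool canReserveUnit; // false for a pseudo with no hardware resource to take
};

// True if the instruction can issue in the current cycle.
bool noHazard(const ToyInstr &MI, bool skipMeta) {
  if (skipMeta && MI.isMeta)
    return true; // meta-instructions need no resource checks
  return MI.canReserveUnit;
}

// Cycles stalled before MI can issue, or -1 if it never can within
// maxCycles (a real scheduler loop would spin forever instead).
int stallCycles(const ToyInstr &MI, bool skipMeta, int maxCycles) {
  for (int cycle = 0; cycle < maxCycles; ++cycle)
    if (noHazard(MI, skipMeta))
      return cycle;
  return -1;
}
```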
[libc++] Remove non-conforming `__bit_reference::operator&` (#188714)
The overloaded `operator&` caused non-conforming behavior when
- using `operator==` to compare "addresses" of proxy reference objects,
and
- relying on the exact type of `&ref`.
No deprecation warning is added, because it should be portable to write
`&ref` where `ref` is a proxy reference variable, and this patch just
corrects the behavior.
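The conforming behavior can be checked with a short C++ snippet. It assumes a standard library whose `std::vector<bool>::reference` does not overload `operator&`, which for libc++ means a version that includes this patch. Under that assumption, `&ref` is an ordinary pointer to the proxy object, and two distinct proxies for the same bit have distinct addresses. The function names are hypothetical.

```cpp
#include <type_traits>
#include <vector>

// &ref on a proxy reference variable yields a pointer to the proxy itself.
bool addressIsPlainPointer() {
  std::vector<bool> v(4);
  auto ref = v[0]; // std::vector<bool>::reference, a proxy object
  return std::is_same<decltype(&ref), std::vector<bool>::reference *>::value;
}

// Two proxy objects referring to the same bit are still distinct objects,
// so comparing their addresses with == must report them as different.
bool distinctProxiesHaveDistinctAddresses() {
  std::vector<bool> v(4);
  auto r1 = v[1];
  auto r2 = v[1];
  return &r1 != &r2;
}
```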
`__bit_const_reference::operator&` is kept, because when one defines
`_LIBCPP_ABI_BITSET_VECTOR_BOOL_CONST_SUBSCRIPT_RETURN_BOOL` to make the
libc++ implementation strategy conforming, the `operator&` will never be
exposed to users.
[DA] Check nsw flags for addrecs in the Exact RDIV test
This patch adds a check at the beginning of the Exact RDIV test to
ensure that both addrecs have nsw flags. If either of them lacks the
flag, the analysis bails out. This check is necessary because the rest
of the Exact RDIV test assumes that the addrecs don't wrap.
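The following brute-force C++ illustration shows why the no-wrap assumption matters. It is not the Exact RDIV test itself, which reasons algebraically about the subscript equation rather than enumerating iterations, and the names are hypothetical. Take an i8 source subscript {0,+,1} over 256 iterations and a constant destination subscript of -56. In exact integer arithmetic the two never coincide. With two's-complement wraparound, however, iteration 200 evaluates to -56. An analysis that ignored wrapping would therefore wrongly report independence.

```cpp
// Value an i8 would hold after two's-complement wraparound, computed with
// well-defined integer arithmetic (no implementation-defined narrowing).
int wrapToI8(long long v) {
  long long m = ((v % 256) + 256) % 256; // 0..255
  return static_cast<int>(m >= 128 ? m - 256 : m);
}

// Does c1 + a1*i == c2 + a2*j for some 0 <= i < n1, 0 <= j < n2?
// With wrap == true, both sides are compared as wrapped i8 values.
bool hasSolution(long long c1, long long a1, long long n1,
                 long long c2, long long a2, long long n2, bool wrap) {
  for (long long i = 0; i < n1; ++i)
    for (long long j = 0; j < n2; ++j) {
      long long src = c1 + a1 * i;
      long long dst = c2 + a2 * j;
      if (wrap ? wrapToI8(src) == wrapToI8(dst) : src == dst)
        return true;
    }
  return false;
}
```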