[SelectionDAG] Fix CLMULR/CLMULH expansion (#183537)
For v8i8 on AArch64, `expandCLMUL` picked the zext path (ExtVT=v8i16) since ZERO_EXTEND/SRL were legal, but CLMUL on v8i16 is not, resulting in a bit-by-bit expansion (~42 insns). Prefer the bitreverse path when CLMUL is legal on VT but not ExtVT.
v8i8 CLMULR: 42 → 4 instructions.
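The bitreverse path relies on a bit-reversal identity for carry-less products. Below is a minimal scalar sketch for one i8 lane, assuming CLMULR yields bits [n-1, 2n-2] of the full carry-less product. The function names are hypothetical and this is not the `expandCLMUL` code itself.

```cpp
#include <cstdint>

// Full 15-bit carry-less product of two 8-bit values (XOR replaces addition).
uint16_t clmulFull(uint8_t a, uint8_t b) {
  uint16_t p = 0;
  for (int i = 0; i < 8; ++i)
    if (b & (1u << i))
      p ^= static_cast<uint16_t>(static_cast<uint16_t>(a) << i);
  return p;
}

// CLMUL on i8: the low 8 bits of the full product.
uint8_t clmul8(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>(clmulFull(a, b));
}

// Reference CLMULR on i8 (assumed definition): bits [7, 14] of the product.
uint8_t clmulr8Ref(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>(clmulFull(a, b) >> 7);
}

// Reverse the bit order of a byte.
uint8_t bitreverse8(uint8_t v) {
  uint8_t r = 0;
  for (int i = 0; i < 8; ++i)
    if (v & (1u << i))
      r |= static_cast<uint8_t>(0x80u >> i);
  return r;
}

// Bitreverse path: only needs CLMUL on the original 8-bit type.
uint8_t clmulr8ViaBitreverse(uint8_t a, uint8_t b) {
  return bitreverse8(clmul8(bitreverse8(a), bitreverse8(b)));
}

// Exhaustively compare both formulations over all pairs of i8 inputs.
bool bitreversePathMatchesReference() {
  for (int a = 0; a < 256; ++a)
    for (int b = 0; b < 256; ++b)
      if (clmulr8ViaBitreverse(static_cast<uint8_t>(a), static_cast<uint8_t>(b)) !=
          clmulr8Ref(static_cast<uint8_t>(a), static_cast<uint8_t>(b)))
        return false;
  return true;
}
```

In this model the bitreverse path costs three bit reversals and one CLMUL, and never needs CLMUL on the wider type.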
Fixes #182780
[MLIR][Vector] Enhance shape_cast unrolling support in case the target shape is [1, 1, ..1] (#183436)
This PR fixes a minor issue in shape_cast unrolling: when all target
dimensions are unit-sized, it no longer removes all leading unit
dimensions.
[MIR] Error on signed integer in getUnsigned (#183171)
Previously we effectively took the absolute value of the APSInt. Instead,
diagnose the unexpected negative value.
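A rough standalone sketch of the intended behavior, assuming a plain `int64_t` in place of the APSInt. The helper name, signature, and message strings are hypothetical and do not mirror the MIR parser API.

```cpp
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical stand-in for the parser helper; name, signature, and message
// text are illustrative only. Returns true on error.
bool getUnsignedSketch(int64_t value, unsigned &result, std::string &error) {
  // Reject negative input instead of silently using its magnitude.
  if (value < 0) {
    error = "expected a non-negative integer";
    return true;
  }
  // Reject values that do not fit in the destination type.
  if (value > static_cast<int64_t>(std::numeric_limits<unsigned>::max())) {
    error = "integer does not fit in unsigned";
    return true;
  }
  result = static_cast<unsigned>(value);
  return false;
}
```

With this shape, an input of `-42` produces a diagnostic and leaves `result` untouched rather than yielding `42`.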
Change-Id: I4efe961e7b29fdf1d5f97df12f8139aac12c9219
[AMDGPU][Scheduler] Add `GCNRegPressure`-based methods to `GCNRPTarget` (#182853)
This adds a few methods to `GCNRPTarget` that can estimate/perform RP
savings based on `GCNRegPressure` instead of a single `Register`,
opening the door to model/incorporate more complex savings made up of
multiple registers of potentially different classes. The scheduler's
rematerialization stage now uses this new API.
Although there are no test changes, this is not really NFC since register
pressure savings in the rematerialization stage are now computed through
`GCNRegPressure` instead of the stage itself. If anything this makes
them more consistent with the rest of the RP-tracking infrastructure.
[LLVM][Runtimes] Add 'llvm-gpu-loader' to dependency list (#183601)
Summary:
This is used to run the unit tests in libc / libc++. It must exist in
the build directory's binary path, but without this dependency we may
not build it before running the runtimes build. This should ensure that
it's present, and only if we have tests enabled.
[X86] Fold XOR of two vgf2p8affineqb instructions with same input (#179900)
This patch implements an optimization to fold XOR of two
`vgf2p8affineqb` instructions operating on the same input.
This optimization:
- Reduces instruction count from 3 to 2
- Eliminates one `vgf2p8affineqb` instruction
Implementation:
- Added `combineXorWithTwoGF2P8AFFINEQB` function in X86ISelLowering.cpp
- Uses `sd_match` pattern matching for consistency with existing code
- Checks that both operations have single use to avoid code bloat
- Verifies both operations use the same input
- Handles commutative XOR patterns automatically
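The fold is sound because the affine transform is linear over GF(2): (A1·x + b1) + (A2·x + b2) = (A1 + A2)·x + (b1 + b2), where + is XOR. Below is a scalar sketch of one byte lane that checks this. It assumes the matrix row for result bit i comes from byte 7 - i of the 64-bit matrix operand, and that the fold XORs both the matrices and the immediates. The function names are hypothetical.

```cpp
#include <cstdint>

// Parity (XOR of all bits) of a byte.
uint8_t parity8(uint8_t v) {
  v ^= v >> 4;
  v ^= v >> 2;
  v ^= v >> 1;
  return v & 1;
}

// One byte lane of a GF(2) affine transform:
//   result bit i = parity(row_i & x) ^ bit i of imm.
// Row i is taken from byte (7 - i) of `matrix` (assumed layout).
uint8_t affineByte(uint64_t matrix, uint8_t x, uint8_t imm) {
  uint8_t r = 0;
  for (int i = 0; i < 8; ++i) {
    uint8_t row = static_cast<uint8_t>(matrix >> (8 * (7 - i)));
    uint8_t bit = parity8(static_cast<uint8_t>(row & x)) ^ ((imm >> i) & 1);
    r |= static_cast<uint8_t>(bit << i);
  }
  return r;
}

// Check affine(x, A1, b1) ^ affine(x, A2, b2) == affine(x, A1 ^ A2, b1 ^ b2)
// for every possible input byte x.
bool foldHoldsForAllInputs(uint64_t a1, uint64_t a2, uint8_t b1, uint8_t b2) {
  for (int x = 0; x < 256; ++x) {
    uint8_t in = static_cast<uint8_t>(x);
    uint8_t unfolded = affineByte(a1, in, b1) ^ affineByte(a2, in, b2);
    uint8_t folded = affineByte(a1 ^ a2, in, static_cast<uint8_t>(b1 ^ b2));
    if (unfolded != folded)
      return false;
  }
  return true;
}
```

Under these assumptions, two affine instructions plus an XOR become one XOR of the matrices plus a single affine instruction, which matches the 3 to 2 count above.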
[analyzer] Fix crash in MallocChecker when a function has both ownership_returns and ownership_takes (#183583)
When a function was annotated with both `ownership_returns` and
`ownership_takes` (or `ownership_holds`), MallocChecker::evalCall would
fall into the freeing-only branch (isFreeingOwnershipAttrCall) and call
checkOwnershipAttr without first calling MallocBindRetVal. That meant no
heap symbol had been conjured for the return value, so
checkOwnershipAttr later dereferenced a null/invalid symbol and crashed.
Fix: merge the two dispatch branches so that MallocBindRetVal is always
called first whenever ownership_returns is present, regardless of
whether the function also carries ownership_takes/ownership_holds.
The crash was introduced in #106081
(339282d49f5310a2837da45c0ccc19da15675554), released in clang-20, and
has been present ever since.
Fixes #183344.
Assisted-By: claude
[LLVM][ExecutionEngine] Add vector ConstantInt/FP support to getConstantValue(). (#182538)
Unify vector constant handling via calls to getAggregateElement rather
than handling each constant type separately.
[VPlan] Add nuw to unrolled canonical IVs (#183716)
After #183080, the canonical IV (not the increment!) can't overflow. The
steps of unrolled canonical IVs therefore can't overflow either, so we
can add the nuw flag.
This allows us to tighten the VPlanVerifier isKnownMonotonic check by
restricting it to adds with nuw.
[Clang] support C23 constexpr struct member access in constant expressions (#182770)
Fixes #178349
---
This patch resolves an issue where accessing C23 `constexpr` struct
members using the dot operator was not recognized as a constant
expression.
According to the C23 spec:
> 6.6p7:
>
> An identifier that is:
> — an enumeration constant,
> — a predefined constant, or
> — declared with storage-class specifier constexpr and has an object type,
[20 lines not shown]
[LangRef] Clarify in vscale_range that vscale is a power-of-two without the attribute (#183689)
Previously, vscale_range added the constraint that vscale is a
power-of-two, but after #183080 it's already a power-of-two to begin
with.
This clarifies the sentence about assumptions when there is no attribute.
[mlir][dataflow] Fix crash in IntegerRangeAnalysis with non-constant loop bounds (#183660)
When visiting non-control-flow arguments of a LoopLikeOpInterface op,
IntegerRangeAnalysis assumed that getLoopLowerBounds(),
getLoopUpperBounds(), and getLoopSteps() always return non-null values
when getLoopInductionVars() is non-null. This assumption is incorrect:
for example, AffineForOp returns nullopt from getLoopUpperBounds() when
the upper bound is not a constant affine expression (e.g., a dynamic
index from a tensor.dim).
Fix this by checking whether the bound optionals are engaged before
dereferencing them and falling back to the generic analysis if any bound
is unavailable.
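The guard can be sketched in isolation as follows. This is a simplified model with constant integer bounds and hypothetical type and function names. The real analysis works on lattice ranges and `OpFoldResult`s, not on the types shown here.

```cpp
#include <cstdint>
#include <optional>

// Inclusive integer range for an induction variable (simplified model).
struct Range {
  int64_t min;
  int64_t max;
};

// Hypothetical stand-in for the loop-bound queries; any of them may be
// unavailable, e.g. when the upper bound is a dynamic value.
struct LoopBounds {
  std::optional<int64_t> lower;
  std::optional<int64_t> upper;
  std::optional<int64_t> step;
};

// Returns a range only when every bound is known. An empty result tells the
// caller to fall back to the generic analysis.
std::optional<Range> inferInductionVarRange(const LoopBounds &b) {
  // Check that each optional is engaged before dereferencing it.
  if (!b.lower || !b.upper || !b.step)
    return std::nullopt;
  // This sketch only handles positive steps and non-empty loops.
  if (*b.step <= 0 || *b.upper <= *b.lower)
    return std::nullopt;
  return Range{*b.lower, *b.upper - 1};
}
```

A loop with bounds `{0, 10, 1}` gives the range [0, 9] here, while a missing upper bound gives an empty result instead of dereferencing a disengaged optional.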
Fixes #180312
[mlir][affine] Fix crash in linearize_index fold when basis is ub.poison (#183650)
`foldCstValueToCstAttrBasis` iterates the folded dynamic basis values
and erases any operand whose folded attribute is non-null (i.e., was
constant-folded). When an operand folds to `ub.PoisonAttr`, the
attribute is non-null so the operand was erased from the dynamic operand
list. However, `getConstantIntValue` on the corresponding `OpFoldResult`
in `mixedBasis` returns `std::nullopt` for poison (it is not an integer
constant), so the position was left as `ShapedType::kDynamic` in the
returned static basis.
This left the op in an inconsistent state: the static basis claimed one
more dynamic entry than actually existed. A subsequent call to
`getMixedBasis()` triggered the assertion inside `getMixedValues`.
Fix by skipping poison attributes in the erasure loop, treating them
like non-constant values. This keeps the dynamic operand and its
matching `kDynamic` entry in the static basis consistent.
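The invariant can be illustrated with a small standalone model. The types, names, and the `kDynamic` sentinel value below are hypothetical stand-ins, not the MLIR data structures. The sketch only shows that treating poison like a non-constant keeps the number of `kDynamic` entries equal to the number of dynamic operands.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Sentinel marking a basis entry that is backed by a dynamic operand.
const int64_t kDynamic = std::numeric_limits<int64_t>::min();

// Simplified result of folding one dynamic basis operand.
enum class FoldKind { NotConstant, IntConstant, Poison };
struct Folded {
  FoldKind kind;
  int64_t value; // only meaningful for IntConstant
};

struct Basis {
  std::vector<int64_t> staticBasis; // kDynamic where an operand is used
  std::vector<int> dynamicOperands; // hypothetical operand ids
};

// Move integer constants into the static basis. Poison and non-constant
// results keep both their operand and their kDynamic entry.
Basis foldConstantsIntoBasis(const Basis &in, const std::vector<Folded> &folded) {
  Basis out;
  std::size_t dynIdx = 0;
  for (int64_t entry : in.staticBasis) {
    if (entry != kDynamic) {
      out.staticBasis.push_back(entry);
      continue;
    }
    const Folded &f = folded[dynIdx];
    if (f.kind == FoldKind::IntConstant) {
      out.staticBasis.push_back(f.value);
    } else {
      out.staticBasis.push_back(kDynamic);
      out.dynamicOperands.push_back(in.dynamicOperands[dynIdx]);
    }
    ++dynIdx;
  }
  return out;
}

// The invariant: one dynamic operand per kDynamic entry.
bool isConsistent(const Basis &b) {
  std::size_t dynamicEntries = 0;
  for (int64_t entry : b.staticBasis)
    if (entry == kDynamic)
      ++dynamicEntries;
  return dynamicEntries == b.dynamicOperands.size();
}
```

In this model, erasing the poison operand while leaving its entry as `kDynamic` would make `isConsistent` return false, which corresponds to the mismatch described above.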
Fixes #179265