[mlir][dataflow] Fix crash in IntegerRangeAnalysis with non-constant loop bounds (#183660)
When visiting non-control-flow arguments of a LoopLikeOpInterface op,
IntegerRangeAnalysis assumed that getLoopLowerBounds(),
getLoopUpperBounds(), and getLoopSteps() always return non-null values
when getLoopInductionVars() is non-null. This assumption is incorrect:
for example, AffineForOp returns nullopt from getLoopUpperBounds() when
the upper bound is not a constant affine expression (e.g., a dynamic
index from a tensor.dim).
Fix this by checking whether the bound optionals are engaged before
dereferencing them and falling back to the generic analysis if any bound
is unavailable.
Fixes #180312
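A minimal sketch of the triggering shape (function and value names are hypothetical): an `affine.for` whose upper bound is a dynamic SSA value has no constant bound for `getLoopUpperBounds()` to report:

```mlir
// Hypothetical reduction: the trip count depends on a dynamic tensor
// dimension, so the loop interface cannot report a constant upper bound.
func.func @dyn_ub(%t: tensor<?xf32>) {
  %c0 = arith.constant 0 : index
  %d = tensor.dim %t, %c0 : tensor<?xf32>
  affine.for %i = 0 to %d {
    // getLoopUpperBounds() returns nullopt here; the analysis must fall
    // back to the generic handling instead of dereferencing the optional.
  }
  return
}
```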
[mlir][affine] Fix crash in linearize_index fold when basis is ub.poison (#183650)
`foldCstValueToCstAttrBasis` iterates the folded dynamic basis values
and erases any operand whose folded attribute is non-null (i.e., was
constant-folded). When an operand folds to `ub.PoisonAttr`, the
attribute is non-null so the operand was erased from the dynamic operand
list. However, `getConstantIntValue` on the corresponding `OpFoldResult`
in `mixedBasis` returns `std::nullopt` for poison (it is not an integer
constant), so the position was left as `ShapedType::kDynamic` in the
returned static basis.
This left the op in an inconsistent state: the static basis claimed one
more dynamic entry than actually existed. A subsequent call to
`getMixedBasis()` triggered the assertion inside `getMixedValues`.
Fix by skipping poison attributes in the erasure loop, treating them
like non-constant values. This keeps the dynamic operand and its
matching `kDynamic` entry in the static basis consistent.
Fixes #179265
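A sketch of the failure shape (operands are hypothetical): a poison entry in the dynamic basis folds to a non-null attribute without being an integer constant:

```mlir
%p = ub.poison : index
// %p folds to ub.PoisonAttr (non-null), but getConstantIntValue on the
// corresponding OpFoldResult returns nullopt, so the static basis keeps
// kDynamic for this slot. Erasing %p from the dynamic operand list would
// leave one more kDynamic entry than dynamic operands.
%i = affine.linearize_index [%a, %b] by (%p, 4) : index
```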
[SCEV] Always return true for isKnownToBeAPowerOfTwo for SCEVVScale (#183693)
After #183080, vscale is always a power of two, so we no longer need to
check for the vscale_range attribute.
[AMDGPU] Remove unused CmpLGOp instruction (#180195)
The instruction was accidentally added; remove it.
Rename OrN2Op to OrN2Opc for consistency with other names.
[MemorySSA] Make `getBlockDefs` and `getBlockAccesses` return a non-const list (NFC)
As per discussion at https://github.com/llvm/llvm-project/pull/181709#discussion_r2847595945,
users may already get a non-const MemoryAccess pointer via
`getMemoryAccess` for a given instruction. Drop the restriction on
iterating over them directly by changing the public `getBlockDefs`/
`getBlockAccesses` APIs to return a mutable list, thus dropping the
now obsolete distinction with the `getWritableBlockDefs` and
`getWritableBlockAccesses` helpers.
[lldb][test] Re-enable TestDyldLaunchLinux.py for Linux/Arm (#181221)
The test was disabled in c55e021d, but it now passes in both remote and
local runs.
[AMDGPU] Support i8/i16 GEP indices when promoting allocas to vectors (#175489)
Allow the promote-alloca-to-vector pass to form a vector element index
from i8/i16 GEPs when the dynamic offset is known to be element-size
aligned.
Example:
```llvm
%alloca = alloca <3 x float>, addrspace(5)
%idx = select i1 %idx_select, i32 0, i32 4
%p = getelementptr inbounds i8, ptr addrspace(5) %alloca, i32 %idx
```
Or:
```llvm
%alloca = alloca <3 x float>, addrspace(5)
%idx = select i1 %idx_select, i32 0, i32 2
%p = getelementptr inbounds i16, ptr addrspace(5) %alloca, i32 %idx
```
[libc++] Fix vector::append_range growing before the capacity is reached (#183264)
Currently `vector::append_range` grows even when appending a number of
elements that is exactly equal to its spare capacity, which is
guaranteed by the standard to _not_ happen.
Fixes #183256
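A minimal sketch of the guarantee (using insert-at-end, which carries the same no-reallocation guarantee that the standard gives the C++23 `append_range` when the new size does not exceed the capacity):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appending exactly the spare capacity must not reallocate, so the
// capacity observed before the append must be unchanged afterwards.
bool append_exact_spare_keeps_capacity() {
  std::vector<int> v;
  v.reserve(8);
  v.assign({1, 2, 3});
  const std::size_t cap = v.capacity();
  std::vector<int> extra(cap - v.size(), 0); // exactly the spare room
  v.insert(v.end(), extra.begin(), extra.end());
  return v.capacity() == cap && v.size() == cap;
}
```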
[InstCombine] Combine extract from get_active_lane_mask where all lanes inactive (#183329)
When extracting a subvector from the result of a get_active_lane_mask, return
a constant zero vector if it can be proven that all lanes will be inactive.
For example, the result of the extract below is a subvector in which
every lane is inactive if X and Y are constants and `Y * VScale >= X`:
`vector.extract(get.active.lane.mask(Start, X), Y)`
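A fixed-width sketch of the pattern (so `VScale = 1`; the function name is hypothetical): with `Start = 0` and `X = 4`, extracting at index `Y = 4` yields only inactive lanes:

```llvm
; Lanes at positions >= 4 of the mask are false, so the extract of the
; upper half can fold to zeroinitializer.
define <4 x i1> @extract_inactive() {
  %m = call <8 x i1> @llvm.get.active.lane.mask.v8i1.i64(i64 0, i64 4)
  %hi = call <4 x i1> @llvm.vector.extract.v4i1.v8i1(<8 x i1> %m, i64 4)
  ret <4 x i1> %hi
}
```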
[AArch64] optimize vselect of bitcast (#180375)
Using code/ideas from the x86 backend to optimize a select on a bitcast
integer. The previous AArch64 approach was to individually extract the
bits from the mask, which is kind of terrible.
https://rust.godbolt.org/z/576sndT66
```llvm
define void @if_then_else8(ptr %out, i8 %mask, ptr %if_true, ptr %if_false) {
start:
%t = load <8 x i32>, ptr %if_true, align 4
%f = load <8 x i32>, ptr %if_false, align 4
%m = bitcast i8 %mask to <8 x i1>
%s = select <8 x i1> %m, <8 x i32> %t, <8 x i32> %f
store <8 x i32> %s, ptr %out, align 4
ret void
}
```
[64 lines not shown]
[AArch64] Add vector expansion support for ISD::FPOW when using ArmPL (#183526)
This patch is split off from PR #183319 and teaches the backend how to
lower the FPOW DAG node to the vector math library function when using
ArmPL. This is similar to what we already do for llvm.sincos/FSINCOS
today.
[NFC][analyzer] Remove NodeBuilders: part I (#183354)
This commit simplifies some parts of the engine by replacing short-lived
`NodeBuilder`s with `CoreEngine::makeNode`.
Additionally, the three-argument overload of `CoreEngine::enqueue` is
renamed to `enqueueStmtNodes` to highlight that it just calls
`enqueueStmtNode` in a loop.
[WebAssembly][FastISel] Emit signed loads for sext of i8/i16/i32 (#182767)
FastISel currently defaults to unsigned loads for i8/i16/i32 types,
leaving any sign-extension to be handled by a separate instruction. This
patch optimizes this by folding the SExtInst into the LoadInst, directly
emitting a signed load (e.g., i32.load8_s).
When a load has a single SExtInst use, selectLoad emits a signed load
and safely removes the redundantly emitted SExtInst.
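The folded pattern looks roughly like this (a sketch, not the exact test case; the function name is hypothetical):

```llvm
define i32 @load_sext(ptr %p) {
  %b = load i8, ptr %p     ; single sext use below
  %s = sext i8 %b to i32   ; folded into the load: emits i32.load8_s
  ret i32 %s
}
```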
Fixes #180783
Revert "[VPlan] Don't drop NUW flag on tail folded canonical IVs (#183301)" (#183698)
This reverts commit b0b3e3e1c7f6387eabc2ef9ff1fea311e63a4299.
After thinking about this for a bit, I don't think this is correct.
vscale being a power of two only guarantees that the canonical IV
increment overflows to zero, not that it does not overflow in general.
[AMDGPU] Multi dword spilling for unaligned tuples
While spilling unaligned tuples, rather than breaking the
spill into 32-bit accesses, spill the first register as a
single 32-bit spill, and spill the remainder of the tuple
as an aligned tuple.
Some additional bookkeeping is required in the spilling
loop to manage the state.
[clang][bytecode][NFC] Refactor visitDeclRef() (#183690)
Move the `!VD` case up so we can assume `VD` to be non-null earlier and
use a local variable instead of calling `D->getType()` several times.
[LV] NFCI: Move extend optimization to transformToPartialReduction. (#182860)
The reason for doing this in `transformToPartialReduction` is so that we
can create the VPExpressions directly when transforming reductions into
partial reductions (to be done in a follow-up PR).
I also intend to see if we can merge the in-loop reductions with
partial reductions, so that there will be no need for the separate
`convertToAbstractRecipes` VPlan transform pass.
[AMX][NFC] Match pseudo name with isa (#182235)
Add the missing suffix to clarify the intended ISA instruction.
We switched from `TILEMOVROWrre` to `TILEMOVROWrte` in
https://github.com/llvm/llvm-project/pull/168193, but the pseudo kept
the old name. This patch renames `PTILEMOVROWrre` to `PTILEMOVROWrte`
so the pseudo matches the intended ISA version, even though the pseudo
does not actually have a tile register.
Co-authored-by: mattarde <mattarde at intel.com>
[Clang][NFCI] Make program state GDM key const pointer (#183477)
This commit makes the GDM key in ProgramState a constant pointer. This
is done to better reflect the intention of the key as a unique
identifier for the data stored in the GDM, and to prevent the use of the
storage pointed to by the key as global state.
Signed-off-by: Steffen Holst Larsen <sholstla at amd.com>
Lower strictfp vector rounding operations similar to default mode
Previously the strictfp rounding nodes were lowered by unrolling to
scalar operations, which has a negative impact on performance. This
issue was partially fixed in #180480; this change continues that work
and implements optimized lowering for v4f16 and v8f16.
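For reference, the constrained rounding intrinsics in question look like this (a sketch; `rint` stands in for the family of rounding operations, and the function name is hypothetical):

```llvm
define <8 x half> @rint_v8f16(<8 x half> %x) strictfp {
  %r = call <8 x half> @llvm.experimental.constrained.rint.v8f16(
           <8 x half> %x, metadata !"round.dynamic",
           metadata !"fpexcept.strict") strictfp
  ret <8 x half> %r
}
```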
[VectorCombine][X86] Ensure we recognise free sign extends of vector comparison results (#183575)
Unless we're working with AVX512 mask predicate types, sign extending a
vXi1 comparison result back to the width of the comparison source types
is free.
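A minimal sketch of the free extension (types and names are illustrative): outside AVX512 mask predicates, a vector compare already materializes all-zeros/all-ones lanes, so the sign extend costs nothing:

```llvm
define <4 x i32> @cmp_sext(<4 x i32> %a, <4 x i32> %b) {
  %c = icmp sgt <4 x i32> %a, %b       ; <4 x i1> comparison result
  %s = sext <4 x i1> %c to <4 x i32>   ; free: lanes are already 0 / -1
  ret <4 x i32> %s
}
```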
In `VectorCombine::foldShuffleOfCastops`, pass the original CastInst to
the `getCastInstrCost` calls to track the source comparison instruction.
Fixes #165813