LLVM/project 14f7334 mlir/lib/Analysis/DataFlow IntegerRangeAnalysis.cpp, mlir/test/Dialect/Arith int-range-narrowing.mlir

[mlir][dataflow] Fix crash in IntegerRangeAnalysis with non-constant loop bounds (#183660)

When visiting non-control-flow arguments of a LoopLikeOpInterface op,
IntegerRangeAnalysis assumed that getLoopLowerBounds(),
getLoopUpperBounds(), and getLoopSteps() always return non-null values
when getLoopInductionVars() is non-null. This assumption is incorrect:
for example, AffineForOp returns nullopt from getLoopUpperBounds() when
the upper bound is not a constant affine expression (e.g., a dynamic
index from a tensor.dim).

Fix this by checking whether the bound optionals are engaged before
dereferencing them and falling back to the generic analysis if any bound
is unavailable.

Fixes #180312
DeltaFile
+15-4 mlir/lib/Analysis/DataFlow/IntegerRangeAnalysis.cpp
+14-0 mlir/test/Dialect/Arith/int-range-narrowing.mlir
+29-4, 2 files

LLVM/project c5c0fe6 llvm/lib/Transforms/Vectorize VPlanTransforms.cpp

[VPlan] Remove non-power-of-2 scalable VF comment. NFC (#183719)

No longer holds after #183080
DeltaFile
+0-2 llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+0-2, 1 file

LLVM/project 9882590 mlir/lib/Dialect/Affine/IR AffineOps.cpp, mlir/test/Dialect/Affine canonicalize.mlir

[mlir][affine] Fix crash in linearize_index fold when basis is ub.poison (#183650)

`foldCstValueToCstAttrBasis` iterates the folded dynamic basis values
and erases any operand whose folded attribute is non-null (i.e., was
constant-folded). When an operand folds to `ub.PoisonAttr`, the
attribute is non-null, so the operand was erased from the dynamic
operand list. However, `getConstantIntValue` on the corresponding
`OpFoldResult` in `mixedBasis` returns `std::nullopt` for poison (it is
not an integer constant), so the position was left as
`ShapedType::kDynamic` in the returned static basis.

This left the op in an inconsistent state: the static basis claimed one
more dynamic entry than actually existed. A subsequent call to
`getMixedBasis()` triggered the assertion inside `getMixedValues`.

Fix this by skipping poison attributes in the erasure loop, treating
them like non-constant values. This keeps the dynamic operand list and
the matching `kDynamic` entries in the static basis consistent.

Fixes #179265
DeltaFile
+14-0 mlir/test/Dialect/Affine/canonicalize.mlir
+6-2 mlir/lib/Dialect/Affine/IR/AffineOps.cpp
+20-2, 2 files

LLVM/project e7bc02d llvm/lib/Analysis ScalarEvolution.cpp, llvm/test/Analysis/ScalarEvolution trip-count-scalable-stride.ll

[SCEV] Always return true for isKnownToBeAPowerOfTwo for SCEVVScale (#183693)

After #183080 vscale is always a power of two, so we don't need to check
for the vscale_range attribute.
DeltaFile
+34-0 llvm/test/Analysis/ScalarEvolution/trip-count-scalable-stride.ll
+3-3 llvm/lib/Analysis/ScalarEvolution.cpp
+37-3, 2 files

LLVM/project 49f4232 llvm/lib/Target/AMDGPU AMDGPULaneMaskUtils.h

[AMDGPU] Remove unused CmpLGOp instruction (#180195)

The instruction was accidentally added; remove it.
Rename `OrN2Op` to `OrN2Opc` for consistency with other names.
DeltaFile
+2-4 llvm/lib/Target/AMDGPU/AMDGPULaneMaskUtils.h
+2-4, 1 file

LLVM/project b9f2a48 llvm/include/llvm/Analysis MemorySSA.h, llvm/lib/Analysis MemorySSAUpdater.cpp MemorySSA.cpp

[MemorySSA] Make `getBlockDefs` and `getBlockAccesses` return a non-const list (NFC)

As per discussion at https://github.com/llvm/llvm-project/pull/181709#discussion_r2847595945,
users may already get a non-const MemoryAccess pointer via
`getMemoryAccess` for a given instruction. Drop the restriction on
iterating over them directly by changing the public `getBlockDefs`/
`getBlockAccesses` APIs to return a mutable list, thus dropping the
now-obsolete distinction with the `getWritableBlockDefs` and
`getWritableBlockAccesses` helpers.
DeltaFile
+16-16 llvm/lib/Analysis/MemorySSAUpdater.cpp
+6-20 llvm/include/llvm/Analysis/MemorySSA.h
+3-3 llvm/lib/Analysis/MemorySSA.cpp
+25-39, 3 files

LLVM/project 5e30ff9 lldb/test/API/functionalities/dyld-launch-linux TestDyldLaunchLinux.py

[lldb][test] Re-enable TestDyldLaunchLinux.py for Linux/Arm (#181221)

The test was disabled in c55e021d, but it now passes, with both remote
and local runs.
DeltaFile
+0-1 lldb/test/API/functionalities/dyld-launch-linux/TestDyldLaunchLinux.py
+0-1, 1 file

LLVM/project 1afd7d4 llvm/lib/Target/AMDGPU AMDGPUPromoteAlloca.cpp, llvm/test/CodeGen/AMDGPU promote-alloca-vector-gep.ll

[AMDGPU] Support i8/i16 GEP indices when promoting allocas to vectors (#175489)

Allow the promote-alloca-to-vector pass to form a vector element index
from i8/i16 GEPs when the dynamic offset is known to be aligned to the
element size.

Example:
```llvm
%alloca = alloca <3 x float>, addrspace(5)
%idx = select i1 %idx_select, i32 0, i32 4
%p = getelementptr inbounds i8, ptr addrspace(5) %alloca, i32 %idx
```
Or:
```llvm
%alloca = alloca <3 x float>, addrspace(5)
%idx = select i1 %idx_select, i32 0, i32 2
%p = getelementptr inbounds i16, ptr addrspace(5) %alloca, i32 %idx
```
DeltaFile
+113-0 llvm/test/CodeGen/AMDGPU/promote-alloca-vector-gep.ll
+47-11 llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
+160-11, 2 files

LLVM/project 250ebfc llvm/test/CodeGen/X86 vec_fcopysign.ll vec-copysign-avx512.ll

[X86] regenerate fcopysign test checks (#183710)

Fix vpternlog comments
DeltaFile
+18-18 llvm/test/CodeGen/X86/vec_fcopysign.ll
+6-6 llvm/test/CodeGen/X86/vec-copysign-avx512.ll
+24-24, 2 files

LLVM/project d6fcf47 libcxx/include/__vector vector.h, libcxx/test/std/containers/sequences/vector/vector.modifiers append_range.pass.cpp

[libc++] Fix vector::append_range growing before the capacity is reached (#183264)

Currently `vector::append_range` grows even when appending a number of
elements that is exactly equal to its spare capacity, which is
guaranteed by the standard to _not_ happen.

Fixes #183256
DeltaFile
+13-2 libcxx/test/std/containers/sequences/vector/vector.modifiers/append_range.pass.cpp
+1-1 libcxx/include/__vector/vector.h
+14-3, 2 files

LLVM/project 5e1d991 llvm/test/CodeGen/X86 stack-align.ll

[X86] stack-align.ll - regenerate test checks with no address scrubbing (#183712)

DeltaFile
+67-34 llvm/test/CodeGen/X86/stack-align.ll
+67-34, 1 file

LLVM/project 294cf1f llvm/test/CodeGen/X86 fnabs.ll

[X86] fnabs.ll - regenerate test checks and add AVX512 test coverage (#183709)

DeltaFile
+69-25 llvm/test/CodeGen/X86/fnabs.ll
+69-25, 1 file

LLVM/project 10b48e4 llvm/lib/Transforms/InstCombine InstCombineCalls.cpp, llvm/test/Transforms/InstCombine get_active_lane_mask.ll

[InstCombine] Combine extract from get_active_lane_mask where all lanes inactive (#183329)

When extracting a subvector from the result of a get_active_lane_mask, return
a constant zero vector if it can be proven that all lanes will be inactive.

For example, the result of the extract below will be a subvector in which
every lane is inactive if X and Y are constants and `Y * VScale >= X`:
  vector.extract(get.active.lane.mask(Start, X), Y)
DeltaFile
+40-0 llvm/test/Transforms/InstCombine/get_active_lane_mask.ll
+10-0 llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
+50-0, 2 files

LLVM/project 7a5ba65 llvm/lib/Target/AArch64 AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 select-bitcast.ll

[AArch64] optimize vselect of bitcast (#180375)

Using code/ideas from the x86 backend to optimize a select on a bitcast
integer. The previous AArch64 approach was to individually extract the
bits from the mask, which is kind of terrible.

https://rust.godbolt.org/z/576sndT66

```llvm
define void @if_then_else8(ptr %out, i8 %mask, ptr %if_true, ptr %if_false) {
start:
  %t = load <8 x i32>, ptr %if_true, align 4
  %f = load <8 x i32>, ptr %if_false, align 4
  %m = bitcast i8 %mask to <8 x i1>
  %s = select <8 x i1> %m, <8 x i32> %t, <8 x i32> %f
  store <8 x i32> %s, ptr %out, align 4
  ret void
}
```

    [64 lines not shown]
DeltaFile
+1,107-0 llvm/test/CodeGen/AArch64/select-bitcast.ll
+113-4 llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+1,220-4, 2 files

LLVM/project 9e95cff llvm/lib/CodeGen TargetLoweringBase.cpp, llvm/lib/CodeGen/SelectionDAG LegalizeVectorOps.cpp

[AArch64] Add vector expansion support for ISD::FPOW when using ArmPL (#183526)

This patch is split off from PR #183319 and teaches the backend how to
lower the FPOW DAG node to the vector math library function when using
ArmPL. This is similar to what we already do for llvm.sincos/FSINCOS
today.
DeltaFile
+72-0 llvm/test/CodeGen/AArch64/veclib-llvm.pow.ll
+18-5 llvm/lib/IR/RuntimeLibcalls.cpp
+18-0 llvm/lib/CodeGen/TargetLoweringBase.cpp
+9-0 llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
+8-0 llvm/test/Transforms/Util/DeclareRuntimeLibcalls/armpl.ll
+125-5, 5 files

LLVM/project 28cbc68 clang/include/clang/StaticAnalyzer/Core/PathSensitive CoreEngine.h, clang/lib/StaticAnalyzer/Core ExprEngine.cpp CoreEngine.cpp

[NFC][analyzer] Remove NodeBuilders: part I (#183354)

This commit simplifies some parts of the engine by replacing short-lived
`NodeBuilder`s with `CoreEngine::makeNode`.

Additionally, the three-argument overload of `CoreEngine::enqueue` is
renamed to `enqueueStmtNodes` to highlight that it just calls
`enqueueStmtNode` in a loop.
DeltaFile
+12-20 clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
+2-2 clang/lib/StaticAnalyzer/Core/CoreEngine.cpp
+2-1 clang/include/clang/StaticAnalyzer/Core/PathSensitive/CoreEngine.h
+16-23, 3 files

LLVM/project 4147cd2 llvm/lib/Target/WebAssembly WebAssemblyFastISel.cpp, llvm/test/CodeGen/WebAssembly load-ext.ll offset-fastisel.ll

[WebAssembly][FastISel] Emit signed loads for sext of i8/i16/i32 (#182767)

FastISel currently defaults to unsigned loads for i8/i16/i32 types,
leaving any sign extension to be handled by a separate instruction. This
patch improves on that by folding the SExtInst into the LoadInst,
directly emitting a signed load (e.g., i32.load8_s).

When a load has a single SExtInst use, selectLoad emits a signed load
and safely removes the redundantly emitted SExtInst.

Fixes #180783
DeltaFile
+48-0 llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp
+6-12 llvm/test/CodeGen/WebAssembly/load-ext.ll
+1-2 llvm/test/CodeGen/WebAssembly/offset-fastisel.ll
+55-14, 3 files

LLVM/project f71bd1c clang/lib/AST/ByteCode EvaluationResult.cpp Record.cpp

[clang][bytecode] Add `Record::hasPtrField()` (#183513)

So we can short-circuit the checking in
`EvaluationResult::collectBlock()`. This improves the compile time of
`X86Disassembler.cpp` by roughly 3.8%:
https://llvm-compile-time-tracker.com/compare_clang.php?from=d69c6a8528c60a8f8013651ff18ed4882f6e6836&to=b8b6333551d7c644e3c1b00ed19aceea09da40cc&stat=instructions%3Au
DeltaFile
+18-6 clang/lib/AST/ByteCode/EvaluationResult.cpp
+11-4 clang/lib/AST/ByteCode/Record.cpp
+7-2 clang/lib/AST/ByteCode/Program.cpp
+5-1 clang/lib/AST/ByteCode/Record.h
+41-13, 4 files

LLVM/project 1ea2f25 llvm/include/llvm/ADT GenericUniformityImpl.h, llvm/lib/Analysis UniformityAnalysis.cpp

review: address suggestions
DeltaFile
+2-68 llvm/unittests/Target/AMDGPU/UniformityAnalysisCallbackVHTest.cpp
+5-7 llvm/lib/Analysis/UniformityAnalysis.cpp
+3-5 llvm/include/llvm/ADT/GenericUniformityImpl.h
+10-80, 3 files

LLVM/project d43213f llvm/lib/Transforms/Vectorize LoopVectorize.cpp, llvm/test/Transforms/LoopVectorize/RISCV reductions.ll tail-folding-cast-intrinsics.ll

Revert "[VPlan] Don't drop NUW flag on tail folded canonical IVs (#183301)" (#183698)

This reverts commit b0b3e3e1c7f6387eabc2ef9ff1fea311e63a4299.

After thinking about this for a bit, I don't think this is correct.
vscale being a power of two only guarantees that the canonical IV
increment overflows to zero, but not that it does not overflow in general.
DeltaFile
+36-36 llvm/test/Transforms/LoopVectorize/RISCV/reductions.ll
+22-22 llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-cast-intrinsics.ll
+18-18 llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-call-intrinsics.ll
+17-17 llvm/test/Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll
+25-7 llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+14-14 llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-reduction.ll
+132-114, 54 files not shown
+261-243, 60 files

LLVM/project 16aa190 clang/lib/AST/ByteCode Pointer.h

[clang][bytecode][NFC] Print more info in Pointer::operator<< (#183691)

So we know whether a pointer is a dummy and whether it is alive.
DeltaFile
+4-0 clang/lib/AST/ByteCode/Pointer.h
+4-0, 1 file

LLVM/project df8b74e llvm/lib/Target/AMDGPU SIRegisterInfo.cpp, llvm/test/CodeGen/AMDGPU vgpr-spill.mir

[AMDGPU] Multi dword spilling for unaligned tuples

While spilling unaligned tuples, rather than breaking the
spill into 32-bit accesses, spill the first register as a
single 32-bit spill, and spill the remainder of the tuple
as an aligned tuple.
Some additional bookkeeping is required in the spilling
loop to manage the state.
DeltaFile
+44-7 llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+8-26 llvm/test/CodeGen/AMDGPU/vgpr-spill.mir
+52-33, 2 files

LLVM/project c690414 clang/lib/AST/ByteCode Compiler.cpp

[clang][bytecode][NFC] Refactor visitDeclRef() (#183690)

Move the `!VD` case up so we can assume `VD` to be non-null earlier and
use a local variable instead of calling `D->getType()` several times.
DeltaFile
+15-17 clang/lib/AST/ByteCode/Compiler.cpp
+15-17, 1 file

LLVM/project a1f83ba llvm/lib/Transforms/Vectorize VPlanTransforms.cpp

[LV] NFCI: Move extend optimization to transformToPartialReduction. (#182860)

The reason for doing this in `transformToPartialReduction` is so that we
can create the VPExpressions directly when transforming reductions into
partial reductions (to be done in a follow-up PR).

I also intend to see if we can merge the in-loop reductions with partial
reductions, so that there will be no need for the separate
`convertToAbstractRecipes` VPlan transform pass.
DeltaFile
+65-8 llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+65-8, 1 file

LLVM/project 5af5bd4 llvm/lib/Target/X86 X86ISelLowering.cpp X86ExpandPseudo.cpp

[AMX][NFC] Match pseudo name with ISA (#182235)

Add the missing suffix to the pseudo name to clarify which ISA version
is intended. We switched from `TILEMOVROWrre` to `TILEMOVROWrte` in
https://github.com/llvm/llvm-project/pull/168193, but the pseudo kept
its old name. This patch renames `PTILEMOVROWrre` to `PTILEMOVROWrte`
to match the right ISA version, even though the pseudo does not actually
have any tile register.

---------

Co-authored-by: mattarde <mattarde at intel.com>
DeltaFile
+24-24 llvm/lib/Target/X86/X86ISelLowering.cpp
+24-24 llvm/lib/Target/X86/X86ExpandPseudo.cpp
+18-18 llvm/lib/Target/X86/X86InstrAMX.td
+12-12 llvm/lib/Target/X86/X86PreTileConfig.cpp
+78-78, 4 files

LLVM/project 058705b clang/include/clang/StaticAnalyzer/Core/PathSensitive ProgramState.h, clang/lib/StaticAnalyzer/Core ProgramState.cpp

[Clang][NFCI] Make program state GDM key const pointer (#183477)

This commit makes the GDM key in ProgramState a constant pointer. This
is done to better reflect the intention of the key as a unique
identifier for the data stored in the GDM, and to prevent the use of the
storage pointed to by the key as global state.

Signed-off-by: Steffen Holst Larsen <sholstla at amd.com>
DeltaFile
+9-8 clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
+8-7 clang/lib/StaticAnalyzer/Core/ProgramState.cpp
+17-15, 2 files

LLVM/project 9145bf6 llvm/lib/Target/ARM ARMISelLowering.cpp, llvm/test/CodeGen/ARM fp-intrinsics-vector-v8.ll

Lower strictfp vector rounding operations similar to default mode

Previously the strictfp rounding nodes were lowered by unrolling to
scalar operations, which has a negative impact on performance. This
issue was partially fixed in #180480; this change continues that work
and implements optimized lowering for v4f16 and v8f16.
DeltaFile
+10-220 llvm/test/CodeGen/ARM/fp-intrinsics-vector-v8.ll
+7-12 llvm/lib/Target/ARM/ARMISelLowering.cpp
+17-232, 2 files

LLVM/project db56f21 llvm/lib/Target/AMDGPU SIISelLowering.cpp AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU fsqrt.f64.ll rsq.f64.ll

AMDGPU: Skip last corrections and scaling for afn llvm.sqrt.f64

The device libs have a fast sqrt macro implemented this way.
DeltaFile
+240-652 llvm/test/CodeGen/AMDGPU/fsqrt.f64.ll
+140-602 llvm/test/CodeGen/AMDGPU/rsq.f64.ll
+23-17 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+22-17 llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+425-1,288, 4 files

LLVM/project a4b65ab llvm/lib/Target/AMDGPU SIISelLowering.cpp AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU fsqrt.f64.ll rsq.f64.ll

AMDGPU: Improve fsqrt f64 expansion with ninf

Address todo to reduce the is_fpclass check to an fcmp with 0.
DeltaFile
+52-92 llvm/test/CodeGen/AMDGPU/fsqrt.f64.ll
+60-80 llvm/test/CodeGen/AMDGPU/rsq.f64.ll
+10-6 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+7-3 llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+3-2 llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-fsqrt.mir
+132-183, 5 files

LLVM/project 9270406 llvm/lib/Target/X86 X86TargetTransformInfo.cpp, llvm/lib/Transforms/Vectorize VectorCombine.cpp

[VectorCombine][X86] Ensure we recognise free sign extends of vector comparison results (#183575)

Unless we're working with AVX512 mask predicate types, sign extending a
vXi1 comparison result back to the width of the comparison source types
is free.

VectorCombine::foldShuffleOfCastops - pass the original CastInst in the
getCastInstrCost calls to track the source comparison instruction.

Fixes #165813
DeltaFile
+12-54 llvm/test/Transforms/VectorCombine/X86/shuffle-of-casts.ll
+14-0 llvm/lib/Target/X86/X86TargetTransformInfo.cpp
+2-2 llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+28-56, 3 files