LLVM/project fbac55bllvm/lib/Target/AArch64 AArch64InstrInfo.td AArch64ISelDAGToDAG.cpp, llvm/lib/Target/AArch64/GISel AArch64InstructionSelector.cpp

[AArch64] Optimize vector fmul(sitofp/uitofp, 1/2^N) -> scvtf/ucvtf (#141480)

When a vector integer-to-float conversion is followed by a multiply with a
reciprocal power-of-two constant, we can fold both operations into a single
SCVTF or UCVTF instruction with a fixed-point shift operand.

For example, `fmul(sitofp(v2i32 x), <0.5, 0.5>)` becomes `scvtf.2s v0, v0, #1`.

This is a reworked version with several improvements over the original
submission:
- Rewrite the C++ operand matcher to share implementation with the existing
    `SelectCVTFixedPointVec` (MOVIshift, FMOV, and DUP handling with correct
    truncation for f16)
- Add `uitofp`/`ucvtf` patterns via a `CVTFRecipPat` multiclass
- Add full GlobalISel support (`GIComplexOperandMatcher` + renderer)

Supported vector types: `v2f32`, `v4f32`, `v2f64`, `v4f16`, `v8f16`.
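
As a rough illustration of the fold described above (not the LLVM matcher itself), the following C++ sketch checks whether a multiplier is exactly 1/2^N and models the fixed-point convert as a scaling by 2^-N. The names `recipPow2Shift` and `scvtfFixed`, and the `MaxShift` bound, are assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper, not the LLVM matcher: returns N if C == 1 / 2^N with
// 1 <= N <= MaxShift, or -1 when C is not such a reciprocal power of two.
int recipPow2Shift(double C, int MaxShift) {
  if (!(C > 0.0) || !std::isfinite(C))
    return -1; // Rejects zero, negatives, NaN and infinity.
  int Exp = 0;
  // frexp decomposes C into Mant * 2^Exp with Mant in [0.5, 1).
  double Mant = std::frexp(C, &Exp);
  if (Mant != 0.5)
    return -1; // More than one mantissa bit set: not a power of two.
  int N = 1 - Exp; // C == 2^(Exp - 1) == 2^-N
  return (N >= 1 && N <= MaxShift) ? N : -1;
}

// Reference model of a signed fixed-point convert with N fractional bits:
// the integer X is interpreted as X / 2^N.
float scvtfFixed(int32_t X, int N) {
  assert(N >= 1 && "shift must be positive");
  return std::ldexp(static_cast<float>(X), -N);
}
```

Because scaling by a power of two is exact for these inputs, `static_cast<float>(X) * 0.5f` and `scvtfFixed(X, 1)` should agree, which is the property the fold relies on.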

Fixes #94909
DeltaFile
+474-0llvm/test/CodeGen/AArch64/scvtf-div-mul-combine.ll
+57-0llvm/lib/Target/AArch64/AArch64InstrInfo.td
+34-10llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
+26-6llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
+591-164 files

LLVM/project 7b43dcdllvm/test/CodeGen/AArch64 neon-rshrn.ll

[AArch64] Add disjoint or tests for rshrn and raddhn. NFC (#194252)

These should already be OK, as the disjoint or cannot round up.
DeltaFile
+76-0llvm/test/CodeGen/AArch64/neon-rshrn.ll
+76-01 files

LLVM/project bb4aebbllvm/lib/Transforms/Vectorize SLPVectorizer.cpp

Small improvements

Created using spr 1.3.7
DeltaFile
+15-8llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+15-81 files

LLVM/project 0e141adllvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/AMDGPU notriviallyvectorizableintrinsicoperands.ll

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
DeltaFile
+237-9llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+105-54llvm/test/Transforms/SLPVectorizer/X86/lookahead.ll
+61-98llvm/test/Transforms/SLPVectorizer/AMDGPU/notriviallyvectorizableintrinsicoperands.ll
+34-47llvm/test/Transforms/SLPVectorizer/X86/non-vectorizable-inst-operand.ll
+30-42llvm/test/Transforms/SLPVectorizer/X86/parent-node-non-schedulable.ll
+11-15llvm/test/Transforms/SLPVectorizer/X86/split-node-marked-to-gather.ll
+478-26515 files not shown
+554-34921 files

LLVM/project 8efcfc2llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer revec.ll

[SLP] Reuse diamond-matched gather across asymmetric reorder/reuse

processBuildVector's perfect-diamond match used Entries.front()->isSame(
E->Scalars) only, missing matches where E carries the reorder/reuse and
the entry is canonical. Two TreeEntries with the same effective scalar
layout but different raw orderings then build independent gathers; one
emits a fill-in shufflevector for reused lanes while the other leaves
poison there.

Fixes #194191.

Reviewers: 

Pull Request: https://github.com/llvm/llvm-project/pull/194247
DeltaFile
+10-5llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+1-2llvm/test/Transforms/SLPVectorizer/X86/shrink_after_reorder.ll
+1-2llvm/test/Transforms/SLPVectorizer/revec.ll
+1-1llvm/test/Transforms/SLPVectorizer/X86/select-copyable-cmp-poison.ll
+13-104 files

LLVM/project 91805dcllvm/test/Transforms/SLPVectorizer/X86 select-copyable-cmp-poison.ll

[SLP][NFC] Add a test with incorrect vectorization for a fully matched, but reordered, node, NFC



Reviewers: 

Pull Request: https://github.com/llvm/llvm-project/pull/194244
DeltaFile
+203-0llvm/test/Transforms/SLPVectorizer/X86/select-copyable-cmp-poison.ll
+203-01 files

LLVM/project 72c8f98flang/include/flang/Parser parse-tree.h, flang/lib/Lower/OpenMP OpenMP.cpp

[flang][OpenMP] Rename "declare constructs" to directives, NFC (#194240)

Only executable directives are constructs in OpenMP, so, for example,
"declare mapper" is not a construct.

Apply

find flang/ \( -name '*.cpp' -o -name '*.h' -o -name '*.f90' \) -exec sed \
  -i -E -e 's/OpenMP(Declare[A-Za-z]*)Construct\b/Omp\1Directive/g' {} \;

plus local formatting updates as needed.
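
To make the effect of the substitution concrete, here is a small C++ sketch that applies the same pattern to a single string with `std::regex`. The helper name `renameDeclareConstruct` is made up for this example; the actual change was made with the `sed` command above.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper that applies the same pattern as the sed command to a
// single identifier or line of text.
std::string renameDeclareConstruct(const std::string &Text) {
  // The capture group keeps the "Declare..." part; \b requires the match to
  // end at a word boundary, so longer identifiers are left alone.
  static const std::regex Pattern(R"(OpenMP(Declare[A-Za-z]*)Construct\b)");
  return std::regex_replace(Text, Pattern, "Omp$1Directive");
}
```

With this sketch, `OpenMPDeclareMapperConstruct` becomes `OmpDeclareMapperDirective`, while names without the `Declare` prefix, such as `OpenMPBlockConstruct`, are left unchanged.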
DeltaFile
+16-15flang/lib/Lower/OpenMP/OpenMP.cpp
+11-12flang/lib/Semantics/resolve-directives.cpp
+10-11flang/include/flang/Parser/parse-tree.h
+9-9flang/test/Parser/OpenMP/declare_target-device_type.f90
+8-10flang/lib/Semantics/check-omp-structure.cpp
+8-8flang/lib/Semantics/check-omp-structure.h
+62-6520 files not shown
+118-12226 files

LLVM/project be59278flang/include/flang/Parser parse-tree.h, flang/lib/Lower/OpenMP OpenMP.cpp

[flang][OpenMP] Rename "declare constructs" to directives, NFC

Only executable directives are constructs in OpenMP, so, for example,
"declare mapper" is not a construct.

Apply

find flang/ \( -name '*.cpp' -o -name '*.h' -o -name '*.f90' \) -exec sed \
  -i -E -e 's/OpenMP(Declare[A-Za-z]*)Construct\b/Omp\1Directive/g' {} \;

plus local formatting updates as needed.
DeltaFile
+16-15flang/lib/Lower/OpenMP/OpenMP.cpp
+11-12flang/lib/Semantics/resolve-directives.cpp
+10-11flang/include/flang/Parser/parse-tree.h
+9-9flang/test/Parser/OpenMP/declare_target-device_type.f90
+8-10flang/lib/Semantics/check-omp-structure.cpp
+8-8flang/lib/Semantics/check-omp-structure.h
+62-6520 files not shown
+118-12226 files

LLVM/project c65bcf2llvm/lib/Analysis ValueTracking.cpp, llvm/lib/Transforms/InstCombine InstCombineAddSub.cpp InstCombineInternal.h

[InstCombine] Div ceil optimizations (#190175)

Relates: https://github.com/llvm/llvm-project/issues/187838

This PR improves handling of `div_ceil` from Rust (which emits a div +
rem).
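
For readers unfamiliar with the "div + rem" shape, a minimal C++ sketch of ceiling division written that way follows. The function name `divCeil` is an assumption for this example, and the sketch only illustrates the input pattern, not the transformation this PR performs.

```cpp
#include <cassert>
#include <cstdint>

// Ceiling division in the "div + rem" shape: the quotient, plus one when the
// remainder is non-zero. Unlike (X + D - 1) / D, this cannot overflow.
uint32_t divCeil(uint32_t X, uint32_t D) {
  assert(D != 0 && "division by zero");
  uint32_t Quot = X / D;
  uint32_t Rem = X % D;
  return Rem != 0 ? Quot + 1 : Quot;
}
```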

Currently, these three Rust functions:
```rust
use std::hint::assert_unchecked;

#[unsafe(no_mangle)]
pub fn div_ceil_without_assume(x: u32) -> u32 {
    x.div_ceil(7)
}

#[unsafe(no_mangle)]
pub fn div_ceil_with_assume(x: u32) -> u32 {
    unsafe {

```

[313 lines not shown]
DeltaFile
+210-0llvm/test/Transforms/InstCombine/divceil.ll
+43-0llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
+6-0llvm/lib/Transforms/InstCombine/InstCombineInternal.h
+4-0llvm/lib/Analysis/ValueTracking.cpp
+1-1llvm/test/Transforms/InstCombine/fls.ll
+264-15 files

LLVM/project 2d789ffllvm/lib/Transforms/Vectorize VPlanTransforms.cpp VPlanConstruction.cpp, llvm/test/Transforms/LoopVectorize induction.ll

[VPlan] Verify and handle FOR legality during header phi creation (NFC). (#191298)

Move the logic to validate FOR users and introduce the splice directly to
header phi creation. It makes sense to introduce the header phi and the
splice together.

It also means sinking only needs to be done once, instead of once for each
VPlan.

Depends on https://github.com/llvm/llvm-project/pull/190681.

PR: https://github.com/llvm/llvm-project/pull/191298
DeltaFile
+36-257llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+201-3llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
+15-16llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+10-17llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+8-8llvm/test/Transforms/LoopVectorize/induction.ll
+15-0llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
+285-3014 files not shown
+295-30810 files

LLVM/project c23a089llvm/include/llvm/ProfileData SampleProf.h, llvm/test/tools/llvm-profgen filter-build-id-unsymbolized.test

format, update test

Created using spr 1.3.4
DeltaFile
+9-16llvm/test/tools/llvm-profgen/filter-build-id-unsymbolized.test
+3-4llvm/include/llvm/ProfileData/SampleProf.h
+1-2llvm/tools/llvm-profgen/PerfReader.cpp
+13-223 files

LLVM/project 8c554c4llvm/lib/Target/LoongArch LoongArchLASXInstrInfo.td LoongArchLSXInstrInfo.td, llvm/test/CodeGen/LoongArch/lasx/ir-instruction sub.ll add.ll

[LoongArch] Add support for vector add/sub on vNi128 types

Legalize ADD/SUB for v1i128 and v2i128 and extend LSX/LASX instruction
selection patterns to support the Q element size. Update register classes
to include vNi128 types and add codegen tests to verify lowering to
VADD.Q/XVADD.Q and VSUB.Q/XVSUB.Q.
DeltaFile
+6-102llvm/test/CodeGen/LoongArch/lasx/ir-instruction/sub.ll
+6-93llvm/test/CodeGen/LoongArch/lasx/ir-instruction/add.ll
+6-43llvm/test/CodeGen/LoongArch/lsx/ir-instruction/add.ll
+6-42llvm/test/CodeGen/LoongArch/lsx/ir-instruction/sub.ll
+17-4llvm/lib/Target/LoongArch/LoongArchLASXInstrInfo.td
+17-4llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td
+58-2882 files not shown
+66-2928 files

LLVM/project d9d210dllvm/test/CodeGen/LoongArch/lasx/ir-instruction sub.ll add.ll, llvm/test/CodeGen/LoongArch/lsx/ir-instruction add.ll sub.ll

[LoongArch][NFC] Add tests for vector add on vNi128
DeltaFile
+108-2llvm/test/CodeGen/LoongArch/lasx/ir-instruction/sub.ll
+99-2llvm/test/CodeGen/LoongArch/lasx/ir-instruction/add.ll
+49-2llvm/test/CodeGen/LoongArch/lsx/ir-instruction/add.ll
+48-2llvm/test/CodeGen/LoongArch/lsx/ir-instruction/sub.ll
+304-84 files

LLVM/project 4dafdf6llvm/include/llvm/CodeGen ValueTypes.td, llvm/test/Analysis/CostModel/ARM arith-overflow.ll

[CodeGen] Add <2 x i128> value type (#193910)
DeltaFile
+6-6llvm/test/Analysis/CostModel/ARM/arith-overflow.ll
+2-2llvm/test/TableGen/CPtrWildcard.td
+1-0llvm/include/llvm/CodeGen/ValueTypes.td
+9-83 files

LLVM/project 6439bbellvm/lib/Target/ARM ARMISelLowering.cpp, llvm/test/CodeGen/ARM hoist-and-by-const-from-shl-in-eqcmp-zero.ll hoist-and-by-const-from-lshr-in-eqcmp-zero.ll

Revert "[ARM] Fold SELECT (AND(X,1) == 0), C1, C2 -> XOR(C1,AND(NEG(AND(X,1)),XOR(C1,C2)) in Thumb1 (#185898)" (#194230)

This reverts commit 1823355d06b854854701a8ba430aa1f6be9994f4 due to
performance regressions in benchmarks.
DeltaFile
+103-92llvm/test/CodeGen/ARM/hoist-and-by-const-from-shl-in-eqcmp-zero.ll
+97-88llvm/test/CodeGen/ARM/hoist-and-by-const-from-lshr-in-eqcmp-zero.ll
+1-49llvm/lib/Target/ARM/ARMISelLowering.cpp
+201-2293 files

LLVM/project aa3de78llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/AArch64 spillcost-call-between-operands.ll spillcost-loop-backedge.ll

[SLP] Fix spill-cost cache lookup and predecessor scan

A cached intra-block scan that stopped at a call or budget limit only
proves the sub-range below the stop point is call-free; do not reuse
the cached bit for queries whose First lies above it. Also switch the
cross-block predecessor scan to "exists a call-free backward path"
semantics, skip blocks strictly dominated by Root, and memoize only
the (Root, OpParent) key. Fixes a false-positive spill cost that was
blocking profitable vectorization.

Reviewers: RKSimon, hiraditya, bababuck

Pull Request: https://github.com/llvm/llvm-project/pull/192709
DeltaFile
+104-26llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+4-11llvm/test/Transforms/SLPVectorizer/AArch64/spillcost-call-between-operands.ll
+0-5llvm/test/Transforms/SLPVectorizer/AArch64/spillcost-loop-backedge.ll
+108-423 files

LLVM/project d593279llvm/lib/Target/NVPTX NVPTXISelLowering.cpp NVPTXISelLowering.h, llvm/test/CodeGen/NVPTX fp-contract-f32x2.ll

[NVPTX] Scalarize `contract FMUL v2f32` to enable FMA fusion (#192815)

SM100+ legalizes `FMUL v2f32`, blocking the scalar FADD->FMA combiner.
Scalarize it when `contract` (or `allowFMA()`) is set and every lane
feeds a single `contract` FADD.
DeltaFile
+62-24llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+29-0llvm/test/CodeGen/NVPTX/fp-contract-f32x2.ll
+10-0llvm/lib/Target/NVPTX/NVPTXISelLowering.h
+101-243 files

LLVM/project ab57f94libcxx/utils/ci/images libcxx_next_runners.txt

[libcxx] Bump next runner set (#194211)

So that we can pick up changes in
49b0451ec690dfc76690c19032bdc97c2889000b.
DeltaFile
+1-1libcxx/utils/ci/images/libcxx_next_runners.txt
+1-11 files

LLVM/project a0e42c2clang/lib/AST/ByteCode Integral.h Interp.h, clang/test/AST/ByteCode functions.cpp

[clang][bytecode] Add new IntegralType for function addresses (#194206)

We used to use just `::Address` for functions, which later caused
problems because we cast the pointer to `ValueDecl*` and passed it to
`Program::getOrCreateGlobal()`, which of course doesn't work.
DeltaFile
+9-0clang/lib/AST/ByteCode/Integral.h
+9-0clang/test/AST/ByteCode/functions.cpp
+5-1clang/lib/AST/ByteCode/Interp.h
+2-0clang/lib/AST/ByteCode/Primitives.h
+25-14 files

LLVM/project 49b0451libcxx/utils/ci/docker docker-compose.yml

[libcxx] Bump base image version to most recent (#194209)

To pull in the changes from bd75c10199a159a20720f8ee5c00afebb033f46e.
DeltaFile
+2-2libcxx/utils/ci/docker/docker-compose.yml
+2-21 files

LLVM/project 32a9f63clang/lib/AST/ByteCode Interp.cpp Interp.h, clang/test/Sema static-init.c

[clang][bytecode] Fix some problems with ptr-to-int casts (#193988)

1) When doing integral casts on a pointer-casted-to-integral, check the
bitwidth we're casting _to_, not the one we're casting _from_.
2) When the pointer we're casting to an integral is a dummy pointer,
don't forget to check the bitwidth.
DeltaFile
+5-3clang/lib/AST/ByteCode/Interp.cpp
+1-1clang/lib/AST/ByteCode/Interp.h
+1-0clang/test/Sema/static-init.c
+7-43 files

LLVM/project 33f2036llvm/include/llvm/MC MCAsmInfo.h, llvm/lib/Target/AArch64/MCTargetDesc AArch64MCAsmInfo.cpp

[MC] Add MCTargetOptions to MCAsmInfo constructor. NFC (#194200)

Since #180464 the canonical MCTargetOptions pointer is stored in
MCAsmInfo, but it is bound after construction via `setTargetOptions`
called from TargetRegistry::createMCAsmInfo.

Direct constructions in unit tests can leave the pointer null, leading
to a runtime assert failure. Add MCTargetOptions to every MCAsmInfo
subclass constructor, store it as a reference in MCAsmInfo, and remove
`setTargetOptions()`.
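
The difference between the two binding styles can be sketched in a few lines of C++. All types here (`TargetOptions`, `AsmInfoWithSetter`, `AsmInfoWithCtorArg`) are hypothetical stand-ins, not the LLVM classes or their real signatures.

```cpp
#include <cassert>

// Hypothetical stand-in for the options object.
struct TargetOptions {
  bool ShowEncoding = false;
};

// Before: the pointer is bound by a separate call, so an object constructed
// directly (for example in a unit test) can be used while it is still null.
class AsmInfoWithSetter {
  const TargetOptions *Options = nullptr;

public:
  void setTargetOptions(const TargetOptions &TO) { Options = &TO; }
  bool hasTargetOptions() const { return Options != nullptr; }
};

// After: the constructor requires the options and stores a reference, so an
// unbound state cannot be expressed.
class AsmInfoWithCtorArg {
  const TargetOptions &Options;

public:
  explicit AsmInfoWithCtorArg(const TargetOptions &TO) : Options(TO) {}
  const TargetOptions &getTargetOptions() const { return Options; }
};
```

The cost of the constructor form is that every subclass constructor has to forward the argument, which matches the number of files this commit touches.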
DeltaFile
+18-9llvm/lib/Target/X86/MCTargetDesc/X86MCAsmInfo.cpp
+12-6llvm/lib/Target/X86/MCTargetDesc/X86MCAsmInfo.h
+11-4llvm/lib/Target/ARM/MCTargetDesc/ARMMCAsmInfo.cpp
+11-4llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCAsmInfo.cpp
+7-7llvm/lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp
+4-5llvm/include/llvm/MC/MCAsmInfo.h
+63-3578 files not shown
+231-15984 files

LLVM/project 63f2a6dllvm/include/llvm/ProfileData SampleProf.h, llvm/test/tools/llvm-profgen filter-build-id.test filter-build-id-unsymbolized.test

[𝘀𝗽𝗿] initial version

Created using spr 1.3.4
DeltaFile
+101-25llvm/tools/llvm-profgen/PerfReader.cpp
+33-0llvm/test/tools/llvm-profgen/filter-build-id.test
+32-0llvm/test/tools/llvm-profgen/filter-build-id-unsymbolized.test
+21-3llvm/include/llvm/ProfileData/SampleProf.h
+24-0llvm/test/tools/llvm-profgen/Inputs/buildid-unsymbolized.raw
+12-3llvm/tools/llvm-profgen/PerfReader.h
+223-311 files not shown
+234-317 files

LLVM/project 3641e28llvm/test/tools/llvm-profgen filter-build-id.test, llvm/test/tools/llvm-profgen/Inputs buildid-cs-noprobe.aggperfscript

[𝘀𝗽𝗿] changes to main this commit is based on

Created using spr 1.3.4

[skip ci]
DeltaFile
+90-24llvm/tools/llvm-profgen/PerfReader.cpp
+33-0llvm/test/tools/llvm-profgen/filter-build-id.test
+12-3llvm/tools/llvm-profgen/PerfReader.h
+11-0llvm/test/tools/llvm-profgen/Inputs/buildid-cs-noprobe.aggperfscript
+146-274 files

LLVM/project bd75c10libcxx/utils/ci/docker linux-builder-base.dockerfile

[libcxx] Include python3-yaml and rsync in container (#194182)

rsync is needed for installing the kernel headers for the libc build.
The yaml python package is needed for libc's hdrgen. This means we no
longer have to install these utilities at runtime.

They should be small enough relative to the existing container image
size to not really have an impact in that regard.
DeltaFile
+2-0libcxx/utils/ci/docker/linux-builder-base.dockerfile
+2-01 files

LLVM/project c012265clang/lib/AST/ByteCode Compiler.cpp, clang/test/AST/ByteCode functions.cpp

[clang][bytecode] Reject functions with dependent return type (#194114)

This unfortunately crashes the current interpreter as well.
DeltaFile
+12-3clang/test/AST/ByteCode/functions.cpp
+3-0clang/lib/AST/ByteCode/Compiler.cpp
+15-32 files

LLVM/project 682cf72mlir/lib/Dialect/Arith/IR ArithOps.cpp, mlir/test/Dialect/Arith canonicalize.mlir

[mlir][arith] Fold subi(a, subi(a, b)) to b (#194134)

Add a folder for `arith.subi` that simplifies `subi(a, subi(a, b))` to
`b` using the algebraic identity `a - (a - b) = b`.
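
The identity also holds under wrapping arithmetic, which the following C++ sketch checks with unsigned 32-bit integers. It assumes wrapping (modulo 2^32) subtraction; the function name `subOfSub` is made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned arithmetic in C++ wraps modulo 2^32, so A - (A - B) yields B even
// when the inner subtraction underflows.
uint32_t subOfSub(uint32_t A, uint32_t B) {
  return A - (A - B);
}
```

For instance, `subOfSub(0, 1)` wraps in the inner subtraction yet still returns `1`.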
DeltaFile
+10-0mlir/test/Dialect/Arith/canonicalize.mlir
+7-2mlir/lib/Dialect/Arith/IR/ArithOps.cpp
+17-22 files

LLVM/project c2a9725mlir/include/mlir/Dialect/Math/IR MathOps.td, mlir/lib/Dialect/Math/IR MathOps.cpp

[mlir][math] Add constant folding for sincos/cbrt (#194130)

Adds constant folders for `math.sincos` and `math.cbrt`.
DeltaFile
+51-0mlir/test/Dialect/Math/canonicalize.mlir
+48-1mlir/lib/Dialect/Math/IR/MathOps.cpp
+6-3mlir/include/mlir/Dialect/Math/IR/MathOps.td
+105-43 files

LLVM/project a75e6a5mlir/lib/Dialect/XeGPU/Transforms XeGPUUnroll.cpp, mlir/test/Dialect/XeGPU xegpu-wg-to-sg.mlir xegpu-wg-to-sg-unify-ops.mlir

[MLIR][XeGPU] Remove offsets from create_nd_tdesc & remove update_nd_offset, move offsets to load/store/prefetch ops (#193330)

This PR removes the optional offsets/const_offsets operands on
xegpu.create_nd_tdesc and instead mandates offsets directly on the
consuming load, store, and prefetch ops. It also deprecates the
update_nd_offset op.
DeltaFile
+980-230mlir/test/Dialect/XeGPU/xegpu-wg-to-sg.mlir
+0-987mlir/test/Dialect/XeGPU/xegpu-wg-to-sg-unify-ops.mlir
+245-107mlir/test/Dialect/XeGPU/xegpu-wg-to-sg-rr.mlir
+164-174mlir/test/Dialect/XeGPU/propagate-layout.mlir
+44-282mlir/lib/Dialect/XeGPU/Transforms/XeGPUUnroll.cpp
+106-147mlir/test/Dialect/XeGPU/xegpu-blocking.mlir
+1,539-1,92721 files not shown
+1,946-3,33627 files

LLVM/project e7164d4libclc CMakeLists.txt

[libclc] Only check the triple architecture for libclc (#194149)

Summary:
Previously, `nvptx64--` would reject `nvptx64-unknown-unknown`. There are
two options: either normalize all the triples in CMake, or just check the
architecture. I went with the latter because it makes it easier for
people to pass different values.
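
Comparing only the architecture component, as the commit title describes, can be sketched in C++ as below. The real change is in `libclc/CMakeLists.txt`; the helpers `tripleArch` and `sameArch` are hypothetical and only illustrate the comparison.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the architecture component of a target triple,
// i.e. everything before the first '-' (or the whole string if there is none).
std::string tripleArch(const std::string &Triple) {
  return Triple.substr(0, Triple.find('-'));
}

// Two triples are treated as compatible when their architectures match,
// regardless of the vendor and OS components.
bool sameArch(const std::string &A, const std::string &B) {
  return tripleArch(A) == tripleArch(B);
}
```

Under this check, `nvptx64--` and `nvptx64-unknown-unknown` compare equal without any triple normalization.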
DeltaFile
+9-14libclc/CMakeLists.txt
+9-141 files