LLVM/project a9eeb15llvm/test/TableGen SubRegsLaneBitmask.td, llvm/utils/TableGen/Common CodeGenRegisters.cpp

[Tablegen] Fix condition to report when lanemask overflows (#181810)

This PR:

Fixes a slight off-by-one error in the check for how many bits are
allocated for subreg lane masks. If 65 subreg lanes are used, it fails
later, but the error message is not clear as to what has occurred.
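
A minimal sketch of the kind of boundary check described above, written
in Python rather than taken from CodeGenRegisters.cpp. It assumes a 64-bit
lane mask with one bit per subreg lane, which is consistent with the
65-lane failure case. The name `assign_lane_bits`, the `strict` flag, and
the error text are hypothetical.

```python
LANE_MASK_BITS = 64  # assumed mask width; valid bit indices are 0..63


def assign_lane_bits(num_lanes, strict=True):
    """Give each lane its own bit index, failing when the mask is full.

    strict=False mimics an off-by-one check (`bit > width`), which lets
    bit index 64 through even though it does not fit in the mask.
    """
    bits = []
    for bit in range(num_lanes):
        overflow = bit >= LANE_MASK_BITS if strict else bit > LANE_MASK_BITS
        if overflow:
            raise ValueError(f"ran out of lane mask bits at lane {bit}")
        bits.append(bit)
    return bits
```

With `strict=True`, 64 lanes succeed and the 65th raises immediately. With
`strict=False`, the 65th lane is silently given bit index 64.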
DeltaFile
+12-0llvm/test/TableGen/SubRegsLaneBitmask.td
+1-1llvm/utils/TableGen/Common/CodeGenRegisters.cpp
+13-12 files

LLVM/project be9e84emlir/lib/Dialect/Linalg/Transforms Specialize.cpp, mlir/test/Dialect/Linalg specialize-generic-ops.mlir

[mlir] [linalg] fix failure on specializing matmul with permuted loops (#184294)

This patch fixes generic specialization when the loop dimensions are
permuted in the generic w.r.t. the canonical iterator order of the named
op. Instead of forwarding the maps of the original generic, it recreates
them so that they always follow the canonical order.

For example, the generic which is to be specialized to a matmul could
have `[parallel, reduction, parallel]` loops. Specializing this as is
and just copying the indexing maps, as we do now, leads to a
verification error since the dimensions will not match the canonical
form the matmul named op expects.

e.g. the maps could be:
```
(m, k, n) -> (m,k)
...
```
So we would have to recreate the maps to be:

    [5 lines not shown]
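
The idea of recreating the maps can be sketched in Python. This is an
illustration only: it does not reproduce the elided lines above or the
logic in Specialize.cpp. It assumes the canonical matmul loop order is
`(m, n, k)` with maps `(m, k)`, `(k, n)`, `(m, n)`, and the helper name
`recreate_matmul_maps` is hypothetical.

```python
def recreate_matmul_maps(a_map, b_map, c_map):
    """Rebuild matmul indexing maps in canonical (m, n, k) loop order.

    Each argument is a tuple of loop positions in the original generic.
    With loops ordered (m, k, n), the LHS map (m, k) is written (0, 1).
    Returns (permutation, maps): permutation[i] is the original loop
    position of canonical dim i, and maps use canonical positions.
    """
    m, n = c_map
    # The reduction dim is the one the LHS uses but the result does not.
    (k,) = [d for d in a_map if d not in c_map]
    if a_map != (m, k) or b_map != (k, n):
        raise ValueError("operands do not form a plain matmul")
    permutation = (m, n, k)
    new_pos = {old: new for new, old in enumerate(permutation)}
    maps = [tuple(new_pos[d] for d in mp) for mp in (a_map, b_map, c_map)]
    return permutation, maps


# Loops ordered (m, k, n): [parallel, reduction, parallel].
perm, maps = recreate_matmul_maps((0, 1), (1, 2), (0, 2))
```

Here `perm` is `(0, 2, 1)` and `maps` is `[(0, 2), (2, 1), (0, 1)]`, which
reads as `(m, k)`, `(k, n)`, `(m, n)` over the canonical `(m, n, k)` dims.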
DeltaFile
+210-0mlir/test/Dialect/Linalg/specialize-generic-ops.mlir
+34-9mlir/lib/Dialect/Linalg/Transforms/Specialize.cpp
+244-92 files

LLVM/project a65a1fdllvm/include/llvm/Analysis DependenceAnalysis.h, llvm/lib/Analysis DependenceAnalysis.cpp

[DA] Rewrite formula in the Weak Zero SIV tests
DeltaFile
+69-72llvm/lib/Analysis/DependenceAnalysis.cpp
+8-8llvm/test/Analysis/DependenceAnalysis/weak-zero-siv-large-btc.ll
+4-8llvm/include/llvm/Analysis/DependenceAnalysis.h
+2-6llvm/test/Analysis/DependenceAnalysis/weak-zero-siv-overflow.ll
+2-2llvm/test/Analysis/DependenceAnalysis/weak-crossing-siv-large-btc.ll
+85-965 files

LLVM/project b3d99acclang/test/CodeGen/AArch64 varargs.c, clang/test/CodeGen/PowerPC ppc64-dwarf.c ppc32-dwarf.c

[CodeGen] Use data layout aware constant folder in CGBuilder (#184819)

Use the DataLayout-aware TargetFolder instead of ConstantFolder in
Clang's CGBuilder. The primary impact of this change is that GEP
constant expressions are now emitted in canonical `getelementptr i8`
form. This is in preparation for the migration to ptradd, which requires
this form.
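
To show what the `getelementptr i8` form means, here is a toy Python
model that folds typed GEP indices into a single byte offset. It is only
a sketch: it handles integers and arrays, ignores structs and padding,
and takes sizes as plain numbers instead of querying a DataLayout. The
helper names are hypothetical.

```python
def type_size(ty):
    """Size in bytes of a toy type: an int byte count or ("array", n, elem)."""
    if isinstance(ty, int):
        return ty
    _, count, elem = ty
    return count * type_size(elem)


def gep_byte_offset(source_ty, indices):
    """Fold typed GEP indices into one byte offset.

    The first index steps over whole source_ty objects; each later index
    steps into the current array type.
    """
    offset = indices[0] * type_size(source_ty)
    ty = source_ty
    for idx in indices[1:]:
        _, _, elem = ty
        offset += idx * type_size(elem)
        ty = elem
    return offset


I32 = 4
# getelementptr [4 x i32], ptr @g, i64 0, i64 2
# corresponds to: getelementptr i8, ptr @g, i64 8
print(gep_byte_offset(("array", 4, I32), [0, 2]))  # prints 8
```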

Part of the test updates were performed by Claude Code and reviewed by
me.
DeltaFile
+200-200clang/test/OpenMP/threadprivate_codegen.cpp
+116-116clang/test/CodeGen/PowerPC/ppc64-dwarf.c
+113-113clang/test/CodeGen/PowerPC/ppc32-dwarf.c
+106-109clang/test/OpenMP/for_reduction_codegen.cpp
+87-87clang/test/CodeGen/Sparc/sparcv9-dwarf.c
+72-72clang/test/CodeGen/AArch64/varargs.c
+694-697123 files not shown
+1,601-1,603129 files

LLVM/project aeefed7llvm/include/llvm/Analysis DependenceAnalysis.h, llvm/lib/Analysis DependenceAnalysis.cpp

[DA] Rewrite formula in the Weak Zero SIV tests
DeltaFile
+67-72llvm/lib/Analysis/DependenceAnalysis.cpp
+8-8llvm/test/Analysis/DependenceAnalysis/weak-zero-siv-large-btc.ll
+4-8llvm/include/llvm/Analysis/DependenceAnalysis.h
+2-6llvm/test/Analysis/DependenceAnalysis/weak-zero-siv-overflow.ll
+2-2llvm/test/Analysis/DependenceAnalysis/weak-crossing-siv-large-btc.ll
+83-965 files

LLVM/project 6b3d908llvm/test/CodeGen/AMDGPU sgpr-spill-update-only-slot-indexes.ll mfma-no-register-aliasing.ll

[AMDGPU] Add missing -wwm-regalloc=fast to 4 more tests (NFC)

Add the missing -wwm-regalloc=fast option to 4 more tests
that already specify -sgpr-regalloc=fast and -vgpr-regalloc=fast.
For consistency, the same preference should be applied to the
wwm-regalloc pipeline as well.
This is a follow-up to #184190 which addressed the same issue in
attr-amdgpu-flat-work-group-size-vgpr-limit.ll.
DeltaFile
+1-1llvm/test/CodeGen/AMDGPU/sgpr-spill-update-only-slot-indexes.ll
+1-1llvm/test/CodeGen/AMDGPU/mfma-no-register-aliasing.ll
+1-1llvm/test/CodeGen/AMDGPU/vgpr-agpr-limit-gfx90a.ll
+1-1llvm/test/CodeGen/AMDGPU/vgpr-limit-gfx1250.ll
+4-44 files

LLVM/project aab7376llvm/include/llvm/Analysis DependenceAnalysis.h, llvm/lib/Analysis DependenceAnalysis.cpp

[DA] Remove isPeelFirst and isPeelLast (#183737)

`isPeelFirst` and `isPeelLast` are updated only in the Weak Zero SIV
tests, and no clients actually use them. Keeping these features while
fixing the existing defects in DA would add unnecessary complexity. If
they are unnecessary in the first place, it would be better to delete
them to mitigate maintenance burden.
DeltaFile
+1-24llvm/include/llvm/Analysis/DependenceAnalysis.h
+0-20llvm/lib/Analysis/DependenceAnalysis.cpp
+3-3llvm/test/Analysis/DependenceAnalysis/WeakZeroDstSIV.ll
+3-3llvm/test/Analysis/DependenceAnalysis/WeakZeroSrcSIV.ll
+7-504 files

LLVM/project 33508a2llvm/include/llvm/ExecutionEngine/Orc WaitingOnGraph.h, llvm/unittests/ExecutionEngine/Orc WaitingOnGraphTest.cpp

[ORC] Make ElementSet, ContainerElementsMap inner classes. (#184955)

ElementSet and ContainerElementsMap were type aliases inside
WaitingOnGraph.

This commit replaces the aliases with classes deriving from DenseSet and
DenseMap, with convenience operations added for WaitingOnGraph (merge,
remove, remove_if, and visit). These convenience functions are used to
simplify the implementation of various parts of WaitingOnGraph.

Unit tests are added for the convenience operations to improve test
coverage.

In addition to improving readability of the main WaitingOnGraph
operations, this will make it easier to experiment with other underlying
representations for these types (e.g. sorted vectors).
DeltaFile
+136-72llvm/include/llvm/ExecutionEngine/Orc/WaitingOnGraph.h
+189-0llvm/unittests/ExecutionEngine/Orc/WaitingOnGraphTest.cpp
+325-722 files

LLVM/project 5af503flibclc/clc/include/clc/subgroup sub_group_broadcast.h clc_subgroup_broadcast_scalarize.inc, libclc/clc/lib/amdgcn/subgroup sub_group_broadcast.cl

libclc: Add sub_group_broadcast
DeltaFile
+88-0libclc/clc/lib/amdgcn/subgroup/sub_group_broadcast.cl
+22-0libclc/clc/include/clc/subgroup/sub_group_broadcast.h
+21-0libclc/clc/include/clc/subgroup/clc_subgroup_broadcast_scalarize.inc
+16-0libclc/opencl/lib/generic/subgroup/sub_group_broadcast.inc
+15-0libclc/opencl/lib/generic/subgroup/sub_group_broadcast.cl
+10-0libclc/clc/include/clc/subgroup/clc_subgroup_broadcast.inc
+172-02 files not shown
+174-08 files

LLVM/project ff633ddlld/test/ELF riscv-relax-synthetic-in-text.s loongarch-relax-synthetic-in-text.s

Address @MaskRay's comments

Created using spr 1.3.7
DeltaFile
+18-6lld/test/ELF/riscv-relax-synthetic-in-text.s
+16-6lld/test/ELF/loongarch-relax-synthetic-in-text.s
+34-122 files

LLVM/project f90b783llvm/lib/CodeGen/SelectionDAG SelectionDAGBuilder.cpp DAGCombiner.cpp, llvm/lib/Target/WebAssembly WebAssemblyISelLowering.h

[WebAssembly] Do not form minnum/maxnum (#184796)

For wasm, forming minnum/maxnum style ISD nodes is non-profitable,
because (in cases where any float min/max support exists at all), it has
pmin/pmax instructions that correspond to the fcmp+select semantics, or
relaxed_fmin/relaxed_fmax (for the nnan+nsz case) with even looser
semantics.

As such, return false from isProfitableToCombineMinNumMaxNum(), and also
respect that hook in the SDAGBuilder.
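
The difference between compare-and-select semantics and minnum-style
semantics can be sketched in Python. The `pmin`/`pmax` definitions follow
the WebAssembly pseudo-minimum/maximum rule (`b < a ? b : a` and
`a < b ? b : a`); `minnum` is a simplified model that only captures the
"ignore a single NaN operand" behaviour and is not the ISD node's full
definition.

```python
import math


def pmin(a, b):
    """Pseudo-minimum: plain compare-and-select, `b < a ? b : a`."""
    return b if b < a else a


def pmax(a, b):
    """Pseudo-maximum: plain compare-and-select, `a < b ? b : a`."""
    return b if a < b else a


def minnum(a, b):
    """minNum-style minimum: a single NaN operand is ignored."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)


nan = float("nan")
print(pmin(nan, 1.0), pmin(1.0, nan), minnum(nan, 1.0))
```

`pmin(nan, 1.0)` yields NaN while `pmin(1.0, nan)` yields `1.0`, exactly
what an fcmp+select pair would produce, whereas `minnum(nan, 1.0)` yields
`1.0`.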
DeltaFile
+12-132llvm/test/CodeGen/WebAssembly/simd-relaxed-fmax.ll
+12-132llvm/test/CodeGen/WebAssembly/simd-relaxed-fmin.ll
+6-0llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.h
+6-0llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+1-1llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+37-2655 files

LLVM/project daeb99elibclc/clc/include/clc/subgroup sub_group_broadcast.h clc_subgroup_broadcast.inc, libclc/clc/lib/amdgcn SOURCES

libclc: Add sub_group_broadcast
DeltaFile
+81-0libclc/clc/lib/amdgcn/subgroup/sub_group_broadcast.cl
+22-0libclc/clc/include/clc/subgroup/sub_group_broadcast.h
+16-0libclc/opencl/lib/generic/subgroup/sub_group_broadcast.inc
+15-0libclc/opencl/lib/generic/subgroup/sub_group_broadcast.cl
+10-0libclc/clc/include/clc/subgroup/clc_subgroup_broadcast.inc
+1-0libclc/clc/lib/amdgcn/SOURCES
+145-01 files not shown
+146-07 files

LLVM/project 049efc7libclc/opencl/lib/amdgcn SOURCES, libclc/opencl/lib/amdgcn/subgroup subgroup.cl

libclc: Add amdgpu subgroup functions (#184845)
DeltaFile
+60-0libclc/opencl/lib/amdgcn/subgroup/subgroup.cl
+21-0libclc/opencl/lib/amdgcn/synchronization/sub_group_barrier.cl
+2-0libclc/opencl/lib/amdgcn/SOURCES
+83-03 files

LLVM/project 23edefallvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp, llvm/test/Transforms/InstCombine simplify-demanded-fpclass-aggregates.ll

InstCombine: Handle insertvalue in SimplifyDemandedFPClass (#184193)
DeltaFile
+68-0llvm/test/Transforms/InstCombine/simplify-demanded-fpclass-aggregates.ll
+9-0llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+77-02 files

LLVM/project b51859cllvm/lib/Target/RISCV RISCVISelLowering.cpp, llvm/test/CodeGen/RISCV rvp-ext-rv64.ll rvp-ext-rv32.ll

[RISCV][P-ext] Recognize vector shifts with splat build_vector shift amount. (#184909)

If the shift is created during LegalizeVectorOps, the shift amount
will be created as a build_vector. Splat_vector is formed by a later
DAGCombine. LegalizeVectorOps will visit the new shift before the
splat_vector can be created. Handle this case too.
DeltaFile
+124-0llvm/test/CodeGen/RISCV/rvp-ext-rv64.ll
+66-0llvm/test/CodeGen/RISCV/rvp-ext-rv32.ll
+9-4llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+199-43 files

LLVM/project f2cf0cdllvm/lib/Target/RISCV RISCVInstrInfoP.td

[RISCV] Remove unneeded ImmLeaf from simm8_unsigned. NFC (#184960)
DeltaFile
+1-1llvm/lib/Target/RISCV/RISCVInstrInfoP.td
+1-11 files

LLVM/project a33cecbllvm/include/llvm/Analysis DependenceAnalysis.h, llvm/lib/Analysis DependenceAnalysis.cpp

[DA] Rewrite formula in the Weak Zero SIV tests
DeltaFile
+67-72llvm/lib/Analysis/DependenceAnalysis.cpp
+8-8llvm/test/Analysis/DependenceAnalysis/weak-zero-siv-large-btc.ll
+4-8llvm/include/llvm/Analysis/DependenceAnalysis.h
+2-6llvm/test/Analysis/DependenceAnalysis/weak-zero-siv-overflow.ll
+2-2llvm/test/Analysis/DependenceAnalysis/weak-crossing-siv-large-btc.ll
+83-965 files

LLVM/project 128e676llvm/include/llvm/Analysis DependenceAnalysis.h, llvm/lib/Analysis DependenceAnalysis.cpp

[DA] Remove isPeelFirst and isPeelLast
DeltaFile
+1-24llvm/include/llvm/Analysis/DependenceAnalysis.h
+0-20llvm/lib/Analysis/DependenceAnalysis.cpp
+3-3llvm/test/Analysis/DependenceAnalysis/WeakZeroDstSIV.ll
+3-3llvm/test/Analysis/DependenceAnalysis/WeakZeroSrcSIV.ll
+7-504 files

LLVM/project 7dda8ballvm/lib/Analysis DependenceAnalysis.cpp, llvm/test/Analysis/DependenceAnalysis weak_zero_siv_parametric_coeff.ll WeakZeroSrcSIV.ll

[DA] Fix the Weak Zero SIV tests when the coeff may be zero (#183736)

In the Weak Zero SIV tests, given two subscripts `{c0,+,a}` and `c1`,
when `c0 == c1`, the tests conclude that a dependency exists from the
former subscript at the first iteration to the latter subscript at every
iteration. However, this conclusion is correct only when `a` is not
zero, which was not being checked.
This patch adds non-zero checks for `a` in the Weak Zero SIV tests.
Fix the test cases added in #183735.
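
A brute-force Python check makes the `a == 0` case concrete. It simply
enumerates the iterations at which `{c0,+,a}` equals `c1`; it is not how
DependenceAnalysis.cpp performs the test, and the function name and the
`trip_count` parameter are hypothetical.

```python
def weak_zero_matching_iterations(c0, a, c1, trip_count):
    """Iterations i in [0, trip_count) where c0 + a*i equals c1."""
    return [i for i in range(trip_count) if c0 + a * i == c1]


# Non-zero coefficient: only the first iteration touches c1.
print(weak_zero_matching_iterations(5, 2, 5, 4))  # prints [0]
# Zero coefficient: the subscript is loop-invariant, so every iteration does.
print(weak_zero_matching_iterations(5, 0, 5, 4))  # prints [0, 1, 2, 3]
```

When `a` is zero the "first iteration only" conclusion is therefore
wrong, which is why the non-zero check is needed.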
DeltaFile
+4-10llvm/test/Analysis/DependenceAnalysis/weak_zero_siv_parametric_coeff.ll
+2-2llvm/lib/Analysis/DependenceAnalysis.cpp
+1-1llvm/test/Analysis/DependenceAnalysis/WeakZeroSrcSIV.ll
+1-1llvm/test/Analysis/DependenceAnalysis/WeakZeroDstSIV.ll
+8-144 files

LLVM/project d17c5f9llvm/test/CodeGen/AArch64 clmul-fixed.ll, llvm/test/CodeGen/RISCV/rvv clmulh-sdnode.ll clmul-sdnode.ll

Rebase and address comments

Created using spr 1.3.6-beta.1
DeltaFile
+53,024-7,001llvm/test/CodeGen/RISCV/rvv/clmulh-sdnode.ll
+15,172-1,553llvm/test/CodeGen/RISCV/rvv/clmul-sdnode.ll
+6,812-3,080llvm/test/CodeGen/AArch64/clmul-fixed.ll
+6,520-0llvm/test/CodeGen/X86/bit-manip-i512.ll
+3,441-0llvm/test/MC/AMDGPU/gfx13_asm_vflat.s
+3,257-0llvm/test/CodeGen/X86/bit-manip-i256.ll
+88,226-11,6341,391 files not shown
+138,997-27,6991,397 files

LLVM/project 9243a1cllvm/test/CodeGen/AArch64 clmul-fixed.ll, llvm/test/CodeGen/RISCV/rvv clmulh-sdnode.ll clmul-sdnode.ll

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.6-beta.1

[skip ci]
DeltaFile
+53,024-7,001llvm/test/CodeGen/RISCV/rvv/clmulh-sdnode.ll
+15,172-1,553llvm/test/CodeGen/RISCV/rvv/clmul-sdnode.ll
+6,812-3,080llvm/test/CodeGen/AArch64/clmul-fixed.ll
+6,520-0llvm/test/CodeGen/X86/bit-manip-i512.ll
+3,441-0llvm/test/MC/AMDGPU/gfx13_asm_vflat.s
+3,257-0llvm/test/CodeGen/X86/bit-manip-i256.ll
+88,226-11,6341,390 files not shown
+138,949-27,6701,396 files

LLVM/project e12bb1allvm/test/CodeGen/RISCV/rvv abd.ll

[𝘀𝗽𝗿] initial version

Created using spr 1.3.6-beta.1
DeltaFile
+106-0llvm/test/CodeGen/RISCV/rvv/abd.ll
+106-01 files

LLVM/project 2b21231llvm/lib/Transforms/IPO MemProfContextDisambiguation.cpp, llvm/test/ThinLTO/X86 remark-missing-info.ll memprof-basic.ll

[MemProf] Enhance thin link optimization remarks (#184829)

Don't require -memprof-report-hinted-sizes for emitting opt remarks
during the thin link step. Invoke the handling also when opt remarks are
enabled for MemProf per OptimizationRemarkEmitter::allowExtraAnalysis.

Also, add a fallback message if we don't have the context size
information, adding tests for those new messages.

I also realized we don't currently emit these messages for MemProf with
regular LTO, and added a TODO.
DeltaFile
+64-0llvm/test/ThinLTO/X86/remark-missing-info.ll
+59-0llvm/test/Transforms/MemProfContextDisambiguation/remark-missing-info.ll
+44-6llvm/test/ThinLTO/X86/memprof-basic.ll
+29-6llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
+7-7llvm/test/Transforms/MemProfContextDisambiguation/inlined3.ll
+3-3llvm/test/Transforms/MemProfContextDisambiguation/basic.ll
+206-223 files not shown
+211-259 files

LLVM/project f31e65flibclc/clc/lib/amdgcn/mem_fence clc_mem_fence.cl, libclc/opencl/lib/generic SOURCES

libclc: Add atomic_work_item_fence (#184844)
DeltaFile
+18-0libclc/opencl/lib/generic/atomic/atomic_work_item_fence.cl
+2-0libclc/clc/lib/amdgcn/mem_fence/clc_mem_fence.cl
+1-0libclc/opencl/lib/generic/SOURCES
+21-03 files

LLVM/project 42e6928llvm/test/CodeGen/AArch64 clmul-fixed.ll, llvm/test/CodeGen/RISCV/rvv clmulh-sdnode.ll clmul-sdnode.ll

Merge branch 'main' into users/kasuga-fj/da-fix-weak-zero-siv
DeltaFile
+53,024-7,001llvm/test/CodeGen/RISCV/rvv/clmulh-sdnode.ll
+15,172-1,553llvm/test/CodeGen/RISCV/rvv/clmul-sdnode.ll
+6,812-3,080llvm/test/CodeGen/AArch64/clmul-fixed.ll
+6,520-0llvm/test/CodeGen/X86/bit-manip-i512.ll
+3,441-0llvm/test/MC/AMDGPU/gfx13_asm_vflat.s
+3,257-0llvm/test/CodeGen/X86/bit-manip-i256.ll
+88,226-11,6341,516 files not shown
+142,295-28,6401,522 files

LLVM/project 1f2e52fllvm/test/tools/llvm-ir2vec/Inputs input.ll, llvm/test/tools/llvm-ir2vec/bindings ir2vec-bindings.py

[llvm-ir2vec] Adding getFuncNames API to ir2vec python bindings (#180473)

This is mostly a user convenience, but I thought it would be helpful.

Otherwise, at the moment, the user has to fetch the entire embeddings
dict just to see which functions a module has.
DeltaFile
+11-0llvm/tools/llvm-ir2vec/Bindings/PyIR2Vec.cpp
+11-0llvm/test/tools/llvm-ir2vec/bindings/ir2vec-bindings.py
+3-0llvm/test/tools/llvm-ir2vec/Inputs/input.ll
+25-03 files

LLVM/project cb6936ellvm/test/Transforms/LoopVectorize/AArch64 blend-costs.ll

[LV] Remove branch on false in blend-costs.ll test. NFC (#184816)

I have a patch I want to post that improves blend masks, but it ends up
with a weird diff in this test stemming from the branch on false.

This replaces it with an external boolean. This should still test
scalarizing a blend, which I believe is the original intent.
DeltaFile
+49-23llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll
+49-231 files

LLVM/project 46d29d4lld/ELF Relocations.cpp RelocScan.h

[ELF] Remove unused handleTlsRelocation (#184951)

Now that all targets use target-specific relocation scanning for TLS
(#181332 RISC-V being the last), handleTlsRelocation is unused.
DeltaFile
+0-100lld/ELF/Relocations.cpp
+0-15lld/ELF/RelocScan.h
+2-3lld/ELF/InputSection.cpp
+2-1183 files

LLVM/project 4541e23llvm/lib/Target/RISCV RISCVISelDAGToDAG.cpp RISCVInstrInfoP.td, llvm/test/CodeGen/RISCV rvp-ext-rv64.ll rvp-ext-rv32.ll

[RISCV][P-ext] Select plui.h/w and improve usage of pli.b/h/w. (#184937)

This patch adds custom instruction selection of splat_vector of
constants. Rather than using the element size from the VT, find
the smallest splat size in the constant. This allows us to use
pli.b for i16 or i32 elements that contain a byte splat.
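
The "smallest splat size" idea can be sketched in Python as follows. This
is an assumption-laden illustration, not the code in
RISCVISelDAGToDAG.cpp: the function name is hypothetical, and it does not
model the immediate ranges that pli.b/h/w or plui.h/w actually accept.

```python
def smallest_splat_size(value, total_bits, sizes=(8, 16, 32)):
    """Return the smallest element width in `sizes` whose repetition
    reproduces `value` across `total_bits`, or None if there is none."""
    value &= (1 << total_bits) - 1
    for size in sizes:
        if size > total_bits or total_bits % size:
            continue
        elem = value & ((1 << size) - 1)
        # Rebuild the constant by repeating the low `size` bits.
        rebuilt = 0
        for shift in range(0, total_bits, size):
            rebuilt |= elem << shift
        if rebuilt == value:
            return size
    return None


print(smallest_splat_size(0x0505, 16))  # prints 8
print(smallest_splat_size(0x1234, 16))  # prints 16
```

An i16 element holding `0x0505` is a splat of the byte `0x05`, so a
byte-sized splat could cover it, while `0x1234` needs the full 16 bits.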
DeltaFile
+82-14llvm/test/CodeGen/RISCV/rvp-ext-rv64.ll
+41-14llvm/test/CodeGen/RISCV/rvp-ext-rv32.ll
+32-0llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
+0-8llvm/lib/Target/RISCV/RISCVInstrInfoP.td
+155-364 files

LLVM/project 62a5e53mlir/lib/Transforms/Utils DialectConversion.cpp, mlir/test/Transforms test-legalizer.mlir

[mlir] Improve dialect conversion failure diagnostics (#182729)

This PR improves MLIR dialect conversion failure diagnostics when
legalization fails.

Previously, the diagnostic mostly included the operation name (and in
partial conversion, whether it was explicitly marked illegal). This
change keeps that prefix and appends the printed failing operation. This
provides immediate operand/result/type context directly in the same
error line.

### Example

Before:
```
failed to legalize operation 'test.type_consumer' that was explicitly marked illegal
```

After:

    [6 lines not shown]
DeltaFile
+10-4mlir/lib/Transforms/Utils/DialectConversion.cpp
+2-2mlir/test/Transforms/test-legalizer.mlir
+12-62 files