LLVM/project f607c7allvm/test/CodeGen/X86 vector-interleaved-store-i8-stride-6.ll vector-interleaved-store-i16-stride-6.ll

Tests changes after de-stack
DeltaFile
+168-192llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-6.ll
+134-136llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-6.ll
+17-15llvm/test/CodeGen/X86/vector-replicaton-i1-mask.ll
+319-3433 files

LLVM/project c19fa5bllvm/lib/Target/WebAssembly WebAssemblyInstrSIMD.td, llvm/test/CodeGen/WebAssembly fpclamptosat_vec.ll saturating-truncation.ll

[WebAssembly] narrow instructions use signed saturation (#201798)

Fixes https://github.com/llvm/llvm-project/issues/201780

Per
https://www.w3.org/TR/wasm-core-2/#-hrefop-narrowmathrmnarrowmathsfu_m-n-i
the saturation is signed, the truncation is unsigned.
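
As a rough illustration of that wording, the sketch below models one lane of an unsigned narrow under the assumption that the 16-bit input is interpreted as signed and then clamped to the unsigned 8-bit range. The function names are hypothetical and are not part of the patch.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical model of one lane of an unsigned narrow (i16 -> u8):
// the input is read as SIGNED, then clamped to [0, 255].
inline std::uint8_t narrowLaneU(std::int16_t X) {
  int Clamped = std::clamp<int>(X, 0, 255);
  return static_cast<std::uint8_t>(Clamped);
}

// For contrast: the reading that treats the input as UNSIGNED.
// The bit pattern 0xFFFF then saturates to 255 instead of 0.
inline std::uint8_t narrowLaneUnsignedInput(std::uint16_t X) {
  return static_cast<std::uint8_t>(std::min<unsigned>(X, 255u));
}
```

The two readings differ exactly on inputs with the top bit set, which is where a pattern that assumes unsigned input would miscompile.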
DeltaFile
+123-133llvm/test/CodeGen/WebAssembly/fpclamptosat_vec.ll
+70-6llvm/test/CodeGen/WebAssembly/saturating-truncation.ll
+17-3llvm/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td
+210-1423 files

LLVM/project f04b271llvm/include/llvm/ProfileData SampleProfReader.h, llvm/lib/Transforms/IPO SampleProfile.cpp SampleProfileMatcher.cpp

[SampleProfile] Switch getNameTable() to return iterator_range (NFC) (#200995)

This patch teaches SampleProfileReader::getNameTable() to return an
iterator_range instead of a pointer to std::vector<FunctionId>.

This patch is meant to be a preparation patch for the following
speed-up opportunity.  I'm planning to lazy-load SecNameTable in a
subsequent patch for performance reasons.  We have SecNameTable that
takes up about 90MB on disk.  We eager-load this section into
std::vector<FunctionId> on the heap.  This ends up taking about 180MB
on the heap because the element type of the section is 8-byte MD5 hash
value while FunctionId takes up 16 bytes.  This eager loading shows up
on the execution profile -- about 1%.  Since we do have a few places
where we scan the entire NameTable, we should accommodate those places
with iterators that lazy-load SecNameTable.
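
A minimal sketch of the idea, assuming a raw buffer of 8-byte hashes in host byte order: an input iterator that materializes one 16-byte record per dereference instead of building the whole vector up front. `FakeFunctionId`, `LazyNameIterator`, and `LazyNameRange` are hypothetical stand-ins, not the types used in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <iterator>

// Hypothetical stand-in for a FunctionId-like record (pointer + hash).
struct FakeFunctionId {
  const char *Name = nullptr;
  std::uint64_t Hash = 0;
};

// Input iterator that decodes one 8-byte hash per dereference.
class LazyNameIterator {
  const unsigned char *Ptr;

public:
  using iterator_category = std::input_iterator_tag;
  using value_type = FakeFunctionId;
  using difference_type = std::ptrdiff_t;
  using pointer = const FakeFunctionId *;
  using reference = FakeFunctionId;

  explicit LazyNameIterator(const unsigned char *P) : Ptr(P) {}
  FakeFunctionId operator*() const {
    std::uint64_t H;
    std::memcpy(&H, Ptr, sizeof(H)); // assumes host byte order
    return FakeFunctionId{nullptr, H};
  }
  LazyNameIterator &operator++() {
    Ptr += sizeof(std::uint64_t);
    return *this;
  }
  bool operator==(const LazyNameIterator &O) const { return Ptr == O.Ptr; }
  bool operator!=(const LazyNameIterator &O) const { return Ptr != O.Ptr; }
};

// Range over the raw section; nothing is copied to the heap.
struct LazyNameRange {
  const unsigned char *Begin;
  const unsigned char *End;
  LazyNameIterator begin() const { return LazyNameIterator(Begin); }
  LazyNameIterator end() const { return LazyNameIterator(End); }
};

// Example of a full scan written against the range interface.
inline std::uint64_t sumHashes(const LazyNameRange &R) {
  assert((R.End - R.Begin) % 8 == 0 && "section must hold whole hashes");
  std::uint64_t Sum = 0;
  for (FakeFunctionId Id : R)
    Sum += Id.Hash;
  return Sum;
}
```

Callers that only iterate never notice whether the range is backed by a vector or by the on-disk section, which is what makes the return-type change a useful preparation step.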

See the RFC at:

https://discourse.llvm.org/t/rfc-faster-sample-profile-loading/90957
DeltaFile
+30-3llvm/include/llvm/ProfileData/SampleProfReader.h
+7-8llvm/lib/Transforms/IPO/SampleProfile.cpp
+5-7llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp
+42-183 files

LLVM/project 427d632utils/bazel/llvm-project-overlay/mlir BUILD.bazel

[Bazel] Fixes 681fc74 (#201894)

This fixes 681fc74ac47eaa597d22506231a347748dda635b.

Co-authored-by: Google Bazel Bot <google-bazel-bot at google.com>
DeltaFile
+36-8utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
+36-81 files

LLVM/project a4e48b5llvm/lib/Transforms/Scalar LoopFuse.cpp, llvm/test/Transforms/LoopFusion cannot_fuse.ll

[LoopFusion] Simplify the logic of checking trip count equality (NFCI). (#201446)

Currently `haveIdenticalTripCounts` has a clunky return value, which
makes it very easy to make a mistake. The returned pair doesn't provide
much value and can be replaced with an optional integer. Also the
function `haveIdenticalTripCounts` does more than what its name
suggests. It checks whether peeling is supported for the pair of loops
or not. Interestingly this is not the only place where we check whether
peeling for this pair is supported!

This patch changes the function and renames it to
`calculateTripCountDiff`. It does exactly what the name says. It tries
to calculate the difference of the trip counts of the two loops and if
it fails it returns an empty optional. It is up to the caller to decide
whether it wants to do fusion/peeling based on this result. The patch
changes some debug output but no functional change is intended.

Data types have been modified to specify size and signedness explicitly,
to avoid any bug due to overflow in subtraction or comparison of
different integer types.
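
The shape of the new interface might look roughly like the sketch below. It assumes trip counts arrive as optional 32-bit unsigned values and ignores how they are actually computed; `calculateTripCountDiffSketch` is a hypothetical name, not the function in LoopFuse.cpp.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical sketch: return TC0 - TC1, or an empty optional when
// either trip count is unknown. The caller decides what to do next.
inline std::optional<std::int64_t>
calculateTripCountDiffSketch(std::optional<std::uint32_t> TC0,
                             std::optional<std::uint32_t> TC1) {
  if (!TC0 || !TC1)
    return std::nullopt;
  // Widen to signed 64-bit before subtracting so the result cannot wrap.
  return static_cast<std::int64_t>(*TC0) - static_cast<std::int64_t>(*TC1);
}
```

Under this shape a caller could treat a difference of zero as a fusion candidate and a positive difference as a peeling candidate, while an empty optional means neither.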
DeltaFile
+32-60llvm/lib/Transforms/Scalar/LoopFuse.cpp
+2-1llvm/test/Transforms/LoopFusion/cannot_fuse.ll
+34-612 files

LLVM/project 9ee8114llvm/test/CodeGen/AMDGPU dagcombine-freeze-extract-subvector-loop.ll

Remove regression test that's been put elsewhere
DeltaFile
+0-42llvm/test/CodeGen/AMDGPU/dagcombine-freeze-extract-subvector-loop.ll
+0-421 files

LLVM/project 2ba429cllvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp, llvm/test/CodeGen/AMDGPU dagcombine-freeze-extract-subvector-loop.ll

Style, named test vars
DeltaFile
+25-28llvm/test/CodeGen/AMDGPU/dagcombine-freeze-extract-subvector-loop.ll
+1-2llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+26-302 files

LLVM/project 2433b06llvm/lib/CodeGen/SelectionDAG LegalizeVectorOps.cpp, llvm/lib/Target/RISCV RISCVInstrInfoP.td RISCVISelLowering.cpp

[RISCV][TargetLowering][P-ext] Support sext_inreg for v2i32/v4i16 vectors on RV32. (#201752)

Update sext_vector_inreg expansion to use sext_inreg. Previously it
emitted 2 shifts that wouldn't be combined.
DeltaFile
+17-25llvm/test/CodeGen/RISCV/rvp-simd-64.ll
+6-9llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
+2-4llvm/test/CodeGen/RISCV/rvp-simd-32.ll
+5-0llvm/lib/Target/RISCV/RISCVInstrInfoP.td
+4-0llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+34-385 files

LLVM/project 1bdd78dllvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp, llvm/test/CodeGen/AMDGPU dagcombine-freeze-extract-subvector-loop.ll

[SelectionDAG] Fold extracts of subvector inserts

Fold extract_subvector(insert_subvector(...)) when the extraction is
outside the inserted subvector or the inserted subvector only amends
the extracted part.

In particular,
1. vA extract_subvector (vB insert_subvector(vB X, vC Y, C1), C2) =>
vA extract_subvector(X, C2) when [C2, C2 + A) intersect [C1, C1 + C)
is the empty set
2. ... => extract_subvector(Y, C2 - C1) if [C2, C2 + A) is a subset of
[C1, C1 + C) - an existing simplification
3. ... => vA insert_subvector(vA extract_subvector(vB X, C2), vC Y, C1 - C2)
if [C1, C1 + C) is a subset of [C2, C2 + A) - that is, if you're only
updating the extracted sub-part.
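
The three cases reduce to interval arithmetic on element indices. The sketch below classifies them for fixed-length vectors; the enum and function names are hypothetical, and the real combine has further legality checks that are not modeled here.

```cpp
#include <cassert>
#include <cstdint>

enum class ExtractOfInsertFold {
  None,                // partial overlap: no fold from this list applies
  ExtractFromBase,     // case 1: extract_subvector(X, C2)
  ExtractFromInserted, // case 2: extract_subvector(Y, C2 - C1)
  InsertIntoExtract    // case 3: insert_subvector(extract(X, C2), Y, C1 - C2)
};

struct FoldResult {
  ExtractOfInsertFold Kind;
  std::uint64_t NewIndex; // index used by the outermost node after folding
};

// C1/C: index and length of the inserted subvector.
// C2/A: index and length of the extracted subvector.
inline FoldResult classifyExtractOfInsert(std::uint64_t C1, std::uint64_t C,
                                          std::uint64_t C2, std::uint64_t A) {
  assert(C > 0 && A > 0 && "subvectors must be non-empty");
  const std::uint64_t InsEnd = C1 + C;
  const std::uint64_t ExtEnd = C2 + A;
  if (ExtEnd <= C1 || InsEnd <= C2) // ranges are disjoint
    return {ExtractOfInsertFold::ExtractFromBase, C2};
  if (C1 <= C2 && ExtEnd <= InsEnd) // extract lies inside the insert
    return {ExtractOfInsertFold::ExtractFromInserted, C2 - C1};
  if (C2 <= C1 && InsEnd <= ExtEnd) // insert lies inside the extract
    return {ExtractOfInsertFold::InsertIntoExtract, C1 - C2};
  return {ExtractOfInsertFold::None, 0};
}
```

When the two ranges are identical, the sketch picks case 2, since that check comes first.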

Adds a regression test for an infinite SelectionDAG cycle that is
fixed by a stack of commits that ends with this one.


    [3 lines not shown]
DeltaFile
+192-176llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-6.ll
+136-138llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-6.ll
+72-56llvm/test/CodeGen/X86/vector-interleaved-store-i64-stride-6.ll
+45-0llvm/test/CodeGen/AMDGPU/dagcombine-freeze-extract-subvector-loop.ll
+28-7llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+4-8llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-3.ll
+477-3856 files

LLVM/project d4ec02ellvm/lib/Target/RISCV RISCVInstrInfoP.td RISCVISelLowering.cpp, llvm/test/CodeGen/RISCV rvp-simd-64.ll rvp-simd-32.ll

[RISCV][P-ext] Support v4i16/v2i32->v4i8/v2i16 truncate. (#201757)
DeltaFile
+16-48llvm/test/CodeGen/RISCV/rvp-simd-64.ll
+2-8llvm/test/CodeGen/RISCV/rvp-simd-32.ll
+3-6llvm/test/CodeGen/RISCV/rvp-narrowing-shift-trunc.ll
+5-0llvm/lib/Target/RISCV/RISCVInstrInfoP.td
+1-0llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+27-625 files

LLVM/project cc83c1fllvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp, llvm/test/CodeGen/X86 vector-shuffle-combining-avx512bwvl.ll

[SelectionDAG] Fold subvector inserts into concat operands

Push insert_subvector into the containing CONCAT_VECTORS operand when the insertion is wholly contained there.
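
A small sketch of the "wholly contained" test, assuming fixed-length vectors whose concat operands all have the same element count. `findContainingOperand` and `ConcatSlot` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

struct ConcatSlot {
  std::uint64_t Operand; // which concat operand receives the insert
  std::uint64_t Offset;  // insert index relative to that operand
};

// OpElts: elements per concat operand. InsIdx/SubElts: insert index and
// length in the concatenated vector. Returns nothing if the inserted
// range straddles an operand boundary.
inline std::optional<ConcatSlot> findContainingOperand(std::uint64_t OpElts,
                                                       std::uint64_t InsIdx,
                                                       std::uint64_t SubElts) {
  assert(OpElts > 0 && SubElts > 0);
  const std::uint64_t First = InsIdx / OpElts;
  const std::uint64_t Last = (InsIdx + SubElts - 1) / OpElts;
  if (First != Last)
    return std::nullopt;
  return ConcatSlot{First, InsIdx % OpElts};
}
```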

AI note: an LLM generated the code and the test, I've read them

Co-Authored-By: OpenAI Codex <codex at openai.com>
DeltaFile
+8-36llvm/test/CodeGen/X86/vector-shuffle-combining-avx512bwvl.ll
+34-10llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+42-462 files

LLVM/project 107f263llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp, llvm/test/CodeGen/AArch64 sve-load-store-legalisation.ll

scalable vector test updates
DeltaFile
+772-772llvm/test/CodeGen/RISCV/rvv/vector-interleave.ll
+4-20llvm/test/CodeGen/AArch64/sve-load-store-legalisation.ll
+4-2llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+780-7943 files

LLVM/project e14735cllvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp

Make this work on scalable vectors
DeltaFile
+12-15llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+12-151 files

LLVM/project 430b2ballvm/test/CodeGen/AArch64 sve-fixed-vector-llrint.ll sve-fixed-vector-lrint.ll, llvm/test/CodeGen/AMDGPU bf16.ll

[SelectionDAG] Fold extracts spanning concat operands

Factor the extract_subvector-of-CONCAT_VECTORS logic and handle
extracts that cover multiple whole concat operands by rebuilding a
smaller concat directly.
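
The "multiple whole concat operands" condition can be sketched as an alignment check, assuming fixed-length vectors with equally sized operands. The names below are hypothetical and the actual combine is not reproduced.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

struct OperandRange {
  std::uint64_t First; // first concat operand covered by the extract
  std::uint64_t Count; // number of whole operands covered
};

// Returns the operands a smaller concat would be rebuilt from, or nothing
// when the extract does not start and end on operand boundaries.
inline std::optional<OperandRange>
wholeOperandsCovered(std::uint64_t OpElts, std::uint64_t NumOps,
                     std::uint64_t ExtIdx, std::uint64_t ExtElts) {
  assert(OpElts > 0);
  if (ExtElts == 0 || ExtIdx % OpElts != 0 || ExtElts % OpElts != 0)
    return std::nullopt;
  const std::uint64_t First = ExtIdx / OpElts;
  const std::uint64_t Count = ExtElts / OpElts;
  if (First + Count > NumOps)
    return std::nullopt;
  return OperandRange{First, Count};
}
```

For a concat of four 4-element operands, extracting 8 elements at index 4 maps to operands 1 and 2, so the result can be a two-operand concat of those values.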

AI note: an LLM generated the code and the test, I've read them

Co-Authored-By: OpenAI Codex <codex at openai.com>
DeltaFile
+992-904llvm/test/CodeGen/AMDGPU/bf16.ll
+187-229llvm/test/CodeGen/AArch64/sve-fixed-vector-llrint.ll
+187-229llvm/test/CodeGen/AArch64/sve-fixed-vector-lrint.ll
+196-176llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-6.ll
+142-140llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-6.ll
+120-120llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-6.ll
+1,824-1,79811 files not shown
+2,204-2,27917 files

LLVM/project b8cb361llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp

[SelectionDAG] Fold nonzero extract-of-extract indices

Generalize the extract_subvector-of-extract_subvector fold to compose
nonzero indices instead of only handling an outer index of zero.
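
Composing the indices amounts to adding them, as the sketch below checks on plain `std::vector` values. The helpers are hypothetical models of the DAG nodes, not LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Model of extract_subvector on a fixed-length vector of ints.
inline std::vector<int> extractSubvector(const std::vector<int> &V,
                                         std::size_t Idx, std::size_t Len) {
  assert(Idx + Len <= V.size() && "extract out of range");
  auto First = V.begin() + static_cast<std::ptrdiff_t>(Idx);
  return std::vector<int>(First, First + static_cast<std::ptrdiff_t>(Len));
}

// extract(extract(X, Inner), Outer) reads the same lanes as
// extract(X, Inner + Outer).
inline std::size_t composeExtractIndices(std::size_t Inner,
                                         std::size_t Outer) {
  return Inner + Outer;
}
```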

AI note: an LLM generated the code and the test, I've read them

Co-Authored-By: OpenAI Codex <codex at openai.com>
DeltaFile
+8-8llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+8-81 files

LLVM/project 7b907d5clang/lib/Driver/ToolChains Flang.cpp Flang.h

clang/AMDGPU: Pass BoundArch through device libs handling

Pre-work to consolidate target identification for future target
option bug fixes. Also requires updating flang to match recent
clang changes.

Co-authored-by: Claude Sonnet 4 <noreply at anthropic.com>
DeltaFile
+14-10clang/lib/Driver/ToolChains/Flang.cpp
+13-3clang/lib/Driver/ToolChains/Flang.h
+9-6clang/lib/Driver/ToolChains/HIPAMD.cpp
+9-5clang/lib/Driver/ToolChains/AMDGPU.cpp
+5-8clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
+2-2clang/lib/Driver/ToolChains/HIPSPV.cpp
+52-345 files not shown
+57-3911 files

LLVM/project 232cfb2clang/lib/Driver/ToolChains Darwin.cpp AMDGPU.cpp

clang: Add BoundArch argument to addClangTargetOptions

addClangTargetOptions already has an OffloadKind argument,
but it kind of doesn't make sense for any function to know the
OffloadKind, but not the associated BoundArch.

The current process is kind of convoluted. TranslateArgs
synthesizes a -mcpu argument from BoundArch, and later
addClangTargetOptions re-parses that -mcpu argument each
time it wants the architecture. Add this argument so this
can be cleaned up in a future change.

Co-authored-by: Claude Sonnet 4 <noreply at anthropic.com>
DeltaFile
+9-5clang/lib/Driver/ToolChains/Darwin.cpp
+7-5clang/lib/Driver/ToolChains/AMDGPU.cpp
+7-3clang/lib/Driver/ToolChains/Darwin.h
+6-3clang/lib/Driver/ToolChains/AMDGPU.h
+5-3clang/lib/Driver/ToolChains/Hexagon.h
+5-3clang/lib/Driver/ToolChains/XCore.h
+39-2248 files not shown
+115-5654 files

LLVM/project 05d4fd0clang/include/clang/Basic BuiltinsAMDGPUDocs.td

[NFC][Doc] Fix non-existing reference in BuiltinsAMDGPUDocs.td (#201889)
DeltaFile
+2-2clang/include/clang/Basic/BuiltinsAMDGPUDocs.td
+2-21 files

LLVM/project 4a80006llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp

Review feedback I forgot to push lol
DeltaFile
+2-2llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+2-21 files

LLVM/project 0f19139llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp

Review feedback
DeltaFile
+4-8llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+4-81 files

LLVM/project 6e46d9ellvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp, llvm/test/CodeGen/X86 freeze-vector.ll

[SelectionDAG] Track demanded select elements in noundef checks

Propagate demanded elements through to the two arms of a select, and
check the condition with or without demanded elements depending on
whether it is a vector.

AI note: an LLM generated the code and the test, I've read them

Co-Authored-By: OpenAI Codex <codex at openai.com>
DeltaFile
+17-2llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+0-11llvm/test/CodeGen/X86/freeze-vector.ll
+17-132 files

LLVM/project 797eef5llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp

Review style etc.
DeltaFile
+9-10llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+9-101 files

LLVM/project 98c9c85llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp, llvm/test/CodeGen/X86 freeze-vector.ll

[SelectionDAG] Track bitcast demanded elements in noundef tests

Bitcasts preserve undef/poison status, but vector bitcasts can change
which source lanes cover a demanded result lane. Map the demanded
element mask through fixed-length vector bitcasts before checking the
source where possible.
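
One way to picture the mask mapping, assuming at most 64 lanes and element counts that divide evenly: each demanded result lane marks every source lane whose bits it overlaps. `mapDemandedThroughBitcast` is a hypothetical helper, not the code added to SelectionDAG.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Bit I of a mask set means lane I is demanded.
// DstElts: lanes in the bitcast result. SrcElts: lanes in the source.
inline std::uint64_t mapDemandedThroughBitcast(std::uint64_t DstDemanded,
                                               unsigned DstElts,
                                               unsigned SrcElts) {
  assert(DstElts > 0 && SrcElts > 0 && DstElts <= 64 && SrcElts <= 64);
  std::uint64_t Src = 0;
  if (SrcElts >= DstElts) {
    // Each result lane is made of Scale narrower source lanes.
    assert(SrcElts % DstElts == 0);
    const unsigned Scale = SrcElts / DstElts;
    for (unsigned I = 0; I < DstElts; ++I)
      if ((DstDemanded >> I) & 1)
        for (unsigned J = 0; J < Scale; ++J)
          Src |= std::uint64_t{1} << (I * Scale + J);
  } else {
    // Scale result lanes share one wider source lane.
    assert(DstElts % SrcElts == 0);
    const unsigned Scale = DstElts / SrcElts;
    for (unsigned I = 0; I < DstElts; ++I)
      if ((DstDemanded >> I) & 1)
        Src |= std::uint64_t{1} << (I / Scale);
  }
  return Src;
}
```

For example, demanding lane 1 of a 2-lane result over a 4-lane source yields source lanes 2 and 3, so source lanes 0 and 1 do not need to be proven free of undef or poison.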

AI note: an LLM generated the code and the test, I've read them

Co-Authored-By: OpenAI Codex <codex at openai.com>
DeltaFile
+12-36llvm/test/CodeGen/X86/freeze-vector.ll
+41-0llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+53-362 files

LLVM/project 44ef831clang/include/clang/AST TypeBase.h, clang/lib/AST AttrImpl.cpp

[clang][AST] Hash `AttributedType`'s `Attr` by Arguments (#200961)

https://github.com/llvm/llvm-project/pull/108631 added
`ID.AddPointer(attr)` to `AttributedType::Profile`, which turned the
`ID` into a pointer-identity key. This inhibits deduplication of
attributed types (such as types with `_Nonnull/_Nullable` attributes).
Such duplications can lead to significant increases in pcm/pch sizes.

This PR adds the arguments of the attributes to the folding set ID, so
that the content of the argument is taken into account when computing
the ID in addition to the existing inputs. The implementation teaches
tablegen to generate the `profile` method for each attribute, similar to
how we generate methods to check equivalence. This way, the argument
contents are handled automatically. Additionally, an attribute can have
an escape hatch to add its own customized profile method, through the
`profileFn` tablegen field, in case something special is needed.
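
The difference between a pointer-identity key and a content key can be shown with a toy model. `FakeAttr` and both counting functions are hypothetical; they only illustrate why hashing by arguments lets equal attributes share one uniqued entry.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical attribute: a kind plus a single string argument.
struct FakeAttr {
  int Kind;
  std::string Arg;
};

// Keying on the pointer: every separately allocated attribute is unique.
inline std::size_t
countUniqueByPointer(const std::vector<const FakeAttr *> &Attrs) {
  std::set<const FakeAttr *> Keys(Attrs.begin(), Attrs.end());
  return Keys.size();
}

// Keying on the contents: attributes with equal arguments collapse.
inline std::size_t
countUniqueByContent(const std::vector<const FakeAttr *> &Attrs) {
  std::set<std::pair<int, std::string>> Keys;
  for (const FakeAttr *A : Attrs)
    Keys.insert({A->Kind, A->Arg});
  return Keys.size();
}
```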

Assisted-by: claude-opus-4.7

Fixes rdar://170586474.
DeltaFile
+84-0clang/lib/AST/AttrImpl.cpp
+43-0clang/utils/TableGen/ClangAttrEmitter.cpp
+18-0clang/test/AST/attributed-type-dedup-nullability.m
+15-0clang/test/AST/attributed-type-dedup-swift-attr.m
+15-0clang/test/AST/attributed-type-dedup-pcs.c
+5-10clang/include/clang/AST/TypeBase.h
+180-109 files not shown
+252-1615 files

LLVM/project 31b57b0llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp, llvm/test/CodeGen/X86 freeze-vector.ll

[SelectionDAG] Track demanded concat elements in noundef checks

Teach isGuaranteedNotToBeUndefOrPoison to distribute fixed-length
demanded element masks across CONCAT_VECTORS operands. This is part of
the series of fixes needed to resolve a SelectionDAG hang by making it
possible to prove certain values don't need to be frozen.
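
Distributing a demanded mask across concat operands is a matter of slicing it into per-operand chunks. The sketch below assumes fixed-length vectors of at most 64 lanes with equally sized operands; `splitDemandedAcrossConcat` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Bit I of Demanded set means lane I of the concatenated vector is
// demanded. Returns one sub-mask per operand; a zero sub-mask means
// that operand does not need to be checked at all.
inline std::vector<std::uint64_t>
splitDemandedAcrossConcat(std::uint64_t Demanded, unsigned NumOps,
                          unsigned OpElts) {
  assert(NumOps > 0 && OpElts > 0 && NumOps * OpElts <= 64);
  const std::uint64_t OpMask =
      OpElts == 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << OpElts) - 1;
  std::vector<std::uint64_t> PerOperand(NumOps);
  for (unsigned I = 0; I < NumOps; ++I)
    PerOperand[I] = (Demanded >> (I * OpElts)) & OpMask;
  return PerOperand;
}
```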

AI note: an LLM generated the code and the test, I've read them

Co-Authored-By: OpenAI Codex <codex at openai.com>
DeltaFile
+23-0llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+4-12llvm/test/CodeGen/X86/freeze-vector.ll
+27-122 files

LLVM/project 56a4aa0llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp

Review comments
DeltaFile
+4-9llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+4-91 files

LLVM/project b764858flang/lib/Lower/OpenMP DataSharingProcessor.h, flang/lib/Lower/Support Utils.cpp PrivateReductionUtils.cpp

[Flang][OpenMP] Heap-allocate GPU dynamic private arrays in distribute parallel do (#200841)

Fixes GPU offload crashes for Fortran automatic arrays privatised in
target teams distribute parallel do.

For delayed privatisation on GPU, dynamically sized boxed array privates
are now routed through the existing heap-allocation path, with matching
cleanup emitted in the privatiser dealloc region. This avoids lowering
such arrays to runtime-sized scratch allocas whose descriptors can be
captured across the distribute callback boundary.

Fixes [#2419](https://github.com/ROCm/llvm-project/issues/2419).
Co-authored-by: Codex <codex at openai.com>
DeltaFile
+54-0flang/test/Lower/OpenMP/DelayedPrivatization/target-teams-distribute-private-adjustable-array.f90
+38-7flang/lib/Lower/Support/Utils.cpp
+33-0flang/test/Lower/OpenMP/DelayedPrivatization/target-teams-nested-distribute-private-adjustable-array.f90
+32-0flang/test/Lower/OpenMP/DelayedPrivatization/target-teams-distribute-parallel-do-simd-private-adjustable-array.f90
+9-5flang/lib/Lower/Support/PrivateReductionUtils.cpp
+5-0flang/lib/Lower/OpenMP/DataSharingProcessor.h
+171-124 files not shown
+181-1510 files

LLVM/project 681fc74mlir/include/mlir/Dialect/LLVMIR ROCDLOps.td ROCDLEnums.td, mlir/lib/Dialect/LLVMIR/IR ROCDLDialect.cpp

[mlir][ROCDL] Move ROCDL intrinsic enum immargs to enums (#198875)

In many cases, an "i32" `immarg` argument to an intrinsic in the AMDGPU
backend actually corresponds directly to some enumerated set of values
in the backend, which we have to smuggle through an I32. This makes the
MLIR forms of intrinsics less readable and means that people either have
to use the `amdgpu` dialect to get these enums or have to roll their own
enums if they want to know what's going on.

This PR rips the band-aid off and breaks the world by swapping out those
integer attributes for enum attributes.

Of special note is the handling of the aux/cachepolicy field on various
intrinsics; in the backend, all the architectures share an enum and
you've just got to use the right names in the right spots. Here, we've
separated out the cases for pre-gfx942, gfx942+, and gfx12 enums as
separate attributes (including separate casing for gfx12 atomics) and
allowed any of them to be used. We also allow an I32Attr in those
arguments for easy importing and to make the common case of "0" portably

    [18 lines not shown]
DeltaFile
+201-278mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
+198-206mlir/test/Target/LLVMIR/rocdl.mlir
+192-165mlir/test/Dialect/LLVMIR/rocdl.mlir
+282-0mlir/include/mlir/Dialect/LLVMIR/ROCDLEnums.td
+85-151mlir/lib/Dialect/LLVMIR/IR/ROCDLDialect.cpp
+155-0mlir/include/mlir/Dialect/LLVMIR/ROCDLDialect.td
+1,113-80023 files not shown
+1,495-1,00929 files

LLVM/project e668f64clang/include/clang/AST DeclTemplate.h, clang/lib/AST DeclTemplate.cpp

Revert "[clang] Reland: fix getTemplateInstantiationArgs" (#201864)

Reverts llvm/llvm-project#201373

This caused compilation errors. See comment on the original PR.
DeltaFile
+429-194clang/lib/Sema/SemaTemplateInstantiate.cpp
+165-275clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
+146-150clang/lib/Sema/SemaTemplate.cpp
+95-96clang/include/clang/AST/DeclTemplate.h
+129-59clang/lib/Sema/SemaConcept.cpp
+92-60clang/lib/AST/DeclTemplate.cpp
+1,056-83455 files not shown
+1,715-1,49361 files

LLVM/project 652af90clang/include/clang/Basic BuiltinsAMDGPUDocs.td

[NFC][Doc] Fix non-existing reference in BuiltinsAMDGPUDocs.td
DeltaFile
+2-2clang/include/clang/Basic/BuiltinsAMDGPUDocs.td
+2-21 files