LLVM/project cc83c1f llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp, llvm/test/CodeGen/X86 vector-shuffle-combining-avx512bwvl.ll

[SelectionDAG] Fold subvector inserts into concat operands

Push insert_subvector into the containing CONCAT_VECTORS operand when the insertion is wholly contained there.
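
A minimal sketch of the containment test this fold depends on, written as a standalone toy model rather than the DAGCombiner code itself. It assumes every concat operand has the same fixed lane count; `ConcatSlot` and `locateInsertInConcat` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical helper, not LLVM API: where an inserted subvector lands
// inside a concat whose operands all have opWidth lanes.
struct ConcatSlot {
  std::size_t operand; // which concat operand receives the insert
  std::size_t offset;  // lane index inside that operand
};

std::optional<ConcatSlot> locateInsertInConcat(std::size_t opWidth,
                                               std::size_t numOps,
                                               std::size_t insertIdx,
                                               std::size_t subWidth) {
  if (opWidth == 0 || subWidth == 0)
    return std::nullopt;
  if (insertIdx + subWidth > opWidth * numOps)
    return std::nullopt; // insert runs past the end of the concat
  std::size_t first = insertIdx / opWidth;
  std::size_t last = (insertIdx + subWidth - 1) / opWidth;
  if (first != last)
    return std::nullopt; // straddles two operands, so no fold
  return ConcatSlot{first, insertIdx % opWidth};
}
```

With two 4-lane operands, a 2-lane insert at index 4 maps to operand 1 at offset 0, while an insert at index 3 straddles both operands and is rejected.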

AI note: an LLM generated the code and the test, I've read them

Co-Authored-By: OpenAI Codex <codex at openai.com>
Delta File
+8 -36 llvm/test/CodeGen/X86/vector-shuffle-combining-avx512bwvl.ll
+34 -10 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+42 -46 2 files

LLVM/project 107f263 llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp, llvm/test/CodeGen/AArch64 sve-load-store-legalisation.ll

scalable vector test updates
Delta File
+772 -772 llvm/test/CodeGen/RISCV/rvv/vector-interleave.ll
+4 -20 llvm/test/CodeGen/AArch64/sve-load-store-legalisation.ll
+4 -2 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+780 -794 3 files

LLVM/project e14735c llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp

Make this work on scalable vectors
Delta File
+12 -15 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+12 -15 1 file

LLVM/project 430b2ba llvm/test/CodeGen/AArch64 sve-fixed-vector-llrint.ll sve-fixed-vector-lrint.ll, llvm/test/CodeGen/AMDGPU bf16.ll

[SelectionDAG] Fold extracts spanning concat operands

Factor the extract_subvector-of-CONCAT_VECTORS logic and handle
extracts that cover multiple whole concat operands by rebuilding a
smaller concat directly.
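
The index arithmetic behind "extracts that cover multiple whole concat operands" can be sketched as follows. This is a toy model under the assumption that all concat operands share one fixed lane count; `OperandRange` and `wholeOperandsCovered` are hypothetical names, not functions from DAGCombiner.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical helper, not LLVM API: the run of concat operands that an
// extract covers exactly, if it starts and ends on operand boundaries.
struct OperandRange {
  std::size_t first; // index of the first covered operand
  std::size_t count; // number of whole operands covered
};

std::optional<OperandRange> wholeOperandsCovered(std::size_t opWidth,
                                                 std::size_t numOps,
                                                 std::size_t extractIdx,
                                                 std::size_t extractWidth) {
  if (opWidth == 0 || extractWidth == 0)
    return std::nullopt;
  // Both the start and the length must be multiples of the operand width.
  if (extractIdx % opWidth != 0 || extractWidth % opWidth != 0)
    return std::nullopt;
  std::size_t first = extractIdx / opWidth;
  std::size_t count = extractWidth / opWidth;
  if (first + count > numOps)
    return std::nullopt; // extract runs past the end of the concat
  return OperandRange{first, count};
}
```

In this model, an 8-lane extract at index 4 from a concat of four 4-lane operands covers operands 1 and 2, so it could be rebuilt as a concat of just those two. A result with `count == 1` corresponds to the simpler case where the extract is one operand unchanged.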

AI note: an LLM generated the code and the test, I've read them

Co-Authored-By: OpenAI Codex <codex at openai.com>
Delta File
+992 -904 llvm/test/CodeGen/AMDGPU/bf16.ll
+187 -229 llvm/test/CodeGen/AArch64/sve-fixed-vector-llrint.ll
+187 -229 llvm/test/CodeGen/AArch64/sve-fixed-vector-lrint.ll
+196 -176 llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-6.ll
+142 -140 llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-6.ll
+120 -120 llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-6.ll
+1,824 -1,798 11 files not shown
+2,204 -2,279 17 files

LLVM/project b8cb361 llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp

[SelectionDAG] Fold nonzero extract-of-extract indices

Generalize the extract_subvector-of-extract_subvector fold to compose
nonzero indices instead of only handling an outer index of zero.
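
As a rough illustration of composing the indices, the sketch below models vectors as `std::vector<int>` and shows that extracting from an extract matches one extract at the summed index. The helper names are hypothetical, and the toy ignores the legality constraints that the real EXTRACT_SUBVECTOR node places on its index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for extract_subvector: copy `width` lanes starting at `idx`.
std::vector<int> extractSubvector(const std::vector<int> &v, std::size_t idx,
                                  std::size_t width) {
  assert(idx + width <= v.size() && "extract must stay in bounds");
  auto begin = v.begin() + static_cast<std::ptrdiff_t>(idx);
  return std::vector<int>(begin, begin + static_cast<std::ptrdiff_t>(width));
}

// Folded form: extract(extract(v, innerIdx), outerIdx) becomes a single
// extract from v at innerIdx + outerIdx.
std::vector<int> extractOfExtract(const std::vector<int> &v,
                                  std::size_t innerIdx, std::size_t outerIdx,
                                  std::size_t outerWidth) {
  return extractSubvector(v, innerIdx + outerIdx, outerWidth);
}
```

The outer index no longer has to be zero for the two forms to agree; any in-bounds pair of indices composes by addition in this model.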

AI note: an LLM generated the code and the test, I've read them

Co-Authored-By: OpenAI Codex <codex at openai.com>
Delta File
+8 -8 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+8 -8 1 file

LLVM/project 7b907d5 clang/lib/Driver/ToolChains Flang.cpp Flang.h

clang/AMDGPU: Pass BoundArch through device libs handling

Pre-work to consolidate target identification for future target
option bug fixes. Also requires updating flang to match recent
clang changes.

Co-authored-by: Claude Sonnet 4 <noreply at anthropic.com>
Delta File
+14 -10 clang/lib/Driver/ToolChains/Flang.cpp
+13 -3 clang/lib/Driver/ToolChains/Flang.h
+9 -6 clang/lib/Driver/ToolChains/HIPAMD.cpp
+9 -5 clang/lib/Driver/ToolChains/AMDGPU.cpp
+5 -8 clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
+2 -2 clang/lib/Driver/ToolChains/HIPSPV.cpp
+52 -34 5 files not shown
+57 -39 11 files

LLVM/project 232cfb2 clang/lib/Driver/ToolChains Darwin.cpp AMDGPU.cpp

clang: Add BoundArch argument to addClangTargetOptions

addClangTargetOptions already has an OffloadKind argument,
but it doesn't make sense for a function to know the
OffloadKind without the associated BoundArch.

The current process is convoluted. TranslateArgs
synthesizes a -mcpu argument from BoundArch, and later
addClangTargetOptions re-parses that -mcpu argument each
time it wants the architecture. Add this argument so this
can be cleaned up in a future change.

Co-authored-by: Claude Sonnet 4 <noreply at anthropic.com>
Delta File
+9 -5 clang/lib/Driver/ToolChains/Darwin.cpp
+7 -5 clang/lib/Driver/ToolChains/AMDGPU.cpp
+7 -3 clang/lib/Driver/ToolChains/Darwin.h
+6 -3 clang/lib/Driver/ToolChains/AMDGPU.h
+5 -3 clang/lib/Driver/ToolChains/Hexagon.h
+5 -3 clang/lib/Driver/ToolChains/XCore.h
+39 -22 48 files not shown
+115 -56 54 files

LLVM/project 05d4fd0 clang/include/clang/Basic BuiltinsAMDGPUDocs.td

[NFC][Doc] Fix non-existing reference in BuiltinsAMDGPUDocs.td (#201889)
Delta File
+2 -2 clang/include/clang/Basic/BuiltinsAMDGPUDocs.td
+2 -2 1 file

LLVM/project 4a80006 llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp

Review feedback I forgot to push lol
Delta File
+2 -2 llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+2 -2 1 file

LLVM/project 0f19139 llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp

Review feedback
Delta File
+4 -8 llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+4 -8 1 file

LLVM/project 6e46d9e llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp, llvm/test/CodeGen/X86 freeze-vector.ll

[SelectionDAG] Track demanded select elements in noundef checks

Propagate demanded elements through to the two arms of a select, and
check the condition with or without demanded elements depending on
whether the condition is a vector.
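
A toy model of that rule, assuming each value is described by per-lane "may be undef/poison" flags. This is an illustrative sketch only; `PoisonFlags`, `lanesAreClean`, and `selectIsClean` are hypothetical names and do not mirror the actual code in SelectionDAG.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// true means the lane may be undef or poison (toy representation).
using PoisonFlags = std::vector<bool>;

// Only the demanded lanes have to be free of undef/poison.
bool lanesAreClean(const PoisonFlags &v, const std::vector<bool> &demanded) {
  for (std::size_t i = 0; i < v.size(); ++i)
    if (demanded[i] && v[i])
      return false;
  return true;
}

// A scalar condition feeds every lane, so it is checked as a whole.
// A vector condition is checked only on the demanded lanes.
bool selectIsClean(const PoisonFlags &cond, bool condIsVector,
                   const PoisonFlags &trueArm, const PoisonFlags &falseArm,
                   const std::vector<bool> &demanded) {
  bool condOk = condIsVector ? lanesAreClean(cond, demanded) : !cond[0];
  return condOk && lanesAreClean(trueArm, demanded) &&
         lanesAreClean(falseArm, demanded);
}
```

In this model a possibly-poison lane in either arm or in a vector condition is harmless as long as that lane is not demanded, whereas a possibly-poison scalar condition always blocks the proof.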

AI note: an LLM generated the code and the test, I've read them

Co-Authored-By: OpenAI Codex <codex at openai.com>
Delta File
+17 -2 llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+0 -11 llvm/test/CodeGen/X86/freeze-vector.ll
+17 -13 2 files

LLVM/project 797eef5 llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp

Review style etc.
Delta File
+9 -10 llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+9 -10 1 file

LLVM/project 98c9c85 llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp, llvm/test/CodeGen/X86 freeze-vector.ll

[SelectionDAG] Track bitcast demanded elements in noundef tests

Bitcasts preserve undef/poison status, but vector bitcasts can change
which source lanes cover a demanded result lane. Map the demanded
element mask through fixed-length vector bitcasts before checking the
source where possible.
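
One way to picture the mask mapping is the standalone sketch below. It assumes fixed-length vectors with at most 64 lanes, represents the demanded set as a `uint64_t` bitmask, and handles only the cases where one element width divides the other. `mapDemandedThroughBitcast` is a hypothetical name, not the function added by this commit.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Map a demanded-lane mask on the bitcast result back to its source.
// Returns nullopt when the toy model cannot express the mapping.
std::optional<std::uint64_t>
mapDemandedThroughBitcast(std::uint64_t dstDemanded, unsigned dstElts,
                          unsigned dstBits, unsigned srcBits) {
  if (dstElts == 0 || dstElts > 64 || dstBits == 0 || srcBits == 0)
    return std::nullopt;
  std::uint64_t srcDemanded = 0;
  if (dstBits % srcBits == 0) {
    // Wider result lanes: each one is built from `scale` source lanes.
    unsigned scale = dstBits / srcBits;
    if (dstElts * scale > 64)
      return std::nullopt;
    for (unsigned i = 0; i < dstElts; ++i)
      if ((dstDemanded >> i) & 1)
        for (unsigned j = 0; j < scale; ++j)
          srcDemanded |= std::uint64_t{1} << (i * scale + j);
    return srcDemanded;
  }
  if (srcBits % dstBits == 0) {
    // Narrower result lanes: `scale` of them share one source lane.
    unsigned scale = srcBits / dstBits;
    if (dstElts % scale != 0)
      return std::nullopt;
    for (unsigned i = 0; i < dstElts; ++i)
      if ((dstDemanded >> i) & 1)
        srcDemanded |= std::uint64_t{1} << (i / scale);
    return srcDemanded;
  }
  return std::nullopt; // widths do not divide evenly
}
```

For example, demanding lane 1 of a 2 x 64-bit result cast from a 4 x 32-bit source demands source lanes 2 and 3, while demanding lane 2 of a 4 x 32-bit result cast from a 2 x 64-bit source demands source lane 1.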

AI note: an LLM generated the code and the test, I've read them

Co-Authored-By: OpenAI Codex <codex at openai.com>
Delta File
+12 -36 llvm/test/CodeGen/X86/freeze-vector.ll
+41 -0 llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+53 -36 2 files

LLVM/project 44ef831 clang/include/clang/AST TypeBase.h, clang/lib/AST AttrImpl.cpp

[clang][AST] Hash `AttributedType`'s `Attr` by Arguments (#200961)

https://github.com/llvm/llvm-project/pull/108631 added
`ID.AddPointer(attr)` to `AttributedType::Profile`, which turned the
`ID` into a pointer-identity key. This inhibits deduplication of
attributed types (such as types with `_Nonnull/_Nullable` attributes).
Such duplications can lead to significant increases in pcm/pch sizes.

This PR adds the arguments of the attributes to the folding set ID, so
that the content of the argument is taken into account when computing
the ID in addition to the existing inputs. The implementation teaches
tablegen to generate the `profile` method for each attribute, similar to
how we generate methods to check equivalence. This way, the argument
contents are handled automatically. Additionally, an attribute can have
an escape hatch to add its own customized profile method, through the
`profileFn` tablegen field, in case something special is needed.
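
The difference between a pointer-identity key and a content key can be illustrated with a small standalone model. It uses `std::set` in place of a folding set, and `ToyAttr` plus both counting functions are hypothetical names that do not correspond to the tablegen-generated `profile` methods.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an attribute with one string argument.
struct ToyAttr {
  int kind;
  std::string arg;
};

// Keying on the pointer: equal attributes at different addresses stay distinct.
std::size_t countUniqueByPointer(const std::vector<const ToyAttr *> &attrs) {
  std::set<const ToyAttr *> keys(attrs.begin(), attrs.end());
  return keys.size();
}

// Keying on kind plus argument contents: equal attributes collapse to one.
std::size_t countUniqueByContent(const std::vector<const ToyAttr *> &attrs) {
  std::set<std::pair<int, std::string>> keys;
  for (const ToyAttr *a : attrs)
    keys.insert({a->kind, a->arg});
  return keys.size();
}
```

Two separately allocated attributes with identical arguments count as two entries under the pointer key but as one under the content key, which is the kind of deduplication the message describes.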

Assisted-by: claude-opus-4.7

Fixes rdar://170586474.
Delta File
+84 -0 clang/lib/AST/AttrImpl.cpp
+43 -0 clang/utils/TableGen/ClangAttrEmitter.cpp
+18 -0 clang/test/AST/attributed-type-dedup-nullability.m
+15 -0 clang/test/AST/attributed-type-dedup-swift-attr.m
+15 -0 clang/test/AST/attributed-type-dedup-pcs.c
+5 -10 clang/include/clang/AST/TypeBase.h
+180 -10 9 files not shown
+252 -16 15 files

LLVM/project 31b57b0 llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp, llvm/test/CodeGen/X86 freeze-vector.ll

[SelectionDAG] Track demanded concat elements in noundef checks

Teach isGuaranteedNotToBeUndefOrPoison to distribute fixed-length
demanded element masks across CONCAT_VECTORS operands. This is part of
the series of fixes needed to resolve a SelectionDAG hang by making it
possible to prove certain values don't need to be frozen.
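
Distributing a demanded mask across concat operands might look like the sketch below. It is a toy model that assumes equal-width operands and at most 64 lanes in total, with the mask held in a `uint64_t`. `splitDemandedAcrossConcat` is a hypothetical name rather than code from isGuaranteedNotToBeUndefOrPoison.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Slice one demanded-lane mask over the whole concat into one mask per
// operand. Lane 0 of the concat is bit 0 of operand 0.
std::vector<std::uint64_t> splitDemandedAcrossConcat(std::uint64_t demanded,
                                                     unsigned opWidth,
                                                     unsigned numOps) {
  assert(opWidth > 0 && opWidth * numOps <= 64 &&
         "toy model keeps the whole mask in 64 bits");
  const std::uint64_t laneMask =
      opWidth == 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << opWidth) - 1;
  std::vector<std::uint64_t> perOperand;
  for (unsigned op = 0; op < numOps; ++op)
    perOperand.push_back((demanded >> (op * opWidth)) & laneMask);
  return perOperand;
}
```

An operand whose slice is zero contributes no demanded lanes, so in this model it would not need to be checked at all.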

AI note: an LLM generated the code and the test, I've read them

Co-Authored-By: OpenAI Codex <codex at openai.com>
Delta File
+23 -0 llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+4 -12 llvm/test/CodeGen/X86/freeze-vector.ll
+27 -12 2 files

LLVM/project 56a4aa0 llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp

Review comments
Delta File
+4 -9 llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+4 -9 1 file

LLVM/project b764858 flang/lib/Lower/OpenMP DataSharingProcessor.h, flang/lib/Lower/Support Utils.cpp PrivateReductionUtils.cpp

[Flang][OpenMP] Heap-allocate GPU dynamic private arrays in distribute parallel do (#200841)

Fixes GPU offload crashes for Fortran automatic arrays privatised in
target teams distribute parallel do.

For delayed privatisation on GPU, dynamically sized boxed array privates
are now routed through the existing heap-allocation path, with matching
cleanup emitted in the privatiser dealloc region. This avoids lowering
such arrays to runtime-sized scratch allocas whose descriptors can be
captured across the distribute callback boundary.

Fixes [#2419](https://github.com/ROCm/llvm-project/issues/2419).
Co-authored-by: Codex <codex at openai.com>
Delta File
+54 -0 flang/test/Lower/OpenMP/DelayedPrivatization/target-teams-distribute-private-adjustable-array.f90
+38 -7 flang/lib/Lower/Support/Utils.cpp
+33 -0 flang/test/Lower/OpenMP/DelayedPrivatization/target-teams-nested-distribute-private-adjustable-array.f90
+32 -0 flang/test/Lower/OpenMP/DelayedPrivatization/target-teams-distribute-parallel-do-simd-private-adjustable-array.f90
+9 -5 flang/lib/Lower/Support/PrivateReductionUtils.cpp
+5 -0 flang/lib/Lower/OpenMP/DataSharingProcessor.h
+171 -12 4 files not shown
+181 -15 10 files

LLVM/project 681fc74 mlir/include/mlir/Dialect/LLVMIR ROCDLOps.td ROCDLEnums.td, mlir/lib/Dialect/LLVMIR/IR ROCDLDialect.cpp

[mlir][ROCDL] Move ROCDL intrinsic enum immargs to enums (#198875)

In many cases, an "i32" `immarg` argument to an intrinsic in the AMDGPU
backend actually corresponds directly to some enumerated set of values
in the backend, which we have to smuggle through an I32. This makes the
MLIR forms of intrinsics less readable and means that people either have
to use the `amdgpu` dialect to get these enums or have to roll their own
enums if they want to know what's going on.

This PR rips the band-aid off and breaks the world by swapping out those
integer attributes for enum attributes.

Of special note is the handling of the aux/cachepolicy field on various
intrinsics; in the backend, all the architectures share an enum and
you've just got to use the right names in the right spots. Here, we've
separated out the cases for pre-gfx942, gfx942+, and gfx12 enums as
separate attributes (including separate casing for gfx12 atomics) and
allowed any of them to be used. We also allow an I32Attr in those
arguments for easy importing and to make the common case of "0" portably

    [18 lines not shown]
Delta File
+201 -278 mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
+198 -206 mlir/test/Target/LLVMIR/rocdl.mlir
+192 -165 mlir/test/Dialect/LLVMIR/rocdl.mlir
+282 -0 mlir/include/mlir/Dialect/LLVMIR/ROCDLEnums.td
+85 -151 mlir/lib/Dialect/LLVMIR/IR/ROCDLDialect.cpp
+155 -0 mlir/include/mlir/Dialect/LLVMIR/ROCDLDialect.td
+1,113 -800 23 files not shown
+1,495 -1,009 29 files

LLVM/project e668f64 clang/include/clang/AST DeclTemplate.h, clang/lib/AST DeclTemplate.cpp

Revert "[clang] Reland: fix getTemplateInstantiationArgs" (#201864)

Reverts llvm/llvm-project#201373

This caused compilation errors. See comment on the original PR.
Delta File
+429 -194 clang/lib/Sema/SemaTemplateInstantiate.cpp
+165 -275 clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
+146 -150 clang/lib/Sema/SemaTemplate.cpp
+95 -96 clang/include/clang/AST/DeclTemplate.h
+129 -59 clang/lib/Sema/SemaConcept.cpp
+92 -60 clang/lib/AST/DeclTemplate.cpp
+1,056 -834 55 files not shown
+1,715 -1,493 61 files

LLVM/project 652af90 clang/include/clang/Basic BuiltinsAMDGPUDocs.td

[NFC][Doc] Fix non-existing reference in BuiltinsAMDGPUDocs.td
Delta File
+2 -2 clang/include/clang/Basic/BuiltinsAMDGPUDocs.td
+2 -2 1 file

LLVM/project 7839f1f lld/wasm InputFiles.cpp

[lld][WebAssembly] Add missing space in unmodeled diagnostic (#201764)

This is just a nit. I hit this fatal error while trying to use a GC
object and noticed that the diagnostic read `foo.ofile has unmodeled
reference or GC types`, with no space after the file name.
Delta File
+1 -1 lld/wasm/InputFiles.cpp
+1 -1 1 file

LLVM/project 96f3f0a clang/lib/Driver Driver.cpp

clang: Remove use of auto which may have been a triple copy (#201880)
Delta File
+2 -2 clang/lib/Driver/Driver.cpp
+2 -2 1 file

LLVM/project 559ea91 clang/test/Driver objc-constant-literals.m

[Driver][test] Use -### for non-ObjC constant-literal RUN lines (#201877)

The RUN lines added in 3b100666a70f did a real compile for
arm64-apple-macosx11, which fails on builders that don't register the
AArch64 backend (e.g. llvm-clang-x86_64-sie-ubuntu-fast). The
NoArgumentUnused behavior under test is driver-side, so switch to -###
and avoid the backend dependency.
Delta File
+6 -6 clang/test/Driver/objc-constant-literals.m
+6 -6 1 file

LLVM/project 92a5784 flang/test/Integration/OpenMP atomic-compare.f90, flang/test/Lower/OpenMP atomic-compare.f90

[flang][OpenMP] Adding support for weak extended-atomic clause (#201823)

Adding support for "!$omp atomic compare weak".

!$omp atomic compare weak
if (var1 == num1) var1 = num2
!$omp end atomic

This also fixes
[#201812](https://github.com/llvm/llvm-project/issues/201812)

---------

Co-authored-by: Sunil Kuravinakop <kuravina at pe31.hpc.amslabs.hpecorp.net>
Delta File
+36 -0 mlir/test/Dialect/OpenMP/ops.mlir
+27 -0 mlir/test/Target/LLVMIR/openmp-llvm.mlir
+21 -0 flang/test/Lower/OpenMP/atomic-compare.f90
+16 -0 flang/test/Parser/OpenMP/atomic-unparse.f90
+12 -0 flang/test/Integration/OpenMP/atomic-compare.f90
+6 -6 llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+118 -6 4 files not shown
+138 -15 10 files

LLVM/project 581ee88 llvm/lib/IR Intrinsics.cpp, llvm/test/Assembler invalid-vecreduce.ll invalid-interleave.ll

[LLVM] Precise error message for intrinsic signature verification (3/n) (#200493)

Print precise error message for dependent types when an intrinsic's type
signature verification fails.
Delta File
+135 -86 llvm/lib/IR/Intrinsics.cpp
+187 -0 llvm/test/Verifier/intrinsic-bad-arg-type1.ll
+14 -10 llvm/test/Verifier/scatter_gather.ll
+4 -4 llvm/test/Assembler/invalid-vecreduce.ll
+2 -2 llvm/test/Verifier/reduction-intrinsics.ll
+1 -1 llvm/test/Assembler/invalid-interleave.ll
+343 -103 1 file not shown
+344 -104 7 files

LLVM/project b16873b clang/lib/Headers __clang_hip_libdevice_declares.h, clang/test/Headers openmp-device-functions-bool.c __clang_hip_libdevice_declares.cpp

clang/HIP: Remove __ockl_fdot2 declaration (#201878)

The builtin headers should not be in the business of exporting
ockl functions; they should declare only the minimum that is
actively used by the headers themselves.
Delta File
+20 -67 clang/test/Headers/openmp-device-functions-bool.c
+0 -49 clang/test/Headers/__clang_hip_libdevice_declares.cpp
+0 -9 clang/lib/Headers/__clang_hip_libdevice_declares.h
+20 -125 3 files

LLVM/project b235617 lldb/source/Plugins/Process/Windows/Common ProcessWindows.cpp IOHandlerProcessSTDIOWindows.cpp, llvm/lib/Target/SPIRV SPIRVNonSemanticDebugHandler.cpp SPIRVNonSemanticDebugHandler.h

Merge branch 'main' into revert-201373-users/mizvekov/get-template-inst-args
Delta File
+290 -65 llvm/lib/Target/SPIRV/SPIRVNonSemanticDebugHandler.cpp
+11 -179 lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp
+172 -0 lldb/source/Plugins/Process/Windows/Common/IOHandlerProcessSTDIOWindows.cpp
+96 -28 llvm/lib/Target/SPIRV/SPIRVNonSemanticDebugHandler.h
+32 -32 llvm/test/CodeGen/AArch64/trunc-to-tbl.ll
+63 -0 lldb/source/Plugins/Process/Windows/Common/IOHandlerProcessSTDIOWindows.h
+664 -304 31 files not shown
+1,150 -367 37 files

LLVM/project 7b912b6 clang/lib/Driver Driver.cpp

clang: Remove use of auto which may have been a triple copy
Delta File
+2 -2 clang/lib/Driver/Driver.cpp
+2 -2 1 file

LLVM/project 3691cf9 llvm/test/CodeGen/AArch64 trunc-to-tbl.ll fp-conversion-to-tbl.ll, llvm/test/CodeGen/X86 mbp-false-cfg-break.ll

[Test] Fix loop exit conditions to prevent trivial optimizations (#201867)

Several tests had 'br i1 %ec, label %loop, label %exit', which exits on
the first iteration instead of looping, so I swapped the successors. I also
changed the predicates to keep the loops; otherwise they would be eliminated
by https://github.com/llvm/llvm-project/pull/201839.
Delta File
+32 -32 llvm/test/CodeGen/AArch64/trunc-to-tbl.ll
+18 -18 llvm/test/CodeGen/AArch64/fp-conversion-to-tbl.ll
+8 -8 llvm/test/CodeGen/AArch64/sitofp-to-tbl.ll
+4 -5 llvm/test/CodeGen/AArch64/pr164181.ll
+2 -2 llvm/test/Transforms/LoopStrengthReduce/X86/pr62660-normalization-failure.ll
+1 -1 llvm/test/CodeGen/X86/mbp-false-cfg-break.ll
+65 -66 6 files

LLVM/project 89f4b84 llvm/lib/Transforms/InstCombine InstCombineLoadStoreAlloca.cpp, llvm/test/Transforms/InstCombine ptr-replace-alloca.ll

[InstCombine] Use copyMetadata in PointerReplacer::replace (#201827)

PointerReplacer::replace creates a new load that differs from the
original only in its pointer operand; the loaded type is unchanged.  It
was using copyMetadataForLoad(), which is meant for the case where the
load's *type* changes.  Since the type is the same here, plain
copyMetadata() is correct and preserves all metadata directly.
Delta File
+20 -0 llvm/test/Transforms/InstCombine/ptr-replace-alloca.ll
+1 -1 llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
+21 -1 2 files