LLVM/project fff2f0bllvm/lib/Target/AMDGPU GCNHazardRecognizer.cpp, llvm/test/CodeGen/AMDGPU wmma-coexecution-valu-hazards.mir

[AMDGPU] Handle GFX1250 hazards between WMMA and VOPD (#183573)

Hazards between WMMA and VALU were handled in #149865 but this only
worked for regular VOP* VALU encodings, not for VOPD.

Fixes: #183546
DeltaFile
+30-0llvm/test/CodeGen/AMDGPU/wmma-coexecution-valu-hazards.mir
+11-14llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+41-142 files

LLVM/project c357171clang/docs AddressSanitizer.rst

reword

Created using spr 1.3.7
DeltaFile
+3-3clang/docs/AddressSanitizer.rst
+3-31 files

LLVM/project fc153b1clang/lib/StaticAnalyzer/Checkers/WebKit PtrTypesSemantics.cpp, clang/test/Analysis/Checkers/WebKit nodelete-annotation.cpp

[alpha.webkit.NoDeleteChecker] Check if each field is trivially destructive (#183711)

This PR fixes a bug where NoDeleteChecker and the trivial function
analysis failed to detect non-trivial destruction of class member
variables.

When evaluating a delete expression or a direct destructor call for
triviality, check that each field in the class and its base classes is
trivially destructible.
DeltaFile
+150-2clang/test/Analysis/Checkers/WebKit/nodelete-annotation.cpp
+46-0clang/lib/StaticAnalyzer/Checkers/WebKit/PtrTypesSemantics.cpp
+196-22 files

LLVM/project ca04a70libc/shared/math bf16subf128.h, libc/src/__support/math bf16subf128.h CMakeLists.txt

[libc][math] Refactor bf16sub family to header-only (#182115)

Refactors the bf16sub math family to be header-only.

Closes https://github.com/llvm/llvm-project/issues/182114

Target Functions:
  - bf16sub
  - bf16subf
  - bf16subf128
DeltaFile
+46-0utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+33-0libc/src/__support/math/bf16subf128.h
+31-0libc/src/__support/math/CMakeLists.txt
+28-0libc/shared/math/bf16subf128.h
+27-0libc/src/__support/math/bf16subf.h
+27-0libc/src/__support/math/bf16sub.h
+192-09 files not shown
+255-3015 files

LLVM/project 6f612cfclang/lib/Sema SemaOverload.cpp, clang/test/SemaTemplate temp_arg_nontype_cxx11.cpp

[clang] stop error recovery in SFINAE for narrowing in converted constant expressions (#183614)

A narrowing conversion in a converted constant expression should produce
an invalid expression so that [temp.deduct.general]p7 is satisfied, by
stopping substitution at this point.

This regression was introduced in #164703, and this will be backported
to clang-22, so no release notes.

Fixes #167709
DeltaFile
+10-1clang/test/SemaTemplate/temp_arg_nontype_cxx11.cpp
+8-0clang/lib/Sema/SemaOverload.cpp
+18-12 files

LLVM/project d1f4f94flang/include/flang/Semantics expression.h, flang/lib/Semantics expression.cpp

[flang] Fix explanatory messages for generic resolution error (#183565)

The compiler emits messages to explain why each of a generic procedure's
specific procedures is not a match for a given set of actual arguments.
In the case of specific procedures with PASS arguments in derived type
procedure bindings or procedure components, these explanatory messages
are often bogus, because the re-analysis didn't adjust the actual
arguments to account for the PASS argument. Fix.
DeltaFile
+24-0flang/test/Semantics/bug2295.f90
+16-6flang/lib/Semantics/expression.cpp
+2-1flang/include/flang/Semantics/expression.h
+42-73 files

LLVM/project 4f05592clang/test/Driver sycl-offload-jit-xarch.cpp

[Driver][SYCL] Add tests for -Xarch_<arch> option forwarding to SYCL JIT compilation. (#178025)

This change adds test coverage to verify that options passed via
`-Xarch_<arch> <option>` are correctly forwarded to SYCL JIT
compilations.
DeltaFile
+13-0clang/test/Driver/sycl-offload-jit-xarch.cpp
+13-01 files

LLVM/project 3d889c4clang/lib/Format TokenAnnotator.cpp, clang/unittests/Format FormatTest.cpp

[clang-format] Fix SpaceBeforeParens with explicit template instantiations (#183183)

This fixes explicitly instantiated function templates not having spaces
added or removed according to the value of `SpaceBeforeParens`.

Attribution Note - I have been authorized to contribute this change on
behalf of my company: ArenaNet LLC
DeltaFile
+10-9clang/lib/Format/TokenAnnotator.cpp
+4-0clang/unittests/Format/FormatTest.cpp
+14-92 files

LLVM/project 6456216llvm/include/llvm/Transforms/Utils MemoryTaggingSupport.h

cleanup

Created using spr 1.3.7
DeltaFile
+0-4llvm/include/llvm/Transforms/Utils/MemoryTaggingSupport.h
+0-41 files

LLVM/project 6dd1dc8llvm/lib/Transforms/Utils MemoryTaggingSupport.cpp, llvm/test/CodeGen/AArch64 stack-tagging.ll

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
DeltaFile
+295-0llvm/test/Instrumentation/HWAddressSanitizer/use-after-scope.ll
+30-0llvm/test/CodeGen/AArch64/stack-tagging.ll
+11-16llvm/lib/Transforms/Utils/MemoryTaggingSupport.cpp
+336-163 files

LLVM/project ae6bbfcllvm/test/CodeGen/AArch64 stack-tagging.ll, llvm/test/Instrumentation/HWAddressSanitizer use-after-scope.ll

[𝘀𝗽𝗿] changes to main this commit is based on

Created using spr 1.3.7

[skip ci]
DeltaFile
+328-0llvm/test/Instrumentation/HWAddressSanitizer/use-after-scope.ll
+29-0llvm/test/CodeGen/AArch64/stack-tagging.ll
+357-02 files

LLVM/project df5bee6clang/lib/CIR/Dialect/Transforms FlattenCFG.cpp, clang/test/CIR/Transforms flatten-try-op.cir flatten-cleanup-scope-nyi.cir

[CIR] Implement TryOp flattening (#183591)

This updates the FlattenCFG pass to add flattening for cir::TryOp in
cases where the TryOp contains catch or unwind handlers.

Substantial amounts of this PR were created using agentic AI tools, but
I have carefully reviewed the code, comments, and tests and made changes
as needed. I've left intermediate commits in the initial PR if you'd
like to see the progression.
DeltaFile
+737-0clang/test/CIR/Transforms/flatten-try-op.cir
+287-160clang/lib/CIR/Dialect/Transforms/FlattenCFG.cpp
+0-90clang/test/CIR/Transforms/flatten-cleanup-scope-nyi.cir
+1,024-2503 files

LLVM/project 3be515cllvm/test/Instrumentation/HWAddressSanitizer use-after-scope.ll

[𝘀𝗽𝗿] changes to main this commit is based on

Created using spr 1.3.7

[skip ci]
DeltaFile
+328-0llvm/test/Instrumentation/HWAddressSanitizer/use-after-scope.ll
+328-01 files

LLVM/project 967d960llvm/test/CodeGen/AArch64 stack-tagging.ll, llvm/test/Instrumentation/HWAddressSanitizer use-after-scope.ll

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
DeltaFile
+328-0llvm/test/Instrumentation/HWAddressSanitizer/use-after-scope.ll
+29-0llvm/test/CodeGen/AArch64/stack-tagging.ll
+357-02 files

LLVM/project acdb508llvm/test/Instrumentation/HWAddressSanitizer use-after-scope.ll

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
DeltaFile
+328-0llvm/test/Instrumentation/HWAddressSanitizer/use-after-scope.ll
+328-01 files

LLVM/project 8ce2b9cclang/include/clang/AST DeclCXX.h, clang/lib/AST ItaniumMangle.cpp ASTImporter.cpp

[Clang][ItaniumMangle] Fix recursive mangling for lambda init-captures (#182667)

Mangle computation for lambda signatures can recurse when a call
operator type references an init-capture (for example via
decltype(init-capture)). In these cases, mangling can re-enter the
init-capture declaration and cycle back through operator() mangling.

Make lambda context publication explicit and independent from numbering
state, then use that context uniformly during mangling:
* Publish lambda `ContextDecl` in `Sema::handleLambdaNumbering()` before
  numbering, so dependent type mangling can resolve the lambda context
  without recursing through the call operator.

    [19 lines not shown]
DeltaFile
+66-0clang/test/CodeGenCXX/mangle-lambdas.cpp
+32-6clang/lib/AST/ItaniumMangle.cpp
+16-8clang/lib/Sema/SemaLambda.cpp
+7-6clang/include/clang/AST/DeclCXX.h
+6-4clang/lib/AST/ASTImporter.cpp
+5-1clang/lib/AST/DeclCXX.cpp
+132-256 files

LLVM/project ee6f5f3llvm/lib/Transforms/InstCombine InstCombineLoadStoreAlloca.cpp, llvm/test/Transforms/InstCombine alloca-poison-size.ll

[InstCombine] Replace alloca with undef size with poison instead of null (#182919)

InstCombine previously replaced an alloca instruction with a null
pointer when the array size operand was undef. While this replacement
may be legal, it still caused invalid IR in cases where the original
alloca was used by `@llvm.lifetime` intrinsics.

The spec requires that the pointer operand of `@llvm.lifetime.*` must be
either:

- a pointer to an alloca instruction, or
- a poison value.

Replacing the pointer with null violated this requirement and triggered
verifier errors.

These new changes update InstCombine so that in this scenario the alloca
is replaced with poison instead of null.
DeltaFile
+29-0llvm/test/Transforms/InstCombine/alloca-poison-size.ll
+1-1llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
+30-12 files

LLVM/project 2c99c01llvm/lib/Transforms/Vectorize VPlanTransforms.cpp VPlan.h, llvm/test/Transforms/LoopVectorize vplan-based-stride-mv.ll

[VPlan] Implement VPlan-based stride speculation
DeltaFile
+928-1,076llvm/test/Transforms/LoopVectorize/vplan-based-stride-mv.ll
+273-150llvm/test/Transforms/LoopVectorize/VPlan/vplan-based-stride-mv.ll
+235-3llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+43-0llvm/lib/Transforms/Vectorize/VPlan.h
+5-5llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
+7-0llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+1,491-1,2345 files not shown
+1,513-1,23711 files

LLVM/project afcfa8aclang/lib/CodeGen CGObjCMac.cpp CodeGenModule.h, clang/test/CodeGenObjC expose-direct-method-varargs.m expose-direct-method-consumed.m

address reviewer's comments
DeltaFile
+20-27clang/lib/CodeGen/CGObjCMac.cpp
+20-16clang/lib/CodeGen/CodeGenModule.h
+4-3clang/test/CodeGenObjC/expose-direct-method-varargs.m
+1-2clang/test/CodeGenObjC/expose-direct-method-consumed.m
+1-2clang/test/CodeGenObjC/expose-direct-method-linkedlist.m
+1-1clang/lib/CodeGen/CGObjC.cpp
+47-516 files

LLVM/project 4d7247dllvm/lib/Transforms/Vectorize VPlanTransforms.cpp VPlanTransforms.h, llvm/test/Transforms/LoopVectorize runtime-check-needed-but-empty.ll pr37248.ll

[VPlan] Scalarize to first-lane-only directly on VPlan
DeltaFile
+68-0llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+5-0llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+2-2llvm/test/Transforms/LoopVectorize/X86/drop-poison-generating-flags.ll
+3-0llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+1-1llvm/test/Transforms/LoopVectorize/runtime-check-needed-but-empty.ll
+1-1llvm/test/Transforms/LoopVectorize/pr37248.ll
+80-41 files not shown
+81-47 files

LLVM/project 25d709ellvm/lib/Target/SystemZ SystemZAsmPrinter.cpp, llvm/test/CodeGen/SystemZ zos-ada-relocations.ll

[SystemZ] Emit external aliases for indirect function descriptors in the ADA section (#183443)

This is the last of three patches that add support for indirect symbol
handling to the SystemZ backend.

An external alias is emitted for indirect function descriptors within
the ADA section, rather than a temporary alias, while also setting all
of the appropriate symbol attributes that are needed for the HLASM
streamer to emit the correct XATTR and ALIAS instructions for the
indirect symbols.

Moreover, this patch updates the
`CodeGen/SystemZ/zos-ada-relocations.ll` test as the ADA section is
currently the only user of indirect symbols on z/OS.

Depends on https://github.com/llvm/llvm-project/pull/183442.
DeltaFile
+7-4llvm/lib/Target/SystemZ/SystemZAsmPrinter.cpp
+5-1llvm/test/CodeGen/SystemZ/zos-ada-relocations.ll
+12-52 files

LLVM/project cf28f23llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/AMDGPU zext-duplicate-shift.ll

[SLP] Reject duplicate shift amounts in matchesShlZExt reorder path (#183627)

In the reordered RHS path of matchesShlZExt, the code never checked that
each shift amount (0, Stride, 2×Stride, …) appears at most once. When
the same shift appeared in multiple lanes, it still filled Order,
producing a non-permutation (e.g. Order = [0,0,0,1]). That led to bad
shuffle masks and miscompilation (e.g. shuffles with poison).

The patch adds an explicit duplicate check: before setting Order[Idx] =
Pos, it ensures Pos has not been seen before, using a SmallBitVector
SeenPositions(VF). If a position is seen twice, the function returns
false and the optimization is not applied.
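
The duplicate check described above can be sketched in isolation. This is a minimal illustration, not the SLPVectorizer code: `buildOrderFromShifts` is a hypothetical helper, `std::vector<bool>` stands in for `SmallBitVector`, and the divisibility and range checks are assumptions added to keep the sketch self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the LLVM function): maps each lane's shift amount
// to a position Pos = Shift / Stride and records it in Order. Returns false
// when the result would not be a permutation of 0..VF-1.
bool buildOrderFromShifts(const std::vector<unsigned> &Shifts, unsigned Stride,
                          std::vector<unsigned> &Order) {
  assert(Stride != 0 && "stride must be non-zero");
  const std::size_t VF = Shifts.size();
  Order.assign(VF, 0);
  // Stand-in for SmallBitVector SeenPositions(VF).
  std::vector<bool> SeenPositions(VF, false);
  for (std::size_t Idx = 0; Idx < VF; ++Idx) {
    // Assumed precondition check: the shift must be a multiple of Stride.
    if (Shifts[Idx] % Stride != 0)
      return false;
    const unsigned Pos = Shifts[Idx] / Stride;
    // Reject out-of-range positions and, crucially, positions seen before.
    if (Pos >= VF || SeenPositions[Pos])
      return false;
    SeenPositions[Pos] = true;
    Order[Idx] = Pos;
  }
  return true;
}
```

With shifts {8, 0, 24, 16} and a stride of 8 the helper yields a valid permutation, whereas {0, 0, 0, 8} is rejected instead of producing a non-permutation such as [0,0,0,1].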
DeltaFile
+14-8llvm/test/Transforms/SLPVectorizer/AMDGPU/zext-duplicate-shift.ll
+5-0llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+19-82 files

LLVM/project 282a2b7clang/lib/Analysis/Scalable/Serialization JSONFormat.cpp, clang/lib/Analysis/Scalable/Serialization/JSONFormat JSONFormatImpl.cpp TUSummary.cpp

[clang][ssaf] Add `JSONFormat` support for `TUSummaryEncoding`

This PR adds `JSONFormat` support for reading and writing
`TUSummaryEncoding`. The implementation exploits similarities in the
structures of `TUSummary` and `TUSummaryEncoding` by reusing existing
`JSONFormat` support for `TUSummary`. Duplication of tests has been
avoided by parameterizing the test fixture that runs all relevant
read/write tests against `TUSummary`, for `TUSummaryEncoding`. This
ensures that the two serialization paths remain in lockstep.
DeltaFile
+0-1,182clang/lib/Analysis/Scalable/Serialization/JSONFormat.cpp
+638-502clang/unittests/Analysis/Scalable/Serialization/JSONFormatTest/TUSummaryTest.cpp
+596-0clang/lib/Analysis/Scalable/Serialization/JSONFormat/JSONFormatImpl.cpp
+504-0clang/lib/Analysis/Scalable/Serialization/JSONFormat/TUSummary.cpp
+407-0clang/lib/Analysis/Scalable/Serialization/JSONFormat/TUSummaryEncoding.cpp
+148-0clang/lib/Analysis/Scalable/Serialization/JSONFormat/JSONFormatImpl.h
+2,293-1,6845 files not shown
+2,358-1,68611 files

LLVM/project 403fd76llvm/lib/CodeGen SlotIndexes.cpp, llvm/test/CodeGen/AArch64 misched-detail-resource-booking-01.mir

[SlotIndexes] Further pack indices to improve spill placement time (#182640)

With this patch, when inserting an instruction into the SlotIndexes
analysis triggers a renumbering, the entire list is renumbered if it is
otherwise densely packed. This fixes a case we saw on AArch64 with a lot
of spills, where every single spill instruction insertion required
renumbering most of the instructions in a large function, making the
operation approximately quadratic.

This is not NFC as heuristics depend on the SlotIndex numbers, although
this should mostly be a wash as LRs should be extended ~equally.
DeltaFile
+1,389-1,365llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+38-38llvm/test/CodeGen/AArch64/misched-detail-resource-booking-01.mir
+6-6llvm/test/CodeGen/AMDGPU/remat-sop.mir
+9-0llvm/lib/CodeGen/SlotIndexes.cpp
+2-2llvm/test/CodeGen/X86/statepoint-live-in.ll
+2-2llvm/test/CodeGen/X86/statepoint-regs.ll
+1,446-1,4136 files

LLVM/project dce48f2clang/lib/Driver/ToolChains AMDGPU.cpp, clang/test/Driver amdgpu-openmp-sanitize-options.c

[OpenMP] Enable internalization of 'ockl.bc' for OpenMP (#183685)

Fix linking of 'ockl.bc' for OpenMP by switching from
`-mlink-bitcode-file` to `-mlink-builtin-bitcode`
DeltaFile
+1-1clang/lib/Driver/ToolChains/AMDGPU.cpp
+1-1clang/test/Driver/amdgpu-openmp-sanitize-options.c
+2-22 files

LLVM/project c05e323llvm/lib/Target/WebAssembly WebAssemblyFixIrreducibleControlFlow.cpp

[WebAssembly] Incorporate SCCs into WebAssemblyFixIrreducibleControlFlow (#181755)

Rather than mapping out full "reachability" between blocks in a region
to find loops and using `LoopBlocks` to find the bodies of said loops,
use SCCs (strongly-connected components) to provide this information.

This brings in LLVM's generic `SCCIterator` (which uses Tarjan's
algorithm) as the implementation for sorting the basic blocks of the CFG
into their SCCs.
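
For readers unfamiliar with the technique, here is a minimal standalone sketch of Tarjan's algorithm over an adjacency-list CFG. It does not use LLVM's `SCCIterator` and is not the code from this PR; `Graph`, `findSCCs`, and `isLoop` are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Succs[B] lists the successor block indices of block B.
using Graph = std::vector<std::vector<int>>;

// Returns the SCCs of the graph. Tarjan's algorithm emits them in reverse
// topological order; members of each SCC are sorted for readability.
std::vector<std::vector<int>> findSCCs(const Graph &Succs) {
  const int N = static_cast<int>(Succs.size());
  std::vector<int> Index(N, -1), LowLink(N, 0);
  std::vector<bool> OnStack(N, false);
  std::vector<int> Stack;
  std::vector<std::vector<int>> SCCs;
  int NextIndex = 0;

  std::function<void(int)> Visit = [&](int V) {
    Index[V] = LowLink[V] = NextIndex++;
    Stack.push_back(V);
    OnStack[V] = true;
    for (int W : Succs[V]) {
      assert(W >= 0 && W < N && "successor index out of range");
      if (Index[W] == -1) {
        Visit(W); // Tree edge: recurse, then propagate the low-link.
        LowLink[V] = std::min(LowLink[V], LowLink[W]);
      } else if (OnStack[W]) {
        LowLink[V] = std::min(LowLink[V], Index[W]); // Edge into current SCC.
      }
    }
    if (LowLink[V] == Index[V]) { // V is the root of an SCC: pop it.
      std::vector<int> SCC;
      int W;
      do {
        W = Stack.back();
        Stack.pop_back();
        OnStack[W] = false;
        SCC.push_back(W);
      } while (W != V);
      std::sort(SCC.begin(), SCC.end());
      SCCs.push_back(SCC);
    }
  };

  for (int V = 0; V < N; ++V)
    if (Index[V] == -1)
      Visit(V);
  return SCCs;
}

// An SCC forms a loop if it has several blocks or a single self-looping block.
bool isLoop(const Graph &Succs, const std::vector<int> &SCC) {
  if (SCC.size() > 1)
    return true;
  const std::vector<int> &S = Succs[SCC[0]];
  return std::find(S.begin(), S.end(), SCC[0]) != S.end();
}
```

For the graph 0 -> 1 -> 2 -> {1, 3}, this yields the SCCs {3}, {1, 2}, {0}, and only {1, 2} is a loop. A single linear-time pass like this replaces the need to compute pairwise reachability between blocks.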

This PR greatly reduces the compile-time footprint of the pass, making
memory use and time taken negligible in cases that previously caused
stalls and OOMs (e.g. #47793, usagi-coffee/tree-sitter-abl#114).

------

Supersedes #179722


    [10 lines not shown]
DeltaFile
+151-136llvm/lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
+151-1361 files

LLVM/project dcdbaffllvm/test/CodeGen/AMDGPU llvm.exp10.f64.ll llvm.exp.f64.ll, llvm/test/CodeGen/RISCV clmul.ll clmulr.ll

Rebase

Created using spr 1.3.6-beta.1
DeltaFile
+25,051-14,920llvm/test/CodeGen/RISCV/clmul.ll
+16,004-0llvm/test/MC/AMDGPU/gfx13_asm_vopd3.s
+13,198-0llvm/test/CodeGen/RISCV/clmulr.ll
+12,863-0llvm/test/CodeGen/RISCV/clmulh.ll
+11,178-0llvm/test/CodeGen/AMDGPU/llvm.exp10.f64.ll
+10,242-0llvm/test/CodeGen/AMDGPU/llvm.exp.f64.ll
+88,536-14,9203,624 files not shown
+297,206-108,3723,630 files

LLVM/project a0bde8dllvm/test/CodeGen/AMDGPU llvm.exp10.f64.ll llvm.exp.f64.ll, llvm/test/CodeGen/RISCV clmul.ll clmulr.ll

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.6-beta.1

[skip ci]
DeltaFile
+25,051-14,920llvm/test/CodeGen/RISCV/clmul.ll
+16,004-0llvm/test/MC/AMDGPU/gfx13_asm_vopd3.s
+13,198-0llvm/test/CodeGen/RISCV/clmulr.ll
+12,863-0llvm/test/CodeGen/RISCV/clmulh.ll
+11,178-0llvm/test/CodeGen/AMDGPU/llvm.exp10.f64.ll
+10,242-0llvm/test/CodeGen/AMDGPU/llvm.exp.f64.ll
+88,536-14,9203,624 files not shown
+297,206-108,3723,630 files

LLVM/project 852c6efmlir/include/mlir/Conversion/LLVMCommon Pattern.h, mlir/lib/Conversion/AMDGPUToROCDL AMDGPUToROCDL.cpp

[mlir][LLVM] Let decomposeValue/composeValue handle aggregates (#183405)

This commit updates the LLVM::decomposeValue and LLVM::composeValue
methods to handle aggregate types - LLVM arrays and structs, and to have
different behaviors on dealing with types like pointers that can't be
bitcast to fixed-size integers. This allows the "any type" on
gpu.subgroup_broadcast to be more comprehensive - you can broadcast a
memref to a subgroup by decomposing it, for example.

(This branched off of getting an LLM to implement
ValueBoundsOpInterface on subgroup_broadcast, having it add handling
for the dimensions of shaped types, and realizing that there's no
fundamental reason you can't broadcast a memref or the like.)

---------

Co-authored-by: Claude Opus 4.6 <noreply at anthropic.com>
DeltaFile
+144-42mlir/lib/Conversion/LLVMCommon/Pattern.cpp
+69-0mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl.mlir
+10-4mlir/include/mlir/Conversion/LLVMCommon/Pattern.h
+10-4mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
+8-4mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp
+241-545 files

LLVM/project 4bcec5allvm/test/CodeGen/AMDGPU llvm.exp10.f64.ll llvm.exp.f64.ll, llvm/test/CodeGen/RISCV clmul.ll clmulr.ll

Rebase, address review comment

Created using spr 1.3.6-beta.1
DeltaFile
+25,051-14,920llvm/test/CodeGen/RISCV/clmul.ll
+16,004-0llvm/test/MC/AMDGPU/gfx13_asm_vopd3.s
+13,198-0llvm/test/CodeGen/RISCV/clmulr.ll
+12,863-0llvm/test/CodeGen/RISCV/clmulh.ll
+11,178-0llvm/test/CodeGen/AMDGPU/llvm.exp10.f64.ll
+10,242-0llvm/test/CodeGen/AMDGPU/llvm.exp.f64.ll
+88,536-14,9203,624 files not shown
+297,206-108,3723,630 files