LLVM/project 2f1e0d1llvm/test/Transforms/LoopVectorize epilog-vectorization-reductions.ll, llvm/test/Transforms/LoopVectorize/X86 transform-narrow-interleave-to-widen-memory-epilogue-vec.ll

[LV] Add additional epilogue vector tests.

Add additional epilogue vectorization tests for
 * https://github.com/llvm/llvm-project/issues/187323
 * https://github.com/llvm/llvm-project/issues/185345
DeltaFile
+303-1llvm/test/Transforms/LoopVectorize/epilog-vectorization-reductions.ll
+123-0llvm/test/Transforms/LoopVectorize/X86/transform-narrow-interleave-to-widen-memory-epilogue-vec.ll
+426-12 files

LLVM/project 22977fdllvm/test/CodeGen/AMDGPU llvm.amdgcn.ds.bpermute.ll llvm.amdgcn.ds.permute.ll

[AMDGPU][NFC] Update permute tests to use auto-generated checks (#188107)

Also add global-isel run line.
DeltaFile
+125-13llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ds.bpermute.ll
+50-5llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ds.permute.ll
+175-182 files

LLVM/project 3124cb3llvm/lib/Analysis DependenceAnalysis.cpp, llvm/test/Analysis/DependenceAnalysis SameSDLoops.ll

[DA] Bug fix regarding the SameSD levels (#188098)

SCEV isKnownPredicate may crash if the expressions involve
different loops. To verify that two loops have the same iteration space,
we do not need the SCEV APIs; an equality check is enough.

Moreover, no pass (not even loop fusion) needs to check SameSD levels
for more than one level. In this patch, we limit the analysis of SameSD
levels to only one level after the common levels.
DeltaFile
+23-20llvm/lib/Analysis/DependenceAnalysis.cpp
+1-1llvm/test/Analysis/DependenceAnalysis/SameSDLoops.ll
+24-212 files

LLVM/project 10580belibclc/clc/lib/generic/math clc_tgamma.inc clc_tgamma.cl

libclc: Improve tgamma handling
DeltaFile
+213-0libclc/clc/lib/generic/math/clc_tgamma.inc
+12-54libclc/clc/lib/generic/math/clc_tgamma.cl
+225-542 files

LLVM/project 13d9304libclc/clc/lib/generic/math clc_lgamma_r_stret.inc clc_lgamma_r.cl

avoid fract
DeltaFile
+2-4libclc/clc/lib/generic/math/clc_lgamma_r_stret.inc
+1-1libclc/clc/lib/generic/math/clc_lgamma_r.cl
+3-52 files

LLVM/project 9096c9cllvm/lib/Transforms/Scalar LoopFuse.cpp, llvm/test/Transforms/LoopFusion da_separate_loops.ll

[LoopFusion] Remove the InvalidDependencies duplicates (#187744)

If the function dependencesAllowFusion returns false, in fuseCandidates
the reportLoopFusion function is used to increment InvalidDependencies
and to emit an OptimizationRemarkMissed. If both dependencesAllowFusion
and reportLoopFusion increment InvalidDependencies, the statistic is
counted twice.
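
To illustrate the double counting described above, here is a minimal sketch. It is not the LoopFuse.cpp code: `ToyCounter` and `rejectionsRecorded` are hypothetical names standing in for an LLVM statistic and for the check-then-report flow.

```cpp
// Hypothetical stand-in for a statistic such as InvalidDependencies.
struct ToyCounter {
  unsigned Value = 0;
};

// Simulates NumRejected fusion candidates failing the dependence check.
// When CheckAlsoCounts is true, the check itself bumps the counter and the
// reporting step bumps it again, which is the duplicated shape.
inline unsigned rejectionsRecorded(unsigned NumRejected, bool CheckAlsoCounts) {
  ToyCounter InvalidDeps;
  for (unsigned I = 0; I < NumRejected; ++I) {
    if (CheckAlsoCounts)
      ++InvalidDeps.Value; // increment inside the dependence check
    ++InvalidDeps.Value;   // increment inside the reporting helper
  }
  return InvalidDeps.Value;
}
```

With three rejected candidates, the duplicated shape records six and the single-increment shape records three.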
DeltaFile
+0-5llvm/lib/Transforms/Scalar/LoopFuse.cpp
+1-1llvm/test/Transforms/LoopFusion/da_separate_loops.ll
+1-62 files

LLVM/project 205187cflang/lib/Optimizer/OpenACC/Transforms ACCDeclareActionConversion.cpp, flang/test/Fir/OpenACC declare-action-conversion.fir

[openacc][flang] full support to handle allocatable/pointer runtime declare-action calls in ACCDeclareActionConversion (#188055)

Supported before: `fir.store`, `fir.box_addr`, and `fir.call` only for
PointerAllocate/PointerDeallocate.
Added now: fir.call support for PointerAllocateSource,
PointerDeallocatePolymorphic, AllocatableAllocate,
AllocatableAllocateSource, AllocatableDeallocate,
AllocatableDeallocatePolymorphic (found in
flang/include/flang/Runtime/allocatable.h).
DeltaFile
+74-0flang/test/Fir/OpenACC/declare-action-conversion.fir
+42-12flang/lib/Optimizer/OpenACC/Transforms/ACCDeclareActionConversion.cpp
+116-122 files

LLVM/project c9b473allvm/lib/CodeGen/GlobalISel CombinerHelper.cpp, llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp

[AMDGPU][DAGCombiner][GlobalISel] Extend allMulUsesCanBeContracted with FMA/FMAD pattern

Add conservative FMA/FMAD recognition to allMulUsesCanBeContracted:
a multiply used by an existing FMA/FMAD is assumed to be contractable
(it's already being contracted elsewhere). This avoids unnecessary
contraction blocking for multiplies that feed into FMA chains.

Also adds FMA/FMAD to the FPEXT user set (fpext(fmul) --> fma is
recognized as contractable when isFPExtFoldable).

Guards all remaining FMA-chain reassociation fold sites in both
SDAG (visitFADDForFMACombine/visitFSUBForFMACombine, 8 sites) and
GISel (matchCombineFAddFpExtFMulToFMadOrFMAAggressive, 4 sites).

This re-enables contractions that were conservatively blocked in
earlier patches where the multiply had an FMA use that wasn't yet
recognized: dagcombine-fma-crash.ll and dagcombine-fma-fmad.ll
CHECK lines revert to upstream behavior.

Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
DeltaFile
+95-96llvm/test/CodeGen/AMDGPU/dagcombine-fma-fmad.ll
+20-3llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+10-12llvm/test/CodeGen/AMDGPU/dagcombine-fma-crash.ll
+17-2llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+142-1134 files

LLVM/project f7d5e59lldb/packages/Python/lldbsuite/test dotest_args.py decorators.py, lldb/test CMakeLists.txt

[lldb] Check for arm64e debugserver in skipUnlessArm64eSupported (#188082)

Explicitly check whether we are building debugserver for arm64e. To
debug an arm64e binary, debugserver itself needs to be an arm64e
process.

Right now, it's possible to configure CMake with
`LLDB_ENABLE_ARM64E_DEBUGSERVER=Off` and the decorator wouldn't account
for that. This PR makes the decorator check for it.
DeltaFile
+6-1lldb/test/CMakeLists.txt
+6-0lldb/packages/Python/lldbsuite/test/dotest_args.py
+6-0lldb/utils/lldb-dotest/CMakeLists.txt
+4-0lldb/packages/Python/lldbsuite/test/decorators.py
+3-0lldb/test/API/lit.cfg.py
+3-0lldb/packages/Python/lldbsuite/test/configuration.py
+28-13 files not shown
+35-19 files

LLVM/project 664f788llvm/include/llvm/CodeGen/GlobalISel CombinerHelper.h, llvm/lib/CodeGen/GlobalISel CombinerHelper.cpp

[AMDGPU][DAGCombiner][GlobalISel] Extend allMulUsesCanBeContracted with FPEXT pattern

Extend the allMulUsesCanBeContracted analysis to recognize FPEXT patterns
where the multiply result flows through fpext before being used in
contractable operations (fadd, fsub). This covers:
  - fmul --> fpext --> {fadd, fsub}: FPEXT folds if isFPExtFoldable
  - fmul --> fpext --> fneg --> fsub: FPEXT then FNEG to FSUB
  - fmul --> fneg --> fpext --> fsub: FNEG then FPEXT folds if foldable

Also adds allMulUsesCanBeContracted guards to all FPEXT fold sites in
both SDAG (visitFADDForFMACombine, visitFSUBForFMACombine) and GISel
(matchCombineFAddFpExtFMulToFMadOrFMA, matchCombineFSubFpExtFMulToFMadOrFMA,
matchCombineFSubFpExtFNegFMulToFMadOrFMA).

Fixes a missing isFPExtFoldable check in GISel's
matchCombineFSubFpExtFMulToFMadOrFMA which could fold without verifying
the extension is actually foldable.

Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
DeltaFile
+1,930-11llvm/test/CodeGen/AMDGPU/fma-multiple-uses-contraction.ll
+93-14llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+78-13llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+2-1llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
+2,103-394 files

LLVM/project 9acfa56llvm/lib/Transforms/Coroutines CoroSplit.cpp, llvm/test/Transforms/Coroutines coro-split-addrspace.ll

[Coro] Preserve program address spaces correctly in CoroSplit. (#188002)
DeltaFile
+96-0llvm/test/Transforms/Coroutines/coro-split-addrspace.ll
+3-3llvm/lib/Transforms/Coroutines/CoroSplit.cpp
+99-32 files

LLVM/project 4074305clang/lib/Sema HLSLBuiltinTypeDeclBuilder.cpp SemaHLSL.cpp, clang/test/CodeGenHLSL/resources Texture2D-Mips.hlsl

[HLSL] Implement Texture2D::mips[][]

We implement the Texture2D::mips[][] method. We follow the design in DXC.
There is a new member called `mips` with type mips_type. The member will
contain a copy of the handle for the texture.

The type `mips_type` will have a member function `operator[]` that takes
a level, and returns a `mips_slice_type`. The slice will contain the
handle and the level. It also has an operator[] member function that
takes a coordinate. It will do a load from the handle with the level and
coordinate, and return that value.

Assisted-by: Gemini
DeltaFile
+284-35clang/lib/Sema/HLSLBuiltinTypeDeclBuilder.cpp
+235-0clang/test/SemaHLSL/Texture2D-mips-errors.ll
+65-0clang/test/CodeGenHLSL/resources/Texture2D-Mips.hlsl
+43-0clang/lib/Sema/SemaHLSL.cpp
+19-0clang/test/SemaHLSL/Texture2D-mips-errors.hlsl
+12-6clang/lib/Sema/HLSLBuiltinTypeDeclBuilder.h
+658-4112 files not shown
+679-5818 files

LLVM/project 4e555f3clang/include/clang/Analysis/Analyses/LifetimeSafety Loans.h, clang/lib/Analysis/LifetimeSafety Checker.cpp FactsGenerator.cpp

Expire AccessPaths instead of loans
DeltaFile
+72-111clang/include/clang/Analysis/Analyses/LifetimeSafety/Loans.h
+41-60clang/lib/Analysis/LifetimeSafety/Checker.cpp
+18-37clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
+32-9clang/lib/Analysis/LifetimeSafety/Loans.cpp
+10-10clang/test/Sema/warn-lifetime-safety-dataflow.cpp
+9-9clang/unittests/Analysis/LifetimeSafetyTest.cpp
+182-2364 files not shown
+202-26010 files

LLVM/project fdb9cb3clang/lib/Headers ptrauth.h, compiler-rt/lib/builtins crtbegin.c

[PAC][compiler-rt] Use `__ptrauth` qualifier instead of builtins

Since #100830 has landed, we no longer need to rely on builtins
DeltaFile
+6-22compiler-rt/lib/builtins/crtbegin.c
+8-0clang/lib/Headers/ptrauth.h
+14-222 files

LLVM/project e38e87dllvm/lib/Target/LoongArch LoongArchISelLowering.cpp

[LoongArch] Fix -Wunused-variable in c4b01ec20d3e845d348b0b005102b1301a8550ca

This variable was only used in assertions, which caused warnings in
release+noasserts builds.
DeltaFile
+1-2llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+1-21 files

LLVM/project 0c1013bllvm/include/llvm/IR Instructions.h, llvm/lib/IR Instructions.cpp

[IR][NFC] Rename UncondBrInst 'IfTrue' argument to 'Target' (#187631)

Follow-up on @slackito's suggestion about the naming of the variable and
discussion with @aengelke in #187196
DeltaFile
+3-4llvm/lib/IR/Instructions.cpp
+3-3llvm/include/llvm/IR/Instructions.h
+6-72 files

LLVM/project b75222cclang/lib/Driver CMakeLists.txt

[clang][Driver][CMake] Link pthread when available to fix shared-lib link errors.
DeltaFile
+4-0clang/lib/Driver/CMakeLists.txt
+4-01 files

LLVM/project bd662b8llvm/lib/CodeGen/GlobalISel CombinerHelper.cpp, llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp

[AMDGPU][DAGCombiner][GlobalISel] Extend allMulUsesCanBeContracted with FNEG pattern

Extend allMulUsesCanBeContracted() to recognize fmul -> fneg -> fsub
chains as contractable uses. This allows FMA contraction when a multiply
feeds an fneg that is only used by fsub operations.

Changes:
- DAGCombiner.cpp: Add ISD::FNEG case to allMulUsesCanBeContracted()
  checking that all FNEG users are ISD::FSUB. Update 1 fold site guard
  in visitFSUBForFMACombine (fsub(fneg(fmul))).
- CombinerHelper.cpp: Add G_FNEG case to allMulUsesCanBeContracted()
  checking that all FNEG users are G_FSUB. Update 2 fold site guards
  in matchCombineFSubFNegFMulToFMadOrFMA. Fix guard ordering to check
  isContractableFMul before allMulUsesCanBeContracted (cheap first).
- Add 7 new test functions to fma-multiple-uses-contraction.ll covering
  fneg single-use, multi-use, mixed contractable/non-contractable, and
  cross-pattern (P1 direct + P2 fneg) interactions.
- Update mad-combine.ll CHECK lines affected by the guard changes.


    [4 lines not shown]
DeltaFile
+666-0llvm/test/CodeGen/AMDGPU/fma-multiple-uses-contraction.ll
+33-7llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+22-2llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+4-5llvm/test/CodeGen/AMDGPU/mad-combine.ll
+725-144 files

LLVM/project 2b29e6fllvm/lib/CodeGen/GlobalISel CombinerHelper.cpp, llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp

[DAGCombiner][GlobalISel] Prevent FMA contraction when fmul cannot be eliminated (FADD/FSUB pattern)

fmul nodes with multiple uses can currently be contracted into FMA
operations even when the fmul itself cannot be eliminated, resulting in
a redundant multiply (wasted power and compute). The existing guard
`Aggressive || N0->hasOneUse()` allows contraction under Aggressive mode
regardless of whether the multiply can be removed.

This patch tightens the guard to:
  `N0->hasOneUse() || (Aggressive && allMulUsesCanBeContracted(N0))`

`allMulUsesCanBeContracted()` iterates all users of the multiply and
returns true only if every use is itself contractable into an FMA.
For this first patch, only direct FADD and FSUB uses are recognized as
contractable (FNEG, FPEXT, and FMA/FMAD patterns follow in subsequent
patches).
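
The difference between the two guards can be sketched on a toy use graph. This is an illustration under assumptions, not the DAGCombiner or CombinerHelper code: `ToyNode`, `ToyOp`, `toyAllMulUsesCanBeContracted`, `oldGuard`, and `newGuard` are hypothetical, and only direct FADD/FSUB users count as contractable, as in this first patch.

```cpp
#include <cassert>
#include <vector>

enum class ToyOp { FAdd, FSub, FMul, Store };

// Hypothetical node: an opcode plus the list of nodes that use its result.
struct ToyNode {
  ToyOp Opcode;
  std::vector<const ToyNode *> Users;
  bool hasOneUse() const { return Users.size() == 1; }
};

// True only if every user of the multiply is a direct FADD or FSUB.
inline bool toyAllMulUsesCanBeContracted(const ToyNode &Mul) {
  assert(Mul.Opcode == ToyOp::FMul && "expected a multiply");
  for (const ToyNode *U : Mul.Users)
    if (U->Opcode != ToyOp::FAdd && U->Opcode != ToyOp::FSub)
      return false;
  return true;
}

// Old guard: Aggressive mode contracts regardless of the other users.
inline bool oldGuard(const ToyNode &Mul, bool Aggressive) {
  return Aggressive || Mul.hasOneUse();
}

// Tightened guard: Aggressive mode also requires that the multiply can
// disappear, i.e. that all of its users are contractable.
inline bool newGuard(const ToyNode &Mul, bool Aggressive) {
  return Mul.hasOneUse() ||
         (Aggressive && toyAllMulUsesCanBeContracted(Mul));
}
```

For a multiply used by one fadd and one store, `oldGuard` accepts in Aggressive mode while `newGuard` rejects, since the store keeps the multiply alive.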

The change is applied symmetrically to both DAGCombiner and GlobalISel:
- DAGCombiner: 4 fold sites in visitFADDForFMACombine (2 sites) and

    [8 lines not shown]
DeltaFile
+835-0llvm/test/CodeGen/AMDGPU/fma-multiple-uses-contraction.ll
+115-148llvm/test/CodeGen/AMDGPU/fma.f16.ll
+118-106llvm/test/CodeGen/AMDGPU/dagcombine-fma-fmad.ll
+61-26llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+39-42llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll
+43-5llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+1,211-3275 files not shown
+1,239-34811 files

LLVM/project 0ae0ee0flang/docs Extensions.md

[flang] Adding a new extension that was noticed to be intentionally in the code. (#182891)

The flang compiler intentionally issues a warning on duplicate
prefix-specs for procedures. This is not consistent with the standard
which says "shall contain at most one of each". Other tested compilers
correctly issue an error.

It is safe to leave this as a warning. It can be turned into an error
condition by using the "-Werror" flag.

However, it should be noted in the Extensions document, similar to the
mention of the SAVE attribute.
DeltaFile
+3-0flang/docs/Extensions.md
+3-01 files

LLVM/project 4974e0dllvm/lib/Analysis LoopAccessAnalysis.cpp, llvm/test/Analysis/LoopAccessAnalysis multiple_stores_to_same_addr.ll

[LAA] Detect cross-iteration WAW when writing to the same pointer

Fixes https://github.com/llvm/llvm-project/issues/187402.
DeltaFile
+46-7llvm/test/Analysis/LoopAccessAnalysis/multiple_stores_to_same_addr.ll
+39-12llvm/lib/Analysis/LoopAccessAnalysis.cpp
+12-28llvm/test/Transforms/LoopVectorize/RISCV/gather-scatter-cost.ll
+97-473 files

LLVM/project aa4e85allvm/test/Analysis/LoopAccessAnalysis multiple_stores_to_same_addr.ll

Add test
DeltaFile
+374-0llvm/test/Analysis/LoopAccessAnalysis/multiple_stores_to_same_addr.ll
+374-01 files

LLVM/project 642db77llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/RISCV complex-loads.ll

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
DeltaFile
+81-150llvm/test/Transforms/SLPVectorizer/RISCV/complex-loads.ll
+6-1llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+87-1512 files

LLVM/project 729480cclang/lib/CIR/CodeGen CIRGenExprCXX.cpp CIRGenFunction.cpp, clang/test/CIR/CodeGen new-array-size-conv.cpp cleanup-scope-return-in-loop.cpp

[CIR] Generalize cxx alloc new size handling (#187790)

The non-constant size handling in `emitCXXNewAllocSize` was making the
incorrect assumption that the default behavior of the size value being
explicitly cast to size_t would be the only behavior we'd see. This is
actually only true with C++14 and later. To properly handle earlier
standards, we need the more robust checking that classic codegen does in
the equivalent function. This change adds that handling.

Assisted-by: Cursor / claude-4.6-opus-high
DeltaFile
+76-18clang/lib/CIR/CodeGen/CIRGenExprCXX.cpp
+70-0clang/test/CIR/CodeGen/new-array-size-conv.cpp
+58-0clang/test/CIR/CodeGen/cleanup-scope-return-in-loop.cpp
+12-2clang/lib/CIR/CodeGen/CIRGenFunction.cpp
+6-0clang/lib/CIR/CodeGen/CIRGenFunction.h
+222-205 files

LLVM/project 1ab49a9clang/lib/Format TokenAnnotator.cpp, clang/unittests/Format TokenAnnotatorTest.cpp

[clang-format] Fix regression in annotating angles in static_assert (#187966)

Fixes #187936

(cherry picked from commit 4b084f23bac39343d93ec91369efb2027d9fa153)
DeltaFile
+14-1clang/unittests/Format/TokenAnnotatorTest.cpp
+6-1clang/lib/Format/TokenAnnotator.cpp
+20-22 files

LLVM/project ed95511llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/PhaseOrdering/X86 scalarization.ll scalarization-inseltpoison.ll

[SLP]Improve reductions for copyables/split nodes

The original support for copyables leads to a regression in x264 on
RISCV. This patch improves detection of the copyable candidates by more
precise checking of the profitability and adds an extra check for
split node reduction, if it is profitable.

Fixes #184313

Reviewers: hiraditya, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/185697
DeltaFile
+58-28llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+30-25llvm/test/Transforms/SLPVectorizer/X86/deleted-instructions-clear.ll
+16-20llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll
+8-9llvm/test/Transforms/PhaseOrdering/X86/scalarization.ll
+8-9llvm/test/Transforms/PhaseOrdering/X86/scalarization-inseltpoison.ll
+4-5llvm/test/Transforms/SLPVectorizer/X86/revec-reduced-value-vectorized-later.ll
+124-961 files not shown
+126-987 files

LLVM/project 89503bdllvm/lib/Target/AMDGPU AMDGPUCoExecSchedStrategy.cpp GCNSchedStrategy.cpp, llvm/test/CodeGen/AMDGPU coexec-sched-effective-stall.mir

[AMDGPU] Add structural stall heuristic to scheduling strategies (#169617)

Implements a structural stall heuristic that considers both resource
hazards and latency constraints when selecting instructions. In coexec,
this changes the pending queue from a binary “not ready to issue”
distinction into part of a unified candidate comparison. Pending
instructions still identify structural stalls in the current cycle, but
they are now evaluated directly against available instructions by stall
cost, making the heuristics both more intuitive and more expressive.

- Add getStructuralStallCycles() to GCNSchedStrategy that computes the
number of cycles an instruction must wait due to:
  - Resource conflicts on unbuffered resources (from the SchedModel)
  - Sequence-dependent hazards (from GCNHazardRecognizer)

- Add getHazardWaitStates() to GCNHazardRecognizer that returns the
number of wait states until all hazards for an instruction are resolved,
providing cycle-accurate hazard information for scheduling heuristics.
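
As a rough sketch of comparing pending and available instructions by stall cost, the toy model below picks the candidate with the fewest stall cycles. It is not the GCNSchedStrategy code: `ToyCandidate`, `structuralStallCycles`, and `pickLowestStall` are hypothetical, and combining the two stall sources with a maximum is an assumption made for illustration.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical scheduling candidate with two independent stall sources.
struct ToyCandidate {
  int Id;
  unsigned ResourceStall;    // cycles until an unbuffered resource frees up
  unsigned HazardWaitStates; // wait states until sequence hazards resolve
  bool Pending;              // true if not yet issuable in this cycle
};

// Assumption: the instruction must wait for the slower of the two sources.
inline unsigned structuralStallCycles(const ToyCandidate &C) {
  return std::max(C.ResourceStall, C.HazardWaitStates);
}

// Compares pending and available candidates in one pass by stall cost.
// Returns the Id of the cheapest candidate (earliest wins ties), or -1.
inline int pickLowestStall(const std::vector<ToyCandidate> &Cands) {
  int BestId = -1;
  unsigned BestStall = 0;
  for (const ToyCandidate &C : Cands) {
    unsigned Stall = structuralStallCycles(C);
    if (BestId == -1 || Stall < BestStall) {
      BestId = C.Id;
      BestStall = Stall;
    }
  }
  return BestId;
}
```

In this model a pending instruction with a one-cycle stall can beat an available instruction that would stall for two cycles, which is the kind of unified comparison the message describes.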
DeltaFile
+41-3llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.cpp
+38-0llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+3-4llvm/test/CodeGen/AMDGPU/coexec-sched-effective-stall.mir
+6-0llvm/lib/Target/AMDGPU/GCNHazardRecognizer.h
+4-0llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
+4-0llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+96-71 files not shown
+98-77 files

LLVM/project 412aaebclang/include/clang/CIR/Dialect/IR CIROps.td, clang/test/CIR/Transforms bit.cir

[CIR] Add Involution trait to BitReverseOp and ByteSwapOp (#187862)

bitreverse(bitreverse(x)) == x and byte_swap(byte_swap(x)) == x are
mathematical involutions.

This adds the MLIR Involution trait to these CIR operations. The trait
encodes this property and automatically folds away the outer application
when an op's input is produced by the same op type.
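
The involution property and the resulting fold can be sketched in plain C++. This does not use MLIR or CIR: `bitReverse32`, `byteSwap32`, `ToyExpr`, and `foldInvolution` are hypothetical helpers that only mimic the idea of dropping f(f(x)) to x.

```cpp
#include <cassert>
#include <cstdint>

// Reverses the bit order of a 32-bit value.
inline std::uint32_t bitReverse32(std::uint32_t X) {
  std::uint32_t R = 0;
  for (int I = 0; I < 32; ++I) {
    R = (R << 1) | (X & 1u);
    X >>= 1;
  }
  return R;
}

// Reverses the byte order of a 32-bit value.
inline std::uint32_t byteSwap32(std::uint32_t X) {
  return ((X & 0xFFu) << 24) | ((X & 0xFF00u) << 8) |
         ((X >> 8) & 0xFF00u) | (X >> 24);
}

enum class ToyKind { Leaf, BitReverse, ByteSwap };

// Hypothetical unary expression node.
struct ToyExpr {
  ToyKind Kind;
  const ToyExpr *Operand; // null for leaves
};

// If E applies the same involution as its operand, return the inner
// operand; otherwise return E unchanged.
inline const ToyExpr *foldInvolution(const ToyExpr *E) {
  assert(E != nullptr);
  if (E->Kind != ToyKind::Leaf && E->Operand && E->Operand->Kind == E->Kind)
    return E->Operand->Operand;
  return E;
}
```

Applying either helper twice returns the original value, and `foldInvolution` leaves mixed chains such as byte swap of bit reverse untouched.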
DeltaFile
+20-0clang/test/CIR/Transforms/bit.cir
+4-0clang/include/clang/CIR/Dialect/IR/CIROps.td
+24-02 files

LLVM/project f545b56llvm/lib/Transforms/Scalar JumpThreading.cpp, llvm/test/Transforms/JumpThreading update-bpi-bfi-unfold-select.ll

[JT] `tryToUnfoldSelectInCurrBB` should update BFI & BPI if present
DeltaFile
+58-0llvm/test/Transforms/JumpThreading/update-bpi-bfi-unfold-select.ll
+32-0llvm/lib/Transforms/Scalar/JumpThreading.cpp
+90-02 files

LLVM/project c9b9079llvm/lib/Transforms/Vectorize VPlanTransforms.cpp

Don't use RPOT, per review feedback
DeltaFile
+3-11llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+3-111 files