LLVM/project b74fc87llvm/lib/Target/SPIRV SPIRVModuleAnalysis.cpp, llvm/test/CodeGen/SPIRV/extensions SPV_KHR_bit_instructions_remove_cap_if.ll SPV_KHR_bit_instructions_no_extension.ll

[SPIR-V] Fix removeCapabilityIf not pruning MinimalCaps (#206478)

removeCapabilityIf erased the capability from AllCaps but not from
MinimalCaps, which is what the AsmPrinter iterates to emit OpCapability,
so pruned capabilities were still emitted.
DeltaFile
+42-0llvm/unittests/Target/SPIRV/SPIRVModuleAnalysisTests.cpp
+20-0llvm/test/CodeGen/SPIRV/extensions/SPV_KHR_bit_instructions_remove_cap_if.ll
+3-1llvm/lib/Target/SPIRV/SPIRVModuleAnalysis.cpp
+1-1llvm/test/CodeGen/SPIRV/extensions/SPV_KHR_bit_instructions_no_extension.ll
+1-0llvm/unittests/Target/SPIRV/CMakeLists.txt
+67-25 files
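
The removeCapabilityIf fix above can be pictured with a small Python model. This is only a sketch under stated assumptions: `all_caps` and `minimal_caps` mirror the AllCaps and MinimalCaps names from the commit message, while the class, the `remove_capability_if(to_remove, if_present)` semantics, and the capability names `Base` and `Extra` are hypothetical and not taken from the SPIR-V backend.

```python
class RequirementHandler:
    """Toy model of capability tracking; not the backend's real class."""

    def __init__(self):
        self.all_caps = set()      # every capability recorded so far
        self.minimal_caps = []     # ordered list that the printer walks

    def add_capability(self, cap):
        if cap not in self.all_caps:
            self.all_caps.add(cap)
            self.minimal_caps.append(cap)

    def remove_capability_if(self, to_remove, if_present):
        # Assumed semantics: drop `to_remove` when `if_present` is recorded.
        if if_present not in self.all_caps:
            return
        self.all_caps.discard(to_remove)
        # The step the commit message describes as missing: also prune
        # the container that is iterated at emission time.
        if to_remove in self.minimal_caps:
            self.minimal_caps.remove(to_remove)

    def emit(self):
        return [f"OpCapability {cap}" for cap in self.minimal_caps]


handler = RequirementHandler()
handler.add_capability("Base")
handler.add_capability("Extra")
handler.remove_capability_if("Extra", "Base")
emitted = handler.emit()
```

If the second pruning step is left out, `emit()` in this model still lists `Extra`, which is the symptom the commit message reports.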

LLVM/project 00f3c79clang/lib/Basic/Targets AArch64.cpp, clang/lib/Driver ToolChain.cpp

[spr] initial version

Created using spr 1.3.8-wip
DeltaFile
+56-31llvm/utils/TableGen/Basic/ARMTargetDefEmitter.cpp
+42-39llvm/lib/TargetParser/AArch64TargetParser.cpp
+28-27llvm/unittests/TargetParser/TargetParserTest.cpp
+28-21llvm/include/llvm/TargetParser/AArch64TargetParser.h
+8-8clang/lib/Driver/ToolChain.cpp
+3-3clang/lib/Basic/Targets/AArch64.cpp
+165-1292 files not shown
+168-1318 files

LLVM/project f328be6clang/lib/Driver ToolChain.cpp, clang/lib/Driver/ToolChains Clang.cpp

[spr] changes to main this commit is based on

Created using spr 1.3.8-wip

[skip ci]
DeltaFile
+35-32llvm/lib/TargetParser/AArch64TargetParser.cpp
+40-18llvm/utils/TableGen/Basic/ARMTargetDefEmitter.cpp
+27-20llvm/include/llvm/TargetParser/AArch64TargetParser.h
+20-20llvm/unittests/TargetParser/TargetParserTest.cpp
+8-8clang/lib/Driver/ToolChain.cpp
+2-1clang/lib/Driver/ToolChains/Clang.cpp
+132-991 files not shown
+133-1007 files

LLVM/project a100e29clang/lib/Driver ToolChain.cpp, clang/lib/Driver/ToolChains Clang.cpp

[spr] initial version

Created using spr 1.3.8-wip
DeltaFile
+35-32llvm/lib/TargetParser/AArch64TargetParser.cpp
+40-18llvm/utils/TableGen/Basic/ARMTargetDefEmitter.cpp
+27-20llvm/include/llvm/TargetParser/AArch64TargetParser.h
+20-20llvm/unittests/TargetParser/TargetParserTest.cpp
+8-8clang/lib/Driver/ToolChain.cpp
+2-1clang/lib/Driver/ToolChains/Clang.cpp
+132-991 files not shown
+133-1007 files

LLVM/project 761eb4ellvm/include/llvm/ExecutionEngine/Orc UnwindInfoRegistrationPlugin.h, llvm/include/llvm/ExecutionEngine/Orc/Shared OrcRTBridge.h

[ORC] Support new ORC runtime in UnwindInfoRegistrationPlugin. (#206680)

Reworks UnwindInfoRegistrationPlugin::Create(ES) to look up the
register/deregister implementation addresses by symbol name, with
default names matching the SPS-CI alloc actions provided by
orc_rt::StandaloneMachOUnwindInfoRegistrar. In OrcTargetProcess,
UnwindInfoManager::addBootstrapSymbols now also vends its register and
deregister actions under those same names, so the new Create overloads
work against either backend.

Also removes a declared but unused (and undefined) Create overload.
DeltaFile
+15-12llvm/lib/ExecutionEngine/Orc/UnwindInfoRegistrationPlugin.cpp
+10-9llvm/include/llvm/ExecutionEngine/Orc/UnwindInfoRegistrationPlugin.h
+12-0llvm/include/llvm/ExecutionEngine/Orc/Shared/OrcRTBridge.h
+10-0llvm/lib/ExecutionEngine/Orc/TargetProcess/UnwindInfoManager.cpp
+5-0llvm/lib/ExecutionEngine/Orc/Shared/OrcRTBridge.cpp
+52-215 files

LLVM/project 2427da5llvm/lib/Target/AArch64 AArch64SchedC1Ultra.td, llvm/test/tools/llvm-mca/AArch64/Cortex C1Ultra-sve-instructions.s C1Ultra-neon-instructions.s

[AArch64] SME definitions for C1-Ultra scheduling model (#194850)

This patch extends the C1-Ultra scheduling model to add support for SME
instructions. These instructions differ from legacy scheduling model
instruction definitions in that they are sent to the CME co-processor
when in streaming mode. Modelling these instructions requires several
changes to the scheduling model:

- implementations of instructions that are added by SME but don't require
the processor to be in streaming mode,
- definitions of CME processor resources; instructions sent to this
co-processor are modelled as having latency derived from the SME
software optimization guide (SWOG),
- predicating the processor resource groups for instructions sent to the
CME co-processor when in streaming mode,
- tests for all SME instructions in the software optimization guide.


C1-Ultra SWOG: https://developer.arm.com/documentation/111079/3-0

    [2 lines not shown]
DeltaFile
+13,779-6,871llvm/test/tools/llvm-mca/AArch64/Cortex/C1Ultra-sve-instructions.s
+6,359-3,161llvm/test/tools/llvm-mca/AArch64/Cortex/C1Ultra-neon-instructions.s
+1,283-1,267llvm/test/tools/llvm-mca/AArch64/Cortex/C1Ultra-basic-instructions.s
+1,108-440llvm/lib/Target/AArch64/AArch64SchedC1Ultra.td
+373-0llvm/test/tools/llvm-mca/AArch64/Cortex/C1Ultra-sme-instructions.s
+359-0llvm/test/tools/llvm-mca/AArch64/Cortex/C1Ultra-streaming-sme-only-instructions.s
+23,261-11,73929 files not shown
+24,425-12,16435 files

LLVM/project 9066c66llvm/include/llvm/Object COFF.h, llvm/lib/Object COFFObjectFile.cpp

[Object][COFF] Introduce the .obj.arm64ec section (#205156)

Introduce a new extension section allowing the embedding of ARM64EC
object files inside native ARM64 object files. Its content consists of
an entire, valid ARM64EC COFF object file.
DeltaFile
+102-0llvm/test/tools/llvm-readobj/COFF/arm64x-hybridobj.yaml
+20-0llvm/lib/Object/COFFObjectFile.cpp
+8-3llvm/tools/llvm-readobj/llvm-readobj.cpp
+3-0llvm/include/llvm/Object/COFF.h
+133-34 files

LLVM/project 0b834f0llvm/test/CodeGen/X86 vector-reduce-add-sext.ll vector-reduce-add-mask.ll

[X86] Add AVX512F-only test coverage to vector-reduce-add tests (#206686)

Many of the lowerings use AVX512BW instructions; make sure we don't try to use them when AVX512BW is unavailable.
DeltaFile
+320-74llvm/test/CodeGen/X86/vector-reduce-add-sext.ll
+318-32llvm/test/CodeGen/X86/vector-reduce-add-mask.ll
+213-50llvm/test/CodeGen/X86/vector-reduce-add.ll
+63-16llvm/test/CodeGen/X86/vector-reduce-add-zext.ll
+2-0llvm/test/CodeGen/X86/vector-reduce-add-subvector.ll
+2-0llvm/test/CodeGen/X86/vector-reduce-add-codesize.ll
+918-1726 files

LLVM/project e11d747llvm/lib/Target/SPIRV SPIRVLegalizerInfo.cpp, llvm/test/CodeGen/SPIRV/llvm-intrinsics fixed-point-math.ll fixed-point-math-i64.ll

[SPIR-V] Lower G_SMULFIX and G_UMULFIX (#206507)
DeltaFile
+94-0llvm/test/CodeGen/SPIRV/llvm-intrinsics/fixed-point-math.ll
+10-0llvm/test/CodeGen/SPIRV/llvm-intrinsics/fixed-point-math-i64.ll
+6-0llvm/lib/Target/SPIRV/SPIRVLegalizerInfo.cpp
+110-03 files

LLVM/project 712d2d9llvm/include/llvm/IR Instructions.h

Update for comments
DeltaFile
+1-5llvm/include/llvm/IR/Instructions.h
+1-51 files

LLVM/project 7fff69bllvm/include/llvm/TargetParser AArch64TargetParser.h, llvm/lib/TargetParser AArch64TargetParser.cpp

[AArch64][NFC] remove CPUInfo.getImpliedExtensions() (#206422)
DeltaFile
+3-3llvm/unittests/TargetParser/TargetParserTest.cpp
+0-4llvm/include/llvm/TargetParser/AArch64TargetParser.h
+1-2llvm/lib/TargetParser/AArch64TargetParser.cpp
+4-93 files

LLVM/project 85b6d76llvm/test/Assembler invalid-load-store-atomic-elementwise.ll

Add element non byte
DeltaFile
+8-0llvm/test/Assembler/invalid-load-store-atomic-elementwise.ll
+8-01 files

LLVM/project fa7a602llvm/lib/CodeGen/SelectionDAG LegalizeVectorTypes.cpp LegalizeTypes.h, llvm/test/CodeGen/AArch64 intrinsic-cttz-elts-sve.ll

[CodeGen] Add widening support for ISD::CTTZ_ELTS (#205841)

WidenVectorOperand had no handler for CTTZ_ELTS/
CTTZ_ELTS_ZERO_POISON, causing a fatal error when the input vector type
needed widening.

Add WidenVecOp_CttzElements, which widens the input vector and pads the
extra lanes with all-ones, ensuring they do not contribute spurious
trailing zeros to the count. This follows the same pattern as the
existing WidenVecOp_VP_CttzElements.

Assisted-by: Claude (Anthropic)
DeltaFile
+60-0llvm/test/CodeGen/RISCV/rvv/cttz-elts.ll
+58-0llvm/test/CodeGen/Hexagon/cttz-elts-widen.ll
+36-0llvm/test/CodeGen/AArch64/intrinsic-cttz-elts-sve.ll
+18-0llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+1-0llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+173-05 files
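
Why the CTTZ_ELTS padding lanes must be all-ones rather than zero can be checked with a small Python model. This is a sketch, not the SelectionDAG code: `cttz_elts` and `widen` are hypothetical helpers, lanes are modelled as 1-bit values (so all-ones is `1`), and the count is assumed to start at lane 0 and to equal the vector length when every lane is zero.

```python
def cttz_elts(vec):
    """Count zero lanes starting at lane 0; all-zero input yields len(vec)."""
    for index, lane in enumerate(vec):
        if lane != 0:
            return index
    return len(vec)


def widen(vec, width, pad):
    """Extend `vec` to `width` lanes, filling the new lanes with `pad`."""
    return list(vec) + [pad] * (width - len(vec))


source = [0, 0, 0]                                # 3 lanes, all zero
expected = cttz_elts(source)                      # count on the original type
zero_padded = cttz_elts(widen(source, 4, 0))      # padding adds spurious zeros
ones_padded = cttz_elts(widen(source, 4, 1))      # padding stops the count
```

In this model the zero-padded count runs past the original vector length, while the all-ones padding caps it at the original number of lanes.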

LLVM/project b885cfbllvm/lib/Target/AArch64/MCTargetDesc AArch64MCLFIRewriter.cpp AArch64MCLFIRewriter.h, llvm/test/MC/AArch64/LFI guard-elim.s lse.s

[LFI][AArch64] Add guard elimination optimization (#204693)

This adds support for the guard elimination optimization to the AArch64
LFI rewriter. Redundant guards (`add x28, x27, wN, uxtw` instructions)
will be skipped when possible. See the LFI.rst documentation for an
example of the optimization.
DeltaFile
+176-0llvm/test/MC/AArch64/LFI/guard-elim.s
+37-2llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCLFIRewriter.cpp
+10-0llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCLFIRewriter.h
+2-2llvm/test/MC/AArch64/LFI/lse.s
+1-1llvm/test/MC/AArch64/LFI/mem.s
+1-1llvm/test/MC/AArch64/LFI/prefetch.s
+227-66 files not shown
+232-1312 files

LLVM/project a4e53b0llvm/lib/CodeGen BranchFolding.cpp, llvm/test/CodeGen/MIR/X86 branch-folder-drop-undef.mir

[BranchFolding] Drop undef flag when hoisting common code from successors (#205135)

Similarly to what is already done during tail merging
(4040c0f4ec135c18e723c1807ec0d1dbbb4cf3fa), make sure the intersection
of undef flags is taken while hoisting common code from successors.

Fixes: https://github.com/llvm/llvm-project/issues/204549.
DeltaFile
+15-8llvm/lib/CodeGen/BranchFolding.cpp
+1-2llvm/test/CodeGen/MIR/X86/branch-folder-drop-undef.mir
+1-1llvm/test/CodeGen/X86/branch-folder-drop-undef-end-to-end.ll
+17-113 files
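
Taking the intersection of undef flags, as the BranchFolding message above describes, can be sketched in Python. The `Operand` type and `merge_operands` helper are hypothetical stand-ins for illustration and do not reflect the MachineOperand API; the only assumption carried over from the message is that the hoisted copy keeps a flag only when every merged copy carries it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operand:
    reg: str
    undef: bool = False


def merge_operands(first, second):
    """Operands for one hoisted instruction replacing two identical copies."""
    assert len(first) == len(second)
    merged = []
    for a, b in zip(first, second):
        assert a.reg == b.reg, "only otherwise-identical instructions merge"
        # Intersection: the hoisted use is undef only if both copies were.
        merged.append(Operand(a.reg, a.undef and b.undef))
    return merged


then_copy = [Operand("r0", undef=True), Operand("r1", undef=True)]
else_copy = [Operand("r0", undef=True), Operand("r1", undef=False)]
hoisted = merge_operands(then_copy, else_copy)
```

Here `r1` loses its flag because one successor reads a defined value, so the hoisted instruction must treat the register as live.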

LLVM/project 38f6171llvm/lib/Transforms/Scalar GVN.cpp, llvm/test/Transforms/GVN/PRE pre-loop-load.ll

[GVN] Support critical-edge splitting in loop-load PRE

When the only in-loop blocker of a loop-load PRE candidate has multiple successors, the reload was placed at the end of that block, so it also ran on the loop-exit edge. Split the critical edge to the unique in-loop successor and insert the reload there, so it runs only on the path back to the header. Bail out on indirectbr or multiple in-loop successors, and keep backedge splitting gated behind the existing flag. Also refresh the stale TODO comments on the freeable-pointer tests, which stay un-PRE'd because the pointer may be freed.
DeltaFile
+35-2llvm/lib/Transforms/Scalar/GVN.cpp
+21-13llvm/test/Transforms/GVN/PRE/pre-loop-load.ll
+56-152 files
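
The edge-splitting step in the GVN message above can be illustrated on a toy control-flow graph in Python. This is a sketch under assumptions: the successor-map representation, the block names, and both helpers are hypothetical, and the real pass works on LLVM IR rather than dictionaries.

```python
def unique_in_loop_successor(cfg, block, loop_blocks):
    """Return the single in-loop successor of `block`, or None to bail out."""
    inside = [succ for succ in cfg[block] if succ in loop_blocks]
    return inside[0] if len(inside) == 1 else None


def split_edge(cfg, src, dst, new_block):
    """Insert `new_block` on the edge src -> dst of the successor map."""
    assert dst in cfg[src]
    cfg[src] = [new_block if succ == dst else succ for succ in cfg[src]]
    cfg[new_block] = [dst]
    return new_block


# "cold" is the in-loop blocker; it can continue to "latch" or leave the loop.
cfg = {
    "header": ["hot", "cold"],
    "hot": ["latch"],
    "cold": ["latch", "exit"],
    "latch": ["header"],
    "exit": [],
}
loop_blocks = {"header", "hot", "cold", "latch"}

target = unique_in_loop_successor(cfg, "cold", loop_blocks)
reload_block = split_edge(cfg, "cold", target, "cold.split")
```

A reload placed in `cold.split` is reached only on the way back to the header, whereas a reload at the end of `cold` would also run on the `cold` to `exit` edge.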

LLVM/project aa529a6llvm/lib/Target/AMDGPU SIISelLowering.cpp

use decimal number rather than hex
DeltaFile
+1-1llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+1-11 files

LLVM/project fe2e44cllvm/lib/Target/AMDGPU SIISelLowering.cpp AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.reduce.fmax.ll llvm.amdgcn.reduce.fmin.ll

[AMDGPU] Support Wave Reduction intrinsics for half types

Supported Ops: `fmin`, `fmax`, `fadd`, `fsub`.
DeltaFile
+941-264llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.fmax.ll
+941-264llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.fmin.ll
+902-160llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.fsub.ll
+899-160llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.fadd.ll
+18-5llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+15-3llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+3,716-8566 files

LLVM/project 44e039ellvm/test/Transforms/GVN/PRE pre-loop-load.ll

[GVN] Add tests for loop-load PRE into a multi-successor block

Precommit tests for loop-load PRE when the loaded pointer cannot be freed (a gc-managed address-space pointer and a nofree function). PRE currently fires but sinks the reload into a cold block with multiple successors, so it also runs on the loop-exit edge.
DeltaFile
+114-0llvm/test/Transforms/GVN/PRE/pre-loop-load.ll
+114-01 files

LLVM/project a40b88dllvm/utils/lit/lit/builtin_commands cat.py

[lit] Add stream-injectable run() core to builtin cat (#204711)

Pull cat's logic out into run(argv, stdin, stdout, stderr, cwd) so it
takes explicit streams instead of touching sys.std* directly. main()
just calls run() with the real process streams, so nothing changes for
the spawned-script path.

Needed before cat can run in-process inside the lit worker.

Also switched file reads to raw bytes throughout, since the old
text-mode read + win32 msvcrt.setmode was only there for sys.stdout's
encoding, which doesn't apply once we pass in a binary stream directly.
Error messages still report the original filename, not the cwd-joined
path.

Signed-off-by: Prasoon Kumar <prasoonkumar054 at gmail.com>
DeltaFile
+40-34llvm/utils/lit/lit/builtin_commands/cat.py
+40-341 files
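
The stream-injectable shape described in the cat message above might look roughly like the following Python sketch. Only the `run(argv, stdin, stdout, stderr, cwd)` parameter list, the raw-byte reads, and the rule that errors name the original file come from the message; the body, the error text, and the return codes are assumptions and not the code in `cat.py`.

```python
import io
import os
import sys


def run(argv, stdin, stdout, stderr, cwd):
    """Copy each named file (or stdin if none) to stdout as raw bytes."""
    if not argv:
        stdout.write(stdin.read())
        return 0
    status = 0
    for name in argv:
        path = os.path.join(cwd, name)   # resolve relative to the given cwd
        try:
            with open(path, "rb") as handle:
                stdout.write(handle.read())
        except OSError as error:
            # Report the name the caller passed, not the cwd-joined path.
            stderr.write(f"cat: {name}: {error.strerror}\n".encode())
            status = 1
    return status


def main():
    # The spawned-script path passes the real process streams.
    sys.exit(run(sys.argv[1:], sys.stdin.buffer, sys.stdout.buffer,
                 sys.stderr.buffer, os.getcwd()))


# In-process use with in-memory binary streams instead of sys.std*.
captured = io.BytesIO()
run([], io.BytesIO(b"piped input\n"), captured, io.BytesIO(), os.getcwd())
```

Because the streams are parameters, a caller can capture output in memory without touching the process-wide `sys.stdout`.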

LLVM/project 1d3d2f4llvm/lib/Target/SPIRV SPIRVLegalizerInfo.cpp, llvm/test/CodeGen/SPIRV sat-shifts.ll

[SPIR-V] Lower G_SSHLSAT and G_USHLSAT (#206490)
DeltaFile
+57-0llvm/test/CodeGen/SPIRV/sat-shifts.ll
+2-0llvm/lib/Target/SPIRV/SPIRVLegalizerInfo.cpp
+59-02 files

LLVM/project 67e6697llvm/test/CodeGen/MIR/X86 branch-folder-drop-undef.mir, llvm/test/CodeGen/X86 branch-folder-drop-undef-end-to-end.ll

[BranchFolding] Introduce tests for PR205135 (NFC) (#206684)
DeltaFile
+103-0llvm/test/CodeGen/MIR/X86/branch-folder-drop-undef.mir
+52-0llvm/test/CodeGen/X86/branch-folder-drop-undef-end-to-end.ll
+155-02 files

LLVM/project 5e2b7dallvm/utils/lit/lit/builtin_commands diff.py

[lit] Use provided streams in builtin diff (#204869)

We want to move the diff builtin to run in-process inside the lit
worker, instead of spawning a subprocess. The current implementation
talks to sys.stdin / stdout / stderr directly, so it can't be called
with different streams.

To fix this, pull diff's logic into run(argv, stdin, stdout, stderr,
cwd), which takes streams as arguments instead of reaching for sys.std*.
main() now just calls run() with the real process streams, so the
spawned-script path is unchanged.

This also makes the 'import util' dual-mode: lit.util when diff is
imported as part of the lit package, falling back to flat util for the
spawned script.
DeltaFile
+91-42llvm/utils/lit/lit/builtin_commands/diff.py
+91-421 files

LLVM/project 05bb745llvm/lib/Target/AMDGPU SIInstructions.td

[AMDGPU] Fix regclass for a true16 pattern. NFCI. (#206513)

Add an EXTRACT_SUBREG to make it clear that the result of the pattern is
only the low 16 bits of the result of the V_BFI_B32. This does not seem
to affect codegen, presumably because we are lax about allowing COPY
between VGPR_16 and VGPR_32.
DeltaFile
+2-2llvm/lib/Target/AMDGPU/SIInstructions.td
+2-21 files

LLVM/project 201b694libc/include sched.yaml, libc/src/sched/linux CMakeLists.txt

[libc] Implement CPU_{AND,OR,XOR,EQUAL}(_S)? macros (#205412)

This patch implements CPU_AND, CPU_OR, CPU_XOR, and CPU_EQUAL macros
(along with their _S variants) from sched.h.

The implementation follows existing patterns by adding internal entry
points (__sched_andcpuset, __sched_orcpuset, __sched_xorcpuset, and
__sched_cpuequal) that perform bitwise operations on cpu_set_t. For
__sched_cpuequal, I use inline_memcmp instead of a manual loop.

Assisted by Gemini.
DeltaFile
+58-0libc/test/src/sched/CMakeLists.txt
+57-0libc/src/sched/linux/CMakeLists.txt
+38-0libc/test/src/sched/sched_xorcpuset_test.cpp
+37-0libc/test/src/sched/sched_orcpuset_test.cpp
+36-0libc/test/src/sched/sched_andcpuset_test.cpp
+35-0libc/include/sched.yaml
+261-014 files not shown
+603-620 files
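
The set operations behind the CPU_AND, CPU_OR, CPU_XOR, and CPU_EQUAL macros above can be modelled in a few lines of Python. This is an illustration only: a CPU set is represented as a `bytes` bitmask, `SET_BYTES` and all function names are hypothetical, and none of it is the libc implementation.

```python
import operator

SET_BYTES = 16  # hypothetical set size, enough for 128 CPUs


def make_set(cpus, size=SET_BYTES):
    """Build a bitmask with one bit set per CPU index."""
    raw = bytearray(size)
    for cpu in cpus:
        raw[cpu // 8] |= 1 << (cpu % 8)
    return bytes(raw)


def _combine(op, left, right):
    assert len(left) == len(right), "both sets must have the same size"
    return bytes(op(a, b) for a, b in zip(left, right))


def cpu_and(left, right):
    return _combine(operator.and_, left, right)


def cpu_or(left, right):
    return _combine(operator.or_, left, right)


def cpu_xor(left, right):
    return _combine(operator.xor, left, right)


def cpu_equal(left, right):
    # Whole-buffer comparison, analogous to comparing the raw bytes.
    return left == right


first = make_set([0, 1, 9])
second = make_set([1, 9, 10])
both = cpu_and(first, second)
```

The equality check compares the entire buffer at once, which parallels the message's choice of a memory comparison over a manual loop.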

LLVM/project 15d6951mlir/include/mlir/IR CommonTypeConstraints.td, mlir/test/Dialect/SparseTensor invalid.mlir

[mlir] Fix StridedMemRefRankOf to check isStrided() (#201415)

StridedMemRefRankOf was equivalent to MemRefRankOf: it only applied
HasAnyRankOfPred and never HasStridesPred, so non-strided memref layouts
(e.g. multi-result affine maps) incorrectly passed ODS verification on
ops using this constraint (e.g. sparse_tensor.push_back).

The inBuffer of push_back uses StridedMemRefRankOf, which requires a
strided memref layout (HasStridesPred). A non-strided layout must be
rejected.
DeltaFile
+9-1mlir/test/Dialect/SparseTensor/invalid.mlir
+2-2mlir/include/mlir/IR/CommonTypeConstraints.td
+11-32 files

LLVM/project 719f52bclang/include/clang/CIR/Dialect Passes.td, clang/include/clang/Frontend FrontendOptions.h

[CIR] Initial upstreaming of LibOpt pass (#172487)

This PR upstreams a skeleton for the LibOpt pass, including the Clang frontend wiring.
DeltaFile
+77-0clang/lib/CIR/Dialect/Transforms/LibOpt.cpp
+15-1clang/lib/CIR/Lowering/CIRPasses.cpp
+12-0clang/include/clang/CIR/Dialect/Passes.td
+10-1clang/include/clang/Frontend/FrontendOptions.h
+10-0clang/include/clang/Options/Options.td
+6-1clang/lib/CIR/FrontendAction/CIRGenAction.cpp
+130-35 files not shown
+144-411 files

LLVM/project be82f85mlir/lib/Dialect/XeGPU/Transforms XeGPULayoutImpl.cpp, mlir/test/Dialect/XeGPU propagate-layout.mlir

[MLIR][XeGPU] Slice the new dim in broadcast properly (#206136)
DeltaFile
+27-0mlir/test/Dialect/XeGPU/propagate-layout.mlir
+17-5mlir/lib/Dialect/XeGPU/Transforms/XeGPULayoutImpl.cpp
+44-52 files

LLVM/project f931baellvm/lib/Transforms/Scalar LoopIdiomRecognize.cpp, llvm/test/Transforms/LoopIdiom memset-multiple-accesses.ll

[LoopIdiom] Form memset on runtime-trip multi-store loops. (#206354)

For runtime trip counts, mayLoopAccessLocation cannot bound the size of
the access, which prevents forming memsets for loops with multiple
stores of the same value.

If all may-aliasing stores write the same value, we can still form
potentially overlapping memsets, as the order of the memsets or writing
the same location multiple times should not matter.

On a large C/C++ based corpus (32k modules), we form ~2% more memsets.

```
                     base       patch
memsets formed      90,063     91,853   +1.99%
```

PR: https://github.com/llvm/llvm-project/pull/206354
DeltaFile
+28-15llvm/test/Transforms/LoopIdiom/memset-multiple-accesses.ll
+30-8llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
+58-232 files
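
The LoopIdiom argument above, that overlapping memsets of the same value can be issued in any order, can be checked with a small Python model. The loop shape, the helper names, and the buffer size are assumptions chosen for illustration; the actual transform operates on LLVM IR.

```python
def loop_stores(buf, trip_count, value):
    """Original loop: two stores of the same value per iteration."""
    for i in range(trip_count):
        buf[i] = value
        buf[i + 1] = value


def memset(buf, start, length, value):
    """Fill buf[start:start+length] with `value`."""
    buf[start:start + length] = bytes([value]) * length


trip_count = 5                      # stands in for a runtime trip count
reference = bytearray(b"\xff" * 8)
loop_stores(reference, trip_count, 0)

forward = bytearray(b"\xff" * 8)
memset(forward, 0, trip_count, 0)   # covers the buf[i] stores
memset(forward, 1, trip_count, 0)   # covers the buf[i + 1] stores

reverse = bytearray(b"\xff" * 8)
memset(reverse, 1, trip_count, 0)
memset(reverse, 0, trip_count, 0)
```

Both memset orders leave the buffer in the same state as the loop, since every overlapping byte receives the same value regardless of which write lands last.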

LLVM/project e7f7fbcmlir/include/mlir/Dialect/Bufferization/Transforms Passes.td, mlir/lib/Dialect/Bufferization/Transforms StaticMemoryPlannerAnalysis.cpp CMakeLists.txt

[mlir][bufferization] Add static memory planner pass for compile-time buffer allocation (#205125)

This PR introduces a new bufferization-related pass that performs static memory
planning at compile time. The pass is part of my GSoC 2026 project on
improving MLIR's buffer allocation strategies:
https://summerofcode.withgoogle.com/programs/2026/projects/XsjxBQ9o

### What this does

The static memory planner analyzes buffer lifetimes within a function
and consolidates multiple small `memref.alloc`/`memref.dealloc` pairs
into a single arena allocation. Instead of making separate heap
allocations for each memref, we compute offsets ahead of time and carve
out slices from one large buffer using `memref.view`.

This is useful for embedded systems and other memory-constrained
environments where you want predictable memory usage without runtime
allocation overhead.


    [40 lines not shown]
DeltaFile
+267-0mlir/lib/Dialect/Bufferization/Transforms/StaticMemoryPlannerAnalysis.cpp
+192-0mlir/test/Dialect/Bufferization/Transforms/static-memory-planner-analysis.mlir
+107-35mlir/include/mlir/Dialect/Bufferization/Transforms/Passes.td
+45-0mlir/test/Dialect/Bufferization/Transforms/static-memory-planner-arena-arg.mlir
+1-0mlir/lib/Dialect/Bufferization/Transforms/CMakeLists.txt
+612-355 files
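
The idea of turning buffer lifetimes into arena offsets, as the static memory planner message above describes, can be sketched in Python. This is not the pass's algorithm, which is not shown in the message: the `Buffer` fields, the largest-first ordering, and the first-fit placement are all assumptions chosen to keep the example short, and alignment is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Buffer:
    name: str
    size: int    # bytes
    start: int   # index of the first operation where the buffer is live
    end: int     # index of the last operation where it is live (inclusive)


def live_together(a, b):
    return a.start <= b.end and b.start <= a.end


def plan(buffers):
    """Return ({name: offset}, arena_size) using largest-first first-fit."""
    placed = []    # (buffer, offset) pairs already assigned
    offsets = {}
    for buf in sorted(buffers, key=lambda b: -b.size):
        # Byte ranges occupied by buffers whose lifetimes overlap this one.
        busy = sorted((off, off + other.size)
                      for other, off in placed if live_together(other, buf))
        candidate = 0
        for low, high in busy:
            if candidate + buf.size <= low:
                break                       # fits in the gap before `low`
            candidate = max(candidate, high)
        placed.append((buf, candidate))
        offsets[buf.name] = candidate
    arena_size = max((off + b.size for b, off in placed), default=0)
    return offsets, arena_size


buffers = [
    Buffer("A", 64, 0, 2),
    Buffer("B", 32, 1, 3),
    Buffer("C", 64, 3, 5),
]
offsets, arena_size = plan(buffers)
```

In this example `A` and `C` are never live at the same time and share offset 0, so the arena needs 96 bytes instead of the 160 bytes that three separate allocations would use.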