LLVM/project 712d2d9llvm/include/llvm/IR Instructions.h

Update for comments
DeltaFile
+1-5llvm/include/llvm/IR/Instructions.h
+1-51 files

LLVM/project 7fff69bllvm/include/llvm/TargetParser AArch64TargetParser.h, llvm/lib/TargetParser AArch64TargetParser.cpp

[AArch64][NFC] remove CPUInfo.getImpliedExtensions() (#206422)
DeltaFile
+3-3llvm/unittests/TargetParser/TargetParserTest.cpp
+0-4llvm/include/llvm/TargetParser/AArch64TargetParser.h
+1-2llvm/lib/TargetParser/AArch64TargetParser.cpp
+4-93 files

LLVM/project 85b6d76llvm/test/Assembler invalid-load-store-atomic-elementwise.ll

Add element non byte
DeltaFile
+8-0llvm/test/Assembler/invalid-load-store-atomic-elementwise.ll
+8-01 files

LLVM/project fa7a602llvm/lib/CodeGen/SelectionDAG LegalizeVectorTypes.cpp LegalizeTypes.h, llvm/test/CodeGen/AArch64 intrinsic-cttz-elts-sve.ll

[CodeGen] Add widening support for ISD::CTTZ_ELTS (#205841)

WidenVectorOperand had no handler for CTTZ_ELTS/CTTZ_ELTS_ZERO_POISON,
causing a fatal error when the input vector type needed widening.

Add WidenVecOp_CttzElements which widens the input vector and pads the
extra lanes with all-ones, ensuring they do not contribute spurious
trailing zeros to the count. This follows the same pattern as the
existing WidenVecOp_VP_CttzElements.
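
The choice of padding value can be illustrated with a small Python model. This is only a sketch of the semantics, not the SelectionDAG code: `cttz_elts`, `widen`, and the 8-bit `ALL_ONES` lane value are hypothetical names chosen for illustration, and the model assumes the count starts at lane 0 and stops at the first non-zero lane.

```python
ALL_ONES = 0xFF  # assumption: 8-bit lanes, purely for illustration


def cttz_elts(vec):
    """Count zero lanes starting at lane 0, stopping at the first non-zero lane."""
    count = 0
    for lane in vec:
        if lane != 0:
            break
        count += 1
    return count


def widen(vec, width, pad):
    """Append padding lanes until the vector has `width` lanes."""
    return list(vec) + [pad] * (width - len(vec))


narrow = [0, 0, 0]                          # 3-lane input that is entirely zero
padded_ones = widen(narrow, 4, ALL_ONES)    # padding stops the count at lane 3
padded_zero = widen(narrow, 4, 0)           # padding would be counted as well

print(cttz_elts(narrow), cttz_elts(padded_ones), cttz_elts(padded_zero))
```

In this model, zero padding inflates the count for an all-zero input, while all-ones padding leaves the result equal to that of the original narrow vector.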

Assisted-by: Claude (Anthropic)
DeltaFile
+60-0llvm/test/CodeGen/RISCV/rvv/cttz-elts.ll
+58-0llvm/test/CodeGen/Hexagon/cttz-elts-widen.ll
+36-0llvm/test/CodeGen/AArch64/intrinsic-cttz-elts-sve.ll
+18-0llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+1-0llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+173-05 files

LLVM/project b885cfbllvm/lib/Target/AArch64/MCTargetDesc AArch64MCLFIRewriter.cpp AArch64MCLFIRewriter.h, llvm/test/MC/AArch64/LFI guard-elim.s lse.s

[LFI][AArch64] Add guard elimination optimization (#204693)

This adds support for the guard elimination optimization to the AArch64
LFI rewriter. Redundant guards (`add x28, x27, wN, uxtw` instructions)
will be skipped when possible. See the LFI.rst documentation for an
example of the optimization.
DeltaFile
+176-0llvm/test/MC/AArch64/LFI/guard-elim.s
+37-2llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCLFIRewriter.cpp
+10-0llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCLFIRewriter.h
+2-2llvm/test/MC/AArch64/LFI/lse.s
+1-1llvm/test/MC/AArch64/LFI/mem.s
+1-1llvm/test/MC/AArch64/LFI/prefetch.s
+227-66 files not shown
+232-1312 files

LLVM/project a4e53b0llvm/lib/CodeGen BranchFolding.cpp, llvm/test/CodeGen/MIR/X86 branch-folder-drop-undef.mir

[BranchFolding] Drop undef flag when hoisting common code from successors (#205135)

Similarly to what is already done during tail merging
(4040c0f4ec135c18e723c1807ec0d1dbbb4cf3fa), make sure the intersection
of undef flags is taken while hoisting common code from successors.

Fixes: https://github.com/llvm/llvm-project/issues/204549.
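
As a rough illustration of "taking the intersection of undef flags", the following Python sketch models an instruction as a list of register operands. `Operand` and `merge_undef_flags` are hypothetical names, not part of BranchFolding.cpp; the sketch only shows that the hoisted copy keeps an undef flag when every original copy carried it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operand:
    reg: str
    is_undef: bool = False


def merge_undef_flags(ops_a, ops_b):
    """Return operands for the hoisted instruction.

    An operand stays undef only if it is undef in both successor copies.
    """
    if [o.reg for o in ops_a] != [o.reg for o in ops_b]:
        raise ValueError("instructions differ in more than their flags")
    return [Operand(a.reg, a.is_undef and b.is_undef)
            for a, b in zip(ops_a, ops_b)]


# The same instruction appears in both successors, with different flags.
in_succ1 = [Operand("eax", is_undef=True), Operand("ebx", is_undef=True)]
in_succ2 = [Operand("eax", is_undef=False), Operand("ebx", is_undef=True)]
hoisted = merge_undef_flags(in_succ1, in_succ2)
print(hoisted)
```

Keeping the flag from only one copy would claim `eax` is undef on a path where it actually holds a defined value, which is the kind of mismatch the intersection avoids.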
DeltaFile
+15-8llvm/lib/CodeGen/BranchFolding.cpp
+1-2llvm/test/CodeGen/MIR/X86/branch-folder-drop-undef.mir
+1-1llvm/test/CodeGen/X86/branch-folder-drop-undef-end-to-end.ll
+17-113 files

LLVM/project aa529a6llvm/lib/Target/AMDGPU SIISelLowering.cpp

use decimal number rather than hex
DeltaFile
+1-1llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+1-11 files

LLVM/project fe2e44cllvm/lib/Target/AMDGPU SIISelLowering.cpp AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.reduce.fmax.ll llvm.amdgcn.reduce.fmin.ll

[AMDGPU] Support Wave Reduction intrinsics for half types

Supported Ops: `fmin`, `fmax`, `fadd`, `fsub`.
DeltaFile
+941-264llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.fmax.ll
+941-264llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.fmin.ll
+902-160llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.fsub.ll
+899-160llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.fadd.ll
+18-5llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+15-3llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+3,716-8566 files

LLVM/project a40b88dllvm/utils/lit/lit/builtin_commands cat.py

[lit] Add stream-injectable run() core to builtin cat (#204711)

Pull cat's logic out into run(argv, stdin, stdout, stderr, cwd) so it
takes explicit streams instead of touching sys.std* directly. main()
just calls run() with the real process streams, so nothing changes for
the spawned-script path.

Needed before cat can run in-process inside the lit worker.

Also switched file reads to raw bytes throughout, since the old
text-mode read + win32 msvcrt.setmode was only there for sys.stdout's
encoding, which doesn't apply once we pass in a binary stream directly.
Error messages still report the original filename, not the cwd-joined
path.
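
The shape described above can be sketched as follows. This is not the code in cat.py: option handling is omitted, the sketch stops at the first error, and treating `stderr` as a binary stream is an assumption made for the example. Only the `run(argv, stdin, stdout, stderr, cwd)` signature and the "report the original filename" behaviour come from the commit message.

```python
import io
import os
import sys
import tempfile


def run(argv, stdin, stdout, stderr, cwd):
    """Copy the files named in argv[1:] to stdout as raw bytes."""
    for name in argv[1:]:
        # Resolve relative names against the caller-supplied cwd.
        path = name if os.path.isabs(name) else os.path.join(cwd, name)
        try:
            with open(path, "rb") as f:
                stdout.write(f.read())
        except OSError as e:
            # Report the name as given, not the cwd-joined path.
            stderr.write("cat: {}: {}\n".format(name, e.strerror).encode())
            return 1
    return 0


def main():
    """Spawned-script entry point: forward the real process streams."""
    return run(sys.argv, sys.stdin.buffer, sys.stdout.buffer,
               sys.stderr.buffer, os.getcwd())


# In-process use with in-memory streams instead of sys.std*.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.txt"), "wb") as f:
        f.write(b"hello\n")
    out, err = io.BytesIO(), io.BytesIO()
    status = run(["cat", "a.txt", "missing.txt"], None, out, err, tmp)
```

Because `run()` never touches `sys.std*`, a caller can capture the output and the error text without spawning a process.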

Signed-off-by: Prasoon Kumar <prasoonkumar054 at gmail.com>
DeltaFile
+40-34llvm/utils/lit/lit/builtin_commands/cat.py
+40-341 files

LLVM/project 1d3d2f4llvm/lib/Target/SPIRV SPIRVLegalizerInfo.cpp, llvm/test/CodeGen/SPIRV sat-shifts.ll

[SPIR-V] Lower G_SSHLSAT and G_USHLSAT (#206490)
DeltaFile
+57-0llvm/test/CodeGen/SPIRV/sat-shifts.ll
+2-0llvm/lib/Target/SPIRV/SPIRVLegalizerInfo.cpp
+59-02 files

LLVM/project 67e6697llvm/test/CodeGen/MIR/X86 branch-folder-drop-undef.mir, llvm/test/CodeGen/X86 branch-folder-drop-undef-end-to-end.ll

[BranchFolding] Introduce tests for PR205135 (NFC) (#206684)
DeltaFile
+103-0llvm/test/CodeGen/MIR/X86/branch-folder-drop-undef.mir
+52-0llvm/test/CodeGen/X86/branch-folder-drop-undef-end-to-end.ll
+155-02 files

LLVM/project 5e2b7dallvm/utils/lit/lit/builtin_commands diff.py

[lit] Use provided streams in builtin diff (#204869)

We want to move the diff builtin to run in-process inside the lit
worker, instead of spawning a subprocess. The current implementation
talks to sys.stdin / stdout / stderr directly, so it can't be called
with different streams.

To fix this, pull diff's logic into run(argv, stdin, stdout, stderr,
cwd), which takes streams as arguments instead of reaching for sys.std*.
main() now just calls run() with the real process streams, so the
spawned-script path is unchanged.

This also makes the 'import util' dual-mode: lit.util when diff is
imported as part of the lit package, falling back to flat util for the
spawned script.
DeltaFile
+91-42llvm/utils/lit/lit/builtin_commands/diff.py
+91-421 files

LLVM/project 05bb745llvm/lib/Target/AMDGPU SIInstructions.td

[AMDGPU] Fix regclass for a true16 pattern. NFCI. (#206513)

Add an EXTRACT_SUBREG to make it clear that the result of the pattern is
only the low 16 bits of the result of the V_BFI_B32. This does not seem
to affect codegen, presumably because we are lax about allowing COPY
between VGPR_16 and VGPR_32.
DeltaFile
+2-2llvm/lib/Target/AMDGPU/SIInstructions.td
+2-21 files

LLVM/project 201b694libc/include sched.yaml, libc/src/sched/linux CMakeLists.txt

[libc] Implement CPU_{AND,OR,XOR,EQUAL}(_S)? macros (#205412)

This patch implements CPU_AND, CPU_OR, CPU_XOR, and CPU_EQUAL macros
(along with their _S variants) from sched.h.

The implementation follows existing patterns by adding internal entry
points (__sched_andcpuset, __sched_orcpuset, __sched_xorcpuset, and
__sched_cpuequal) that perform bitwise operations on cpu_set_t. For
__sched_cpuequal, I use inline_memcmp instead of a manual loop.
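
The set operations can be modelled in a few lines of Python. This is a sketch of the semantics only, not the libc implementation: the byte-per-8-CPUs bit layout, `make_set`, and the `cpu_*` helper names are assumptions for illustration, and plain bytes equality stands in for the memcmp-style comparison.

```python
import operator


def make_set(cpus, setsize):
    """Build a setsize-byte mask with one bit per CPU (layout is an assumption)."""
    mask = bytearray(setsize)
    for cpu in cpus:
        mask[cpu // 8] |= 1 << (cpu % 8)
    return bytes(mask)


def _combine(op, a, b):
    """Apply a bitwise operator byte by byte to two equally sized sets."""
    if len(a) != len(b):
        raise ValueError("sets must have the same size")
    return bytes(op(x, y) for x, y in zip(a, b))


def cpu_and(a, b):
    return _combine(operator.and_, a, b)


def cpu_or(a, b):
    return _combine(operator.or_, a, b)


def cpu_xor(a, b):
    return _combine(operator.xor, a, b)


def cpu_equal(a, b):
    """Whole-buffer comparison, standing in for a memcmp over the set."""
    return a == b


a = make_set({0, 1, 9}, 16)
b = make_set({1, 9, 10}, 16)
print(cpu_equal(cpu_and(a, b), make_set({1, 9}, 16)))
```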

Assisted by Gemini.
DeltaFile
+58-0libc/test/src/sched/CMakeLists.txt
+57-0libc/src/sched/linux/CMakeLists.txt
+38-0libc/test/src/sched/sched_xorcpuset_test.cpp
+37-0libc/test/src/sched/sched_orcpuset_test.cpp
+36-0libc/test/src/sched/sched_andcpuset_test.cpp
+35-0libc/include/sched.yaml
+261-014 files not shown
+603-620 files

LLVM/project 15d6951mlir/include/mlir/IR CommonTypeConstraints.td, mlir/test/Dialect/SparseTensor invalid.mlir

[mlir] Fix StridedMemRefRankOf to check isStrided() (#201415)

StridedMemRefRankOf was equivalent to MemRefRankOf: it only applied
HasAnyRankOfPred and never HasStridesPred, so non-strided memref layouts
(e.g. multi-result affine maps) incorrectly passed ODS verification on
ops using this constraint (e.g. sparse_tensor.push_back).

The inBuffer of push_back uses StridedMemRefRankOf, which requires a
strided memref layout (HasStridesPred). A non-strided layout must be
rejected.
DeltaFile
+9-1mlir/test/Dialect/SparseTensor/invalid.mlir
+2-2mlir/include/mlir/IR/CommonTypeConstraints.td
+11-32 files

LLVM/project 719f52bclang/include/clang/CIR/Dialect Passes.td, clang/include/clang/Frontend FrontendOptions.h

[CIR] Initial upstreaming of LibOpt pass (#172487)

This PR upstreams a skeleton for the LibOpt pass, including the Clang frontend wiring.
DeltaFile
+77-0clang/lib/CIR/Dialect/Transforms/LibOpt.cpp
+15-1clang/lib/CIR/Lowering/CIRPasses.cpp
+12-0clang/include/clang/CIR/Dialect/Passes.td
+10-1clang/include/clang/Frontend/FrontendOptions.h
+10-0clang/include/clang/Options/Options.td
+6-1clang/lib/CIR/FrontendAction/CIRGenAction.cpp
+130-35 files not shown
+144-411 files

LLVM/project be82f85mlir/lib/Dialect/XeGPU/Transforms XeGPULayoutImpl.cpp, mlir/test/Dialect/XeGPU propagate-layout.mlir

[MLIR][XeGPU] Slice the new dim in broadcast properly (#206136)
DeltaFile
+27-0mlir/test/Dialect/XeGPU/propagate-layout.mlir
+17-5mlir/lib/Dialect/XeGPU/Transforms/XeGPULayoutImpl.cpp
+44-52 files

LLVM/project f931baellvm/lib/Transforms/Scalar LoopIdiomRecognize.cpp, llvm/test/Transforms/LoopIdiom memset-multiple-accesses.ll

[LoopIdiom] Form memset on runtime-trip multi-store loops. (#206354)

For runtime trip counts, mayLoopAccessLocation cannot bound the size of
the access, which prevents forming memsets for loops with multiple
stores of the same value.

If all may-aliasing stores write the same value, we can still form
potentially overlapping memsets, as the order of the memsets or writing
the same location multiple times should not matter.
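
The argument can be checked on a toy case in Python. This is not the LoopIdiomRecognize logic; `loop_version`, `memset`, and `memset_version` are hypothetical helpers that show a loop with two overlapping stores of the same value producing the same memory as two overlapping memsets.

```python
def loop_version(buf, n, value):
    """Original loop: two stores of the same value per iteration."""
    for i in range(n):
        buf[i] = value
        buf[i + 1] = value


def memset(buf, start, length, value):
    """Fill buf[start:start+length] with value."""
    buf[start:start + length] = bytes([value]) * length


def memset_version(buf, n, value):
    """One memset per store; the two ranges overlap in n - 1 bytes."""
    memset(buf, 0, n, value)
    memset(buf, 1, n, value)


n = 5  # stands in for a trip count only known at run time
a = bytearray(8)
b = bytearray(8)
loop_version(a, n, 0xAB)
memset_version(b, n, 0xAB)
print(a == b)
```

Since every write stores the same byte, swapping the two memsets or writing a location twice does not change the final contents.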

On a large C/C++ based corpus (32k modules), we form ~2% more memsets.

```
                     base       patch
memsets formed      90,063     91,853   +1.99%
```

PR: https://github.com/llvm/llvm-project/pull/206354
DeltaFile
+28-15llvm/test/Transforms/LoopIdiom/memset-multiple-accesses.ll
+30-8llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
+58-232 files

LLVM/project e7f7fbcmlir/include/mlir/Dialect/Bufferization/Transforms Passes.td, mlir/lib/Dialect/Bufferization/Transforms StaticMemoryPlannerAnalysis.cpp CMakeLists.txt

[mlir][bufferization] Add static memory planner pass for compile-time buffer allocation (#205125)

This PR introduces a new bufferization-related pass that performs static memory
planning at compile time. The pass is part of my GSoC 2026 project on
improving MLIR's buffer allocation strategies:
https://summerofcode.withgoogle.com/programs/2026/projects/XsjxBQ9o

### What this does

The static memory planner analyzes buffer lifetimes within a function
and consolidates multiple small `memref.alloc`/`memref.dealloc` pairs
into a single arena allocation. Instead of making separate heap
allocations for each memref, we compute offsets ahead of time and carve
out slices from one large buffer using `memref.view`.

This is useful for embedded systems and other memory-constrained
environments where you want predictable memory usage without runtime
allocation overhead.
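
To make the idea concrete, here is a generic lifetime-based offset assignment in Python. It is a sketch under stated assumptions, not the algorithm used by the pass (whose details are in the elided part of the message): `Buffer`, `plan`, the greedy largest-first ordering, and the lack of alignment handling are all choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    name: str
    size: int   # bytes
    start: int  # index of the first operation that uses the buffer
    end: int    # index of the last operation that uses it (inclusive)


def plan(buffers):
    """Assign each buffer an offset in one arena; return (offsets, arena_size)."""
    placed = []   # (offset, buffer) pairs already assigned
    offsets = {}
    for buf in sorted(buffers, key=lambda b: -b.size):
        # Address ranges taken by buffers that are live at the same time.
        busy = sorted((off, off + other.size) for off, other in placed
                      if buf.start <= other.end and other.start <= buf.end)
        offset = 0
        for lo, hi in busy:
            if offset + buf.size <= lo:
                break  # the gap before this range is large enough
            offset = max(offset, hi)
        placed.append((offset, buf))
        offsets[buf.name] = offset
    arena = max((off + b.size for off, b in placed), default=0)
    return offsets, arena


offsets, arena = plan([
    Buffer("A", 64, 0, 2),
    Buffer("B", 32, 1, 3),
    Buffer("C", 64, 3, 5),
])
print(offsets, arena)
```

In this example A and C are never live at the same time, so they share offset 0, and the arena is smaller than the sum of the three separate allocations.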


    [40 lines not shown]
DeltaFile
+267-0mlir/lib/Dialect/Bufferization/Transforms/StaticMemoryPlannerAnalysis.cpp
+192-0mlir/test/Dialect/Bufferization/Transforms/static-memory-planner-analysis.mlir
+107-35mlir/include/mlir/Dialect/Bufferization/Transforms/Passes.td
+45-0mlir/test/Dialect/Bufferization/Transforms/static-memory-planner-arena-arg.mlir
+1-0mlir/lib/Dialect/Bufferization/Transforms/CMakeLists.txt
+612-355 files

LLVM/project d8d626aclang/test/CodeGen attr-counted-by.c tbaa-pointers.c, clang/test/Headers __clang_hip_math.hip

[Clang] Emit struct TBAA for llvm.errno.tbaa (#201375)

For `!llvm.errno.tbaa`, emit TBAA for accessing the member of a virtual
`__libc_errno` struct. The purpose is to indicate that errno aliases
with `int` accesses, but not `int` member accesses in other structs.

This is an alternative to
https://github.com/llvm/llvm-project/pull/200367.
DeltaFile
+517-517clang/test/CodeGen/attr-counted-by.c
+405-405clang/test/Headers/__clang_hip_math.hip
+263-263clang/test/OpenMP/nvptx_target_parallel_reduction_codegen_tbaa_PR46146.cpp
+276-160clang/test/CodeGen/tbaa-pointers.c
+220-53clang/test/CodeGen/tbaa.c
+131-131clang/test/CodeGen/allow-ubsan-check.c
+1,812-1,52935 files not shown
+2,933-2,57641 files

LLVM/project 024f789lldb CMakeLists.txt

[lldb][Windows] Use (lib)python3.dll when linking with limited API (#206585)

In release builds, we already link liblldb against `(lib)python3.dll`
(#201407). However, we still defined
`LLDB_PYTHON_RUNTIME_LIBRARY_FILENAME` to `(lib)python3(.)xx.dll`. So
`LoadPythonRuntime` will try to load the version-specific library. You
can reproduce this by running a release build with a different
Python version in PATH.

With this PR, `LLDB_PYTHON_RUNTIME_LIBRARY_FILENAME` will use the stable
ABI library name if we use the limited Python API.
DeltaFile
+15-5lldb/CMakeLists.txt
+15-51 files

LLVM/project d8d7e52llvm/include/llvm/IR Instructions.h

Add Load/Store Properties section
DeltaFile
+4-0llvm/include/llvm/IR/Instructions.h
+4-01 files

LLVM/project 4b802fellvm/lib/Target/AMDGPU VOPDInstructions.td SIInstrInfo.td, llvm/lib/Target/AMDGPU/Utils AMDGPUBaseInfo.cpp

[AMDGPU] Use sparse direct lookup table for VOPD eligibility (#206534)

This replaces a manually generated table with a new TableGen feature
that enables sparse direct lookup. Additional changes are made to put
both X and Y eligibility into a single table.

Assisted-by: Claude Code
DeltaFile
+26-24llvm/lib/Target/AMDGPU/VOPDInstructions.td
+11-30llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+8-15llvm/lib/Target/AMDGPU/SIInstrInfo.td
+45-693 files

LLVM/project b8d7413clang/lib/Driver OffloadBundler.cpp, clang/test/Driver clang-offload-bundler-magic-collision.c

Revert "[OffloadBundler] Bound compressed bundles by header size, not magic scan (#205587)" (#206678)

Causes test failures on big-endian.

This reverts commit d2d80787af6e1d671563654171f34e8eec35fd1b.
This reverts commit bac7ca5f170f25517870e0876d710cc4226c3602.
DeltaFile
+0-104clang/test/Driver/Inputs/clang-offload-bundler-magic-collision.py
+4-37clang/lib/Driver/OffloadBundler.cpp
+0-36llvm/test/tools/llvm-objdump/Offloading/fatbin-magic-collision.test
+5-28llvm/lib/Object/OffloadBundle.cpp
+0-27clang/test/Driver/clang-offload-bundler-magic-collision.c
+0-0clang/test/Driver/Inputs/clang-offload-bundler-magic-collision.co
+9-2326 files

LLVM/project 46e376cclang/include/clang/APINotes APINotesReader.h APINotesWriter.h, clang/lib/APINotes APINotesReader.cpp APINotesWriter.cpp

[APINotes] Serialize function-like Where.Parameters (#204147)

This PR builds on #203227 by serializing function-like
`Where.Parameters` selectors into binary API notes.

The selector remains declaration-selection data, separate from
annotation payloads. Existing Sema paths still use name-only lookup and
keep legacy broad matching. Exact overload matching is left to the
follow-up Sema PR.

## Format

This bumps the API notes minor version because the global-function and
C++ method table key layout changes.

Function-like entries now use a shared binary key containing:

- parent context ID
- declaration name ID

    [59 lines not shown]
DeltaFile
+145-16clang/lib/APINotes/APINotesReader.cpp
+86-20clang/lib/APINotes/APINotesWriter.cpp
+103-1clang/lib/APINotes/APINotesFormat.h
+31-11clang/lib/APINotes/APINotesYAMLCompiler.cpp
+31-0clang/include/clang/APINotes/APINotesReader.h
+17-0clang/include/clang/APINotes/APINotesWriter.h
+413-485 files not shown
+448-5111 files

LLVM/project 816d3f2llvm/docs ReleaseNotes.md, llvm/lib/Target/ARM/AsmParser ARMAsmParser.cpp

[ARM] Allow predicated `subs pc, lr, #imm` in Thumb2 (#205751)

ARMAsmParser has a special case for this instruction that used the
instruction name unmodified, but this would include the condition code,
so if the instruction has one, the tblgen entry doesn't match. The
condition code is already added as a separate operand.

Check for `CarrySetting` so that the special case does not falsely match
on `sub pc, lr, #imm`, which is not valid in Thumb2.
DeltaFile
+4-4llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
+6-0llvm/test/MC/ARM/basic-thumb2-instructions.s
+5-0llvm/test/MC/ARM/thumb2-diagnostics.s
+1-0llvm/docs/ReleaseNotes.md
+16-44 files

LLVM/project c452e7ellvm/lib/CodeGen/GlobalISel InlineAsmLowering.cpp, llvm/test/CodeGen/AArch64/GlobalISel arm64-fallback.ll

[GISel][Inlineasm] Don't assert on multi-register inline asm inputs (#200612)

`lowerInlineAsm()` asserts that the number of registers allocated for an
input operand equals the number of source vregs, then separately bails
for `NumRegs > 1`. The assert is wrong: the counts legitimately differ
when a value is passed in a register pair/tuple (e.g. i128 in a RISC-V
"R" GPR pair, or i512 to an AArch64 ld64b operand), crashing
assertions-enabled builds instead of falling back to SelectionDAG.

Replace the assert and the `NumRegs > 1` check with a single guard
requiring exactly one source vreg in one register; anything else is
rejected so it falls back instead of asserting. The supported path is
unchanged.

https://godbolt.org/z/v6WTaYEsd
DeltaFile
+8-0llvm/test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll
+7-0llvm/test/CodeGen/RISCV/GlobalISel/riscv-unsupported.ll
+1-5llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp
+16-53 files

LLVM/project 5026b32orc-rt/include/orc-rt StandaloneMachOUnwindInfoRegistrar.h, orc-rt/include/orc-rt/sps-ci StandaloneMachOUnwindInfoRegistrarSPSCI.h

[orc-rt] Add StandaloneMachOUnwindInfoRegistrar. (#206669)

StandaloneMachOUnwindInfoRegistrar provides methods and SPS-CI
allocation actions for registering and deregistering MachO unwind-info
sections (DWARF EH-frame and compact-unwind) via libunwind's
find-dynamic-unwind-sections APIs.

A Registration handle returned by enable() represents the connection
with libunwind; clients must keep it alive for the lifetime of their
Session, and its destructor releases the registration. Concurrent
registrations are reference-counted so multiple sessions can share a
single underlying libunwind hook.

Registered code ranges are stored in an interval map. Overlapping ranges
are rejected; lookups for an address outside any registered range return
no info, so libunwind falls back to its other lookup mechanisms safely.
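
The range bookkeeping described above can be sketched in Python with a sorted list and `bisect`. This is an illustration of the behaviour only, not the orc-rt data structure: `RangeMap`, its method names, and the half-open `[start, end)` convention are assumptions.

```python
import bisect


class RangeMap:
    """Maps non-overlapping half-open address ranges [start, end) to info."""

    def __init__(self):
        self._starts = []   # sorted range starts
        self._entries = []  # (start, end, info), parallel to _starts

    def insert(self, start, end, info):
        """Register a range; return False if it overlaps an existing one."""
        if start >= end:
            raise ValueError("empty or inverted range")
        i = bisect.bisect_left(self._starts, start)
        if i > 0 and self._entries[i - 1][1] > start:
            return False  # previous range extends into the new one
        if i < len(self._starts) and self._starts[i] < end:
            return False  # new range extends into the next one
        self._starts.insert(i, start)
        self._entries.insert(i, (start, end, info))
        return True

    def lookup(self, addr):
        """Return the info for the range containing addr, or None."""
        i = bisect.bisect_right(self._starts, addr) - 1
        if i >= 0 and addr < self._entries[i][1]:
            return self._entries[i][2]
        return None


ranges = RangeMap()
first = ranges.insert(0x1000, 0x2000, "module-a")
overlap = ranges.insert(0x1800, 0x2800, "module-b")
adjacent = ranges.insert(0x2000, 0x3000, "module-c")
print(first, overlap, adjacent, ranges.lookup(0x500))
```

Returning `None` for an unregistered address mirrors the "no info" result that lets the caller fall back to other lookup mechanisms.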

A future MachO-Platform will provide integrated unwind-info registration
and should be preferred when available. This class will then remain

    [8 lines not shown]
DeltaFile
+234-0orc-rt/lib/executor/StandaloneMachOUnwindInfoRegistrar.cpp
+188-0orc-rt/unittests/StandaloneMachOUnwindInfoRegistrarTest.cpp
+132-0orc-rt/include/orc-rt/StandaloneMachOUnwindInfoRegistrar.h
+43-0orc-rt/lib/executor/sps-ci/StandaloneMachOUnwindInfoRegistrarSPSCI.cpp
+27-0orc-rt/include/orc-rt/sps-ci/StandaloneMachOUnwindInfoRegistrarSPSCI.h
+2-0orc-rt/lib/executor/CMakeLists.txt
+626-02 files not shown
+628-08 files

LLVM/project b792db9llvm/lib/Target/AMDGPU AMDGPUAttributor.cpp AMDGPUSubtarget.cpp, llvm/lib/Target/AMDGPU/Utils AMDGPUBaseInfo.cpp AMDGPUBaseInfo.h

[AMDGPU] Relocated getMaxNumWorkGroups function out of AMDGPUSubtarget (NFC) (#205636)
DeltaFile
+1-7llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+0-7llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
+5-0llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+3-0llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+0-3llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
+1-1llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
+10-181 files not shown
+11-197 files

LLVM/project 2311323llvm/test/CodeGen/WebAssembly/GlobalISel/instructions icmp.ll select.ll

[WebAssembly][GlobalISel] Implement integer comparisons and `G_SELECT` (#197257)

Adds legalization and tests for various integer comparison operations
(`G_ICMP` is marked legal, and `lower` is enabled for some of the
others), as well as `G_SELECT`.

Split from #157161
DeltaFile
+277-0llvm/test/CodeGen/WebAssembly/GlobalISel/instructions/icmp.ll
+94-0llvm/test/CodeGen/WebAssembly/GlobalISel/instructions/select.ll
+89-0llvm/test/CodeGen/WebAssembly/GlobalISel/instructions/ucmp.ll
+81-0llvm/test/CodeGen/WebAssembly/GlobalISel/instructions/scmp.ll
+77-0llvm/test/CodeGen/WebAssembly/GlobalISel/instructions/umin.ll
+77-0llvm/test/CodeGen/WebAssembly/GlobalISel/instructions/umax.ll
+695-03 files not shown
+857-09 files