[CodeGen] Add widening support for ISD::CTTZ_ELTS (#205841)
WidenVectorOperand had no handler for CTTZ_ELTS/
CTTZ_ELTS_ZERO_POISON, causing a fatal error when the input vector type
needed widening.
Add WidenVecOp_CttzElements which widens the input vector and pads the
extra lanes with all-ones, ensuring they do not contribute spurious
trailing zeros to the count. This follows the same pattern as the
existing WidenVecOp_VP_CttzElements.
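To illustrate why the padding value matters, here is a small Python model of
the operation. It is not the SelectionDAG code: `cttz_elts` and `widen` are
hypothetical helpers, and 8-bit lanes are assumed for the all-ones value.

```python
def cttz_elts(vec):
    """Count zero elements from lane 0 up to the first non-zero lane."""
    count = 0
    for lane in vec:
        if lane != 0:
            break
        count += 1
    return count


def widen(vec, width, pad):
    """Model widening: append `pad` lanes until the vector has `width` lanes."""
    return list(vec) + [pad] * (width - len(vec))


ALL_ONES = 0xFF  # assumption: 8-bit lanes

original = [0, 0, 0]                        # 3 lanes, all zero
zero_padded = widen(original, 4, 0)         # extra lane looks like a real zero
ones_padded = widen(original, 4, ALL_ONES)  # extra lane stops the count

print(cttz_elts(original), cttz_elts(zero_padded), cttz_elts(ones_padded))
```

With zero padding the all-zero input would count the extra lane as well;
with all-ones padding the widened result matches the original.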
Assisted-by: Claude (Anthropic)
[LFI][AArch64] Add guard elimination optimization (#204693)
This adds support for the guard elimination optimization to the AArch64
LFI rewriter. Redundant guards (`add x28, x27, wN, uxtw` instructions)
will be skipped when possible. See the LFI.rst documentation for an
example of the optimization.
[lit] Add stream-injectable run() core to builtin cat (#204711)
Pull cat's logic out into run(argv, stdin, stdout, stderr, cwd) so it
takes explicit streams instead of touching sys.std* directly. main()
just calls run() with the real process streams, so nothing changes for
the spawned-script path.
Needed before cat can run in-process inside the lit worker.
Also switched file reads to raw bytes throughout, since the old
text-mode read + win32 msvcrt.setmode was only there for sys.stdout's
encoding, which doesn't apply once we pass in a binary stream directly.
Error messages still report the original filename, not the cwd-joined
path.
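A minimal sketch of the stream-injectable shape described above, not the
actual lit source. Only the run(argv, stdin, stdout, stderr, cwd) signature
comes from this message; the body, the exit codes, the stdin fallback, and
the exact error text are assumptions.

```python
import io
import os
import tempfile


def run(argv, stdin, stdout, stderr, cwd):
    """Write the files named in argv[1:] to stdout as raw bytes."""
    filenames = argv[1:]
    if not filenames:
        # Assumption: with no operands, copy stdin through unchanged.
        stdout.write(stdin.read())
        return 0
    for name in filenames:
        path = os.path.join(cwd, name)  # resolve relative to the given cwd
        try:
            with open(path, "rb") as f:
                stdout.write(f.read())
        except OSError as e:
            # Report the name as the caller wrote it, not the joined path.
            stderr.write(f"cat: {name}: {e.strerror}\n".encode())
            return 1
    return 0


# In-process use with binary in-memory streams.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "a.txt"), "wb") as f:
        f.write(b"hello\n")
    out, err = io.BytesIO(), io.BytesIO()
    rc_ok = run(["cat", "a.txt"], io.BytesIO(), out, err, d)
    out2, err2 = io.BytesIO(), io.BytesIO()
    rc_missing = run(["cat", "missing.txt"], io.BytesIO(), out2, err2, d)
```

A main() wrapper would pass the process's binary streams and os.getcwd() to
run() in the same way.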
Signed-off-by: Prasoon Kumar <prasoonkumar054 at gmail.com>
[lit] Use provided streams in builtin diff (#204869)
We want to move the diff builtin to run in-process inside the lit
worker, instead of spawning a subprocess. The current implementation
talks to sys.stdin / stdout / stderr directly, so it can't be called
with different streams.
To fix this, pull diff's logic into run(argv, stdin, stdout, stderr,
cwd), which takes streams as arguments instead of reaching for sys.std*.
main() now just calls run() with the real process streams, so the
spawned-script path is unchanged.
This also makes the 'import util' dual-mode: lit.util when diff is
imported as part of the lit package, falling back to flat util for the
spawned script.
[AMDGPU] Fix regclass for a true16 pattern. NFCI. (#206513)
Add an EXTRACT_SUBREG to make it clear that the result of the pattern is
only the low 16 bits of the result of the V_BFI_B32. This does not seem
to affect codegen, presumably because we are lax about allowing COPY
between VGPR_16 and VGPR_32.
[libc] Implement CPU_{AND,OR,XOR,EQUAL}(_S)? macros (#205412)
This patch implements CPU_AND, CPU_OR, CPU_XOR, and CPU_EQUAL macros
(along with their _S variants) from sched.h.
The implementation follows existing patterns by adding internal entry
points (__sched_andcpuset, __sched_orcpuset, __sched_xorcpuset, and
__sched_cpuequal) that perform bitwise operations on cpu_set_t. For
__sched_cpuequal, I use inline_memcmp instead of a manual loop.
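As a rough model of what these operations compute, the Python sketch below
treats a CPU set as a fixed-size byte string. It is not the libc
implementation: the helper names are hypothetical, and the byte-wise bit
layout is a simplification of the real cpu_set_t.

```python
def make_set(cpus, size=8):
    """Build a `size`-byte set with the given CPU indices turned on."""
    buf = bytearray(size)
    for cpu in cpus:
        buf[cpu // 8] |= 1 << (cpu % 8)
    return bytes(buf)


def members(s):
    """List the CPU indices whose bit is set."""
    return [i for i in range(len(s) * 8) if (s[i // 8] >> (i % 8)) & 1]


def cpu_and(a, b):
    return bytes(x & y for x, y in zip(a, b))


def cpu_or(a, b):
    return bytes(x | y for x, y in zip(a, b))


def cpu_xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def cpu_equal(a, b):
    # Whole-buffer comparison, in the spirit of a memcmp over the set.
    return a == b


a = make_set([0, 1, 5])
b = make_set([1, 5, 9])
print(members(cpu_and(a, b)), members(cpu_or(a, b)), members(cpu_xor(a, b)))
```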
Assisted by Gemini.
[mlir] Fix StridedMemRefRankOf to check isStrided() (#201415)
StridedMemRefRankOf was equivalent to MemRefRankOf: it only applied
HasAnyRankOfPred and never HasStridesPred, so non-strided memref layouts
(e.g. multi-result affine maps) incorrectly passed ODS verification on
ops using this constraint (e.g. sparse_tensor.push_back).
The inBuffer of push_back uses StridedMemRefRankOf, which requires a
strided memref layout (HasStridesPred). A non-strided layout must be
rejected.
[LoopIdiom] Form memset on runtime-trip multi-store loops. (#206354)
For runtime trip counts, mayLoopAccessLocation cannot bound the size of
the access, which prevents forming memsets for loops with multiple
stores of the same value.
If all may-aliasing stores write the same value, we can still form
potentially overlapping memsets, as the order of the memsets or writing
the same location multiple times should not matter.
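The argument can be checked on a small model. The Python sketch below is not
the pass itself: it uses a hypothetical loop with two stores of the same
value at offsets i and i + 1, and compares it with two overlapping memsets
for several trip counts.

```python
def loop_version(buf, n, value):
    """Original loop: two stores of the same value per iteration."""
    for i in range(n):
        buf[i] = value
        buf[i + 1] = value


def memset(buf, start, value, length):
    """Model of memset(buf + start, value, length)."""
    buf[start:start + length] = bytes([value]) * length


def memset_version(buf, n, value):
    """Transformed form: one memset per store, with overlapping ranges."""
    memset(buf, 0, value, n)
    memset(buf, 1, value, n)


def same_result(n, value=0xAB):
    a = bytearray([0x11]) * (n + 1)  # buffer large enough for buf[n]
    b = bytearray(a)
    loop_version(a, n, value)
    memset_version(b, n, value)
    return a == b


all_match = all(same_result(n) for n in range(0, 16))
print(all_match)
```

Because every store writes the same value, the final contents do not depend
on the order of the writes or on how often a byte is written.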
On a large C/C++ based corpus (32k modules), we form ~2% more memsets.
```
base patch
memsets formed 90,063 91,853 +1.99%
```
PR: https://github.com/llvm/llvm-project/pull/206354
[mlir][bufferization] Add static memory planner pass for compile-time buffer allocation (#205125)
This PR introduces a new bufferization-related pass that performs static memory
planning at compile time. The pass is part of my GSoC 2026 project on
improving MLIR's buffer allocation strategies:
https://summerofcode.withgoogle.com/programs/2026/projects/XsjxBQ9o
### What this does
The static memory planner analyzes buffer lifetimes within a function
and consolidates multiple small `memref.alloc`/`memref.dealloc` pairs
into a single arena allocation. Instead of making separate heap
allocations for each memref, we compute offsets ahead of time and carve
out slices from one large buffer using `memref.view`.
This is useful for embedded systems and other memory-constrained
environments where you want predictable memory usage without runtime
allocation overhead.
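The idea of lifetime-based offset assignment can be sketched in Python. This
is one simple greedy first-fit strategy chosen for illustration, not
necessarily the algorithm the pass uses; the `plan` function, the tuple
layout, and the example buffers are hypothetical, and alignment is ignored.

```python
def plan(buffers):
    """Assign arena offsets to buffers.

    buffers: list of (name, size, first_use, last_use), lifetimes inclusive.
    Returns (offsets by name, total arena size).
    """
    placed = []   # (offset, size, first_use, last_use)
    offsets = {}
    # Place larger buffers first; ties keep their input order.
    for name, size, first, last in sorted(buffers, key=lambda b: -b[1]):
        # Address ranges held by buffers that are live at the same time.
        busy = sorted(
            (o, s) for o, s, f, l in placed if not (last < f or l < first)
        )
        offset = 0
        for o, s in busy:
            if offset + size <= o:
                break                     # fits in the gap before this range
            offset = max(offset, o + s)   # otherwise skip past it
        offsets[name] = offset
        placed.append((offset, size, first, last))
    arena = max((o + s for o, s, _, _ in placed), default=0)
    return offsets, arena


# A and C are never live at the same time, so they can share offset 0.
offsets, arena = plan([("A", 64, 0, 2), ("B", 32, 1, 3), ("C", 64, 3, 5)])
print(offsets, arena)
```

In this example the three buffers need 160 bytes as separate allocations but
fit in a 96-byte arena.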
[40 lines not shown]
[Clang] Emit struct TBAA for llvm.errno.tbaa (#201375)
For `!llvm.errno.tbaa`, emit TBAA for accessing the member of a virtual
`__libc_errno` struct. The purpose is to indicate that errno aliases
with `int` accesses, but not `int` member accesses in other structs.
This is an alternative to
https://github.com/llvm/llvm-project/pull/200367.
[lldb][Windows] Use (lib)python3.dll when linking with limited API (#206585)
In release builds, we already link liblldb against `(lib)python3.dll`
(#201407). However, we still defined
`LLDB_PYTHON_RUNTIME_LIBRARY_FILENAME` to `(lib)python3(.)xx.dll`. So
`LoadPythonRuntime` will try to load the version specific library. You
can reproduce this when trying to run a release build with a different
Python version in PATH.
With this PR, `LLDB_PYTHON_RUNTIME_LIBRARY_FILENAME` will use the stable
ABI library name if we use the limited Python API.
[AMDGPU] Use sparse direct lookup table for VOPD eligibility (#206534)
This replaces a manually generated table with a new TableGen feature
that enables sparse direct lookup. Additional changes are made to put
both X and Y eligibility into a single table.
Assisted-by: Claude Code
[APINotes] Serialize function-like Where.Parameters (#204147)
This PR builds on #203227 by serializing function-like
`Where.Parameters` selectors into binary API notes.
The selector remains declaration-selection data, separate from
annotation payloads. Existing Sema paths still use name-only lookup and
keep legacy broad matching. Exact overload matching is left to the
follow-up Sema PR.
## Format
This bumps the API notes minor version because the global-function and
C++ method table key layout changes.
Function-like entries now use a shared binary key containing:
- parent context ID
- declaration name ID
[59 lines not shown]
[ARM] Allow predicated `subs pc, lr, #imm` in Thumb2 (#205751)
ARMAsmParser has a special case for this instruction that used the
instruction name unmodified, but this would include the condition code,
so if the instruction has one, the tblgen entry doesn't match. The
condition code is already added as a separate operand.
Check for `CarrySetting` so that the special case does not falsely match
on `sub pc, lr, #imm`, which is not valid in Thumb2.
[GISel][Inlineasm] Don't assert on multi-register inline asm inputs (#200612)
`lowerInlineAsm()` asserts that the number of registers allocated for an
input operand equals the number of source vregs, then separately bails
for `NumRegs > 1`. The assert is wrong: the counts legitimately differ
when a value is passed in a register pair/tuple (e.g. i128 in a RISC-V
"R" GPR pair, or i512 to an AArch64 ld64b operand), crashing
assertions-enabled builds instead of falling back to SelectionDAG.
Replace the assert and the `NumRegs > 1` check with a single guard
requiring exactly one source vreg in one register; anything else is
rejected so it falls back instead of asserting. The supported path is
unchanged.
https://godbolt.org/z/v6WTaYEsd
[orc-rt] Add StandaloneMachOUnwindInfoRegistrar. (#206669)
StandaloneMachOUnwindInfoRegistrar provides methods and SPS-CI
allocation actions for registering and deregistering MachO unwind-info
sections (DWARF EH-frame and compact-unwind) via libunwind's
find-dynamic-unwind-sections APIs.
A Registration handle returned by enable() represents the connection
with libunwind; clients must keep it alive for the lifetime of their
Session, and its destructor releases the registration. Concurrent
registrations are reference-counted so multiple sessions can share a
single underlying libunwind hook.
Registered code ranges are stored in an interval map. Overlapping ranges
are rejected; lookups for an address outside any registered range return
no info, so libunwind falls back to its other lookup mechanisms safely.
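The range bookkeeping described here can be modelled in a few lines of
Python. This is only a sketch of the behaviour (reject overlaps, return
nothing for unregistered addresses), not the orc-rt data structure; the
`RangeMap` class and its method names are hypothetical.

```python
import bisect


class RangeMap:
    """Non-overlapping half-open [start, end) ranges mapped to an info value."""

    def __init__(self):
        self._starts = []    # sorted range starts, parallel to _entries
        self._entries = []   # (start, end, info)

    def insert(self, start, end, info):
        """Register a range; return False if it overlaps an existing one."""
        if start >= end:
            raise ValueError("empty or inverted range")
        i = bisect.bisect_left(self._starts, start)
        if i > 0 and self._entries[i - 1][1] > start:
            return False     # previous range extends into the new one
        if i < len(self._entries) and self._entries[i][0] < end:
            return False     # new range extends into the next one
        self._starts.insert(i, start)
        self._entries.insert(i, (start, end, info))
        return True

    def lookup(self, addr):
        """Return the info for the range containing addr, or None."""
        i = bisect.bisect_right(self._starts, addr) - 1
        if i >= 0:
            start, end, info = self._entries[i]
            if addr < end:
                return info
        return None


m = RangeMap()
ok_a = m.insert(0x1000, 0x2000, "A")
ok_b = m.insert(0x1800, 0x2800, "B")   # overlaps A, rejected
ok_c = m.insert(0x2000, 0x3000, "C")   # adjacent to A, accepted
```

A `None` result corresponds to the "no info" case, where the caller moves on
to its other lookup mechanisms.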
A future MachO-Platform will provide integrated unwind-info registration
and should be preferred when available. This class will then remain
[8 lines not shown]
[WebAssembly][GlobalISel] Implement integer comparisons and `G_SELECT` (#197257)
Adds legalization and tests for integer comparison operations and
`G_SELECT`. `G_ICMP` is marked legal, and `lower` is enabled for some of
the other comparison operations.
Split from #157161