[CodeGen] Add widening support for ISD::CTTZ_ELTS (#205841)
WidenVectorOperand had no handler for CTTZ_ELTS/
CTTZ_ELTS_ZERO_POISON, causing a fatal error when the input vector type
needed widening.
Add WidenVecOp_CttzElements which widens the input vector and pads the
extra lanes with all-ones, ensuring they do not contribute spurious
trailing zeros to the count. This follows the same pattern as the
existing WidenVecOp_VP_CttzElements.
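To illustrate why the padding lanes are set to all-ones, here is a small Python model of the operation. It is only a sketch of the semantics as described above, not the SelectionDAG code: `cttz_elts` and `widen` are hypothetical helper names, and the vector is modelled as a list of 0/1 lanes with lane 0 counted first.

```python
def cttz_elts(mask):
    """Count zero lanes starting at lane 0; return len(mask) if every lane is zero."""
    for i, lane in enumerate(mask):
        if lane:
            return i
    return len(mask)


def widen(mask, width, pad):
    """Model widening: append (width - len(mask)) padding lanes with value `pad`."""
    return list(mask) + [pad] * (width - len(mask))


narrow = [0, 0, 0]  # 3-lane input whose type must be widened to 4 lanes

# Padding with ones stops the count at the original length.
print(cttz_elts(widen(narrow, 4, pad=1)))  # 3

# Padding with zeros would let the extra lane inflate the count.
print(cttz_elts(widen(narrow, 4, pad=0)))  # 4
```

With all-ones padding, an all-zero input still yields the original element count, which is the result the narrow operation would have produced.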
Assisted-by: Claude (Anthropic)
Pull up following revision(s) (requested by riastradh in ticket #1297):
sys/arch/vax/include/mcontext.h (patch)
from: sys/arch/vax/include/lwp_private.h: revision 1.2
vax __lwp_getprivate_fast: Fix asm constraints.
r0 is not clobbered; it is the output. So say so.
No volatile is needed here, and volatile would be wrong, because
calls to __lwp_getprivate_fast can be safely deleted if the result is
not used.
PR port-vax/60101: vax: __lwp_getprivate_fast() inline asm uses
GCC-specific register variable pattern, not portable
Pull up following revision(s) (requested by riastradh in ticket #353):
sys/arch/vax/include/lwp_private.h: revision 1.2
vax __lwp_getprivate_fast: Fix asm constraints.
r0 is not clobbered; it is the output. So say so.
No volatile is needed here, and volatile would be wrong, because
calls to __lwp_getprivate_fast can be safely deleted if the result is
not used.
PR port-vax/60101: vax: __lwp_getprivate_fast() inline asm uses
GCC-specific register variable pattern, not portable
[LFI][AArch64] Add guard elimination optimization (#204693)
This adds support for the guard elimination optimization to the AArch64
LFI rewriter. Redundant guards (`add x28, x27, wN, uxtw` instructions)
will be skipped when possible. See the LFI.rst documentation for an
example of the optimization.
[lit] Add stream-injectable run() core to builtin cat (#204711)
Pull cat's logic out into run(argv, stdin, stdout, stderr, cwd) so it
takes explicit streams instead of touching sys.std* directly. main()
just calls run() with the real process streams, so nothing changes for
the spawned-script path.
Needed before cat can run in-process inside the lit worker.
Also switched file reads to raw bytes throughout, since the old
text-mode read + win32 msvcrt.setmode was only there for sys.stdout's
encoding, which doesn't apply once we pass in a binary stream directly.
Error messages still report the original filename, not the cwd-joined
path.
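The shape of the refactoring can be sketched as follows. This is a minimal toy, not lit's actual builtin: only the `run(argv, stdin, stdout, stderr, cwd)` signature comes from the message above, while the option-free argument handling and the error text are assumptions made for illustration.

```python
import io
import os
import sys


def run(argv, stdin, stdout, stderr, cwd):
    """Toy cat: copy each named file (or stdin if none) to stdout as raw bytes."""
    filenames = argv[1:]
    if not filenames:
        stdout.write(stdin.read())
        return 0
    for name in filenames:
        # Resolve relative names against the explicit cwd, not the process cwd.
        path = name if os.path.isabs(name) else os.path.join(cwd, name)
        try:
            with open(path, "rb") as f:
                stdout.write(f.read())
        except OSError as e:
            # Report the name the caller passed, not the cwd-joined path.
            stderr.write(("cat: %s: %s\n" % (name, e.strerror)).encode())
            return 1
    return 0


def main():
    # Spawned-script path: hand run() the real binary process streams.
    sys.exit(run(sys.argv, sys.stdin.buffer, sys.stdout.buffer,
                 sys.stderr.buffer, os.getcwd()))


# In-process use: the caller injects its own binary streams.
out, err = io.BytesIO(), io.BytesIO()
rc = run(["cat"], io.BytesIO(b"hello\n"), out, err, os.getcwd())
```

Because every stream is a parameter, a caller inside the worker can capture output in memory without redirecting the process-wide `sys.std*` objects.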
Signed-off-by: Prasoon Kumar <prasoonkumar054 at gmail.com>
[lit] Use provided streams in builtin diff (#204869)
We want to move the diff builtin to run in-process inside the lit
worker, instead of spawning a subprocess. The current implementation
talks to sys.stdin / stdout / stderr directly, so it can't be called
with different streams.
To fix this, pull diff's logic into run(argv, stdin, stdout, stderr,
cwd), which takes streams as arguments instead of reaching for sys.std*.
main() now just calls run() with the real process streams, so the
spawned-script path is unchanged.
This also makes the 'import util' dual-mode: lit.util when diff is
imported as part of the lit package, falling back to flat util for the
spawned script.
[AMDGPU] Fix regclass for a true16 pattern. NFCI. (#206513)
Add an EXTRACT_SUBREG to make it clear that the result of the pattern is
only the low 16 bits of the result of the V_BFI_B32. This does not seem
to affect codegen, presumably because we are lax about allowing COPY
between VGPR_16 and VGPR_32.
[libc] Implement CPU_{AND,OR,XOR,EQUAL}(_S)? macros (#205412)
This patch implements CPU_AND, CPU_OR, CPU_XOR, and CPU_EQUAL macros
(along with their _S variants) from sched.h.
The implementation follows existing patterns by adding internal entry
points (__sched_andcpuset, __sched_orcpuset, __sched_xorcpuset, and
__sched_cpuequal) that perform bitwise operations on cpu_set_t. For
__sched_cpuequal, I use inline_memcmp instead of a manual loop.
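As a rough model of what these operations compute, the sketch below treats a CPU set as a fixed-size byte string with one bit per CPU. It is written in Python for illustration only; the helper names (`make_set`, `members`, `cpu_and`, and so on) are hypothetical and are not the libc entry points, and the 8-byte set size is an assumption.

```python
def make_set(cpus, size=8):
    """Build a toy CPU set of `size` bytes with the given CPU numbers set."""
    buf = bytearray(size)
    for c in cpus:
        buf[c // 8] |= 1 << (c % 8)
    return bytes(buf)


def members(s):
    """List the CPU numbers whose bit is set."""
    return [i * 8 + b for i, byte in enumerate(s) for b in range(8) if (byte >> b) & 1]


def cpu_and(a, b):
    return bytes(x & y for x, y in zip(a, b))


def cpu_or(a, b):
    return bytes(x | y for x, y in zip(a, b))


def cpu_xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def cpu_equal(a, b):
    # A whole-buffer comparison, in the spirit of using memcmp over a manual loop.
    return a == b


s1 = make_set([0, 2, 9])
s2 = make_set([2, 9, 10])
print(members(cpu_and(s1, s2)))  # [2, 9]
print(members(cpu_or(s1, s2)))   # [0, 2, 9, 10]
print(members(cpu_xor(s1, s2)))  # [0, 10]
print(cpu_equal(s1, s2))         # False
```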
Assisted by Gemini.
[mlir] Fix StridedMemRefRankOf to check isStrided() (#201415)
StridedMemRefRankOf was equivalent to MemRefRankOf: it only applied
HasAnyRankOfPred and never HasStridesPred, so non-strided memref layouts
(e.g. multi-result affine maps) incorrectly passed ODS verification on
ops using this constraint (e.g. sparse_tensor.push_back).
The inBuffer of push_back uses StridedMemRefRankOf, which requires a
strided memref layout (HasStridesPred). A non-strided layout must be
rejected.
[LoopIdiom] Form memset on runtime-trip multi-store loops. (#206354)
For runtime trip counts, mayLoopAccessLocation cannot bound the size of
the access, which prevents forming memsets for loops with multiple
stores of the same value.
If all may-aliasing stores write the same value, we can still form
potentially overlapping memsets, as the order of the memsets or writing
the same location multiple times should not matter.
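The argument can be checked on a toy model. The Python sketch below is only an illustration of the semantic claim, not of the pass or its alias analysis: it assumes a loop with two stores of the same value at offsets `i` and `i + 1`, and `memset` here is a hypothetical helper over a `bytearray`.

```python
def loop_version(buf, n, value):
    """Original loop: two stores of the same value per iteration."""
    for i in range(n):
        buf[i] = value
        buf[i + 1] = value
    return buf


def memset(buf, start, value, count):
    """Toy memset: fill buf[start:start+count] with `value`."""
    buf[start:start + count] = bytes([value]) * count


def memset_version(buf, n, value):
    """Transformed code: one memset per store; the two ranges overlap."""
    memset(buf, 0, value, n)
    memset(buf, 1, value, n)
    return buf


# The trip count n is only known at run time; every value gives the same memory.
for n in range(6):
    before = bytearray(range(1, 9))
    assert loop_version(bytearray(before), n, 0) == memset_version(bytearray(before), n, 0)
```

Since every store writes the same value, the final contents depend only on which locations are written, not on the order or on how often a location is written.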
On a large C/C++ based corpus (32k modules), we form ~2% more memsets.
```
                 base     patch
memsets formed   90,063   91,853   +1.99%
```
PR: https://github.com/llvm/llvm-project/pull/206354
[mlir][bufferization] Add static memory planner pass for compile-time buffer allocation (#205125)
This PR introduces a new bufferization-related pass that performs static memory
planning at compile time. The pass is part of my GSoC 2026 project on
improving MLIR's buffer allocation strategies:
https://summerofcode.withgoogle.com/programs/2026/projects/XsjxBQ9o
### What this does
The static memory planner analyzes buffer lifetimes within a function
and consolidates multiple small `memref.alloc`/`memref.dealloc` pairs
into a single arena allocation. Instead of making separate heap
allocations for each memref, we compute offsets ahead of time and carve
out slices from one large buffer using `memref.view`.
This is useful for embedded systems and other memory-constrained
environments where you want predictable memory usage without runtime
allocation overhead.
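The idea of computing offsets from lifetimes can be sketched as below. This is not the pass's algorithm, which is not described in the visible part of this message: the greedy largest-first, first-fit placement, the `Buffer` type, and the `plan` function are all assumptions for illustration, and alignment is ignored.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    name: str
    size: int   # bytes
    start: int  # first program point where the buffer is live
    end: int    # last program point where the buffer is live (inclusive)


def plan(buffers):
    """Assign arena offsets so that buffers live at the same time never overlap."""
    placed = []   # (buffer, offset) pairs decided so far
    offsets = {}
    for buf in sorted(buffers, key=lambda b: -b.size):
        # Byte ranges already taken by buffers whose lifetime overlaps this one.
        busy = sorted((off, off + other.size) for other, off in placed
                      if not (other.end < buf.start or buf.end < other.start))
        offset = 0
        for lo, hi in busy:
            if offset + buf.size <= lo:
                break  # the gap before this range is large enough
            offset = max(offset, hi)
        placed.append((buf, offset))
        offsets[buf.name] = offset
    arena = max((off + b.size for b, off in placed), default=0)
    return offsets, arena


offsets, arena = plan([
    Buffer("a", 64, 0, 2),
    Buffer("b", 32, 1, 3),
    Buffer("c", 64, 3, 5),
])
print(offsets, arena)  # {'a': 0, 'c': 0, 'b': 64} 96
```

In this example `a` and `c` are never live together and share offset 0, so the arena needs 96 bytes rather than the 160 bytes three separate allocations would use.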
[40 lines not shown]
[Clang] Emit struct TBAA for llvm.errno.tbaa (#201375)
For `!llvm.errno.tbaa`, emit TBAA for accessing the member of a virtual
`__libc_errno` struct. The purpose is to indicate that errno aliases
with `int` accesses, but not `int` member accesses in other structs.
This is an alternative to
https://github.com/llvm/llvm-project/pull/200367.
[lldb][Windows] Use (lib)python3.dll when linking with limited API (#206585)
In release builds, we already link liblldb against `(lib)python3.dll`
(#201407). However, we still defined
`LLDB_PYTHON_RUNTIME_LIBRARY_FILENAME` as `(lib)python3(.)xx.dll`, so
`LoadPythonRuntime` would try to load the version-specific library. You
can reproduce this when trying to run a release build with a different
Python version in PATH.
With this PR, `LLDB_PYTHON_RUNTIME_LIBRARY_FILENAME` will use the stable
ABI library name if we use the limited Python API.