[LoopFusion][docs][NFC] Document atomic accesses as a fusion blocker (#201775)
Loops containing atomic accesses are now rejected outright, mirroring
the volatile blocker. Update the eligibility sections to match.
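A minimal sketch of the documented eligibility rule, in Python for illustration only. `MemAccess` and `fusion_blocker` are hypothetical names and do not reflect LoopFusion's C++ implementation. The sketch only shows that volatile and atomic accesses each reject a loop before any further analysis runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemAccess:
    """Hypothetical stand-in for a load or store inside a loop body."""
    is_volatile: bool = False
    is_atomic: bool = False


def fusion_blocker(accesses):
    """Return why a loop cannot be fused, or None if no blocker applies.

    Volatile and atomic accesses both reject the loop outright; no
    dependence analysis is attempted for such loops.
    """
    for access in accesses:
        if access.is_volatile:
            return "volatile access"
        if access.is_atomic:
            return "atomic access"
    return None
```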
[RISCV][MC] Add experimental `Zvvmtls` and `Zvvmttls` support (#198229)
This patch adds experimental MC-layer support for the [RISC-V Integrated Matrix Extension](https://github.com/riscv/integrated-matrix-extension/releases/tag/riscv-isa-release-71c48b9-2026-05-17),
specifically the tile load/store extensions: `Zvvmtls` and `Zvvmttls`.
This PR:
- Adds the optional tile-lambda operand syntax (`L1` through `L64`) and the
related asm operand.
- Adds the `vmtl.v`, `vmts.v`, `vmttl.v` and `vmtts.v` instructions to
the MC layer.
- Modifies `parseMaskReg` to return `NoMatch` so that overloaded
mnemonics can continue matching alternative optional operands; for example,
`vmtl.v v8, (a0), a1, L4` is now parsed as the tile-lambda form instead of
failing because `L4` was treated as a malformed mask operand. Real mask
registers missing `.t`, such as `v0`, still produce the existing diagnostic.
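The `NoMatch` change above can be sketched as a small control-flow model. This is Python for illustration only, loosely modeled on the three outcomes of LLVM's `ParseStatus` (success, no match, failure). The helper names are hypothetical, and accepting every integer from 1 to 64 as a lambda is an assumption; the extension may restrict the allowed values.

```python
from enum import Enum, auto


class ParseStatus(Enum):
    SUCCESS = auto()
    NO_MATCH = auto()  # not this operand kind; the caller may try another form
    FAILURE = auto()   # this operand kind, but malformed; report a diagnostic


def parse_mask_reg(tok):
    """Hypothetical mask parser: 'v0.t' is the only valid mask operand."""
    if tok == "v0.t":
        return ParseStatus.SUCCESS, tok
    if tok == "v0":
        # A real mask register missing '.t' keeps the hard diagnostic.
        return ParseStatus.FAILURE, "expected '.t' suffix"
    return ParseStatus.NO_MATCH, None  # e.g. 'L4': let other parsers try


def parse_tile_lambda(tok):
    """Hypothetical lambda parser accepting L1 through L64 (range assumed)."""
    digits = tok[1:]
    if tok.startswith("L") and digits.isdigit() and 1 <= int(digits) <= 64:
        return ParseStatus.SUCCESS, int(digits)
    return ParseStatus.NO_MATCH, None


def parse_optional_operand(tok):
    """Try each optional-operand form in order; a hard failure stops the search."""
    for kind, parser in (("mask", parse_mask_reg), ("lambda", parse_tile_lambda)):
        status, value = parser(tok)
        if status is ParseStatus.SUCCESS:
            return kind, value
        if status is ParseStatus.FAILURE:
            return "error", value
    return "error", "unrecognized operand"
```

If `parse_mask_reg` returned `FAILURE` for `L4` instead of `NO_MATCH`, the lambda parser would never run, which is the behavior the patch removes.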
[mlirbc] Add AffineMap serialization support (#191970)
Add binary bytecode encoding for AffineMapAttr, replacing the textual fallback.
An AffineMap is encoded as numDims, numSymbols and numResults, followed by the
result expressions. In the general case, each expression (AffineExpr) is encoded
as a recursive prefix tree: a VarInt kind tag followed by kind-specific data. To
guard better against malformed bytecode, these trees are decoded with an
iterative parser.
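A minimal sketch of a prefix-tree expression encoding with an iterative decoder, in Python for illustration. The kind tags, the tuple representation, and the zigzag encoding for signed constants are all assumptions rather than MLIR's actual wire format; only `add` and `mul` are modeled among the binary kinds, and the AffineMap header (numDims, numSymbols, numResults) is omitted. Decoding uses an explicit stack of pending parents, so deeply nested or malformed input cannot overflow the call stack.

```python
# Hypothetical kind tags; MLIR's real serialized values are not reproduced here.
DIM, SYMBOL, CONSTANT, ADD, MUL = range(5)
LEAF_TAGS = {"dim": DIM, "sym": SYMBOL, "const": CONSTANT}
BINARY_TAGS = {"add": ADD, "mul": MUL}
TAG_NAMES = {v: k for k, v in {**LEAF_TAGS, **BINARY_TAGS}.items()}


def write_varint(out, value):
    """Unsigned LEB128: 7 payload bits per byte, high bit marks continuation."""
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return


def read_varint(data, pos):
    result = shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def encode(expr, out):
    """Prefix order: kind tag first, then children (binary) or payload (leaf)."""
    stack = [expr]
    while stack:
        node = stack.pop()
        kind = node[0]
        if kind in BINARY_TAGS:
            write_varint(out, BINARY_TAGS[kind])
            stack.append(node[2])  # push rhs first so lhs is emitted first
            stack.append(node[1])
        else:
            write_varint(out, LEAF_TAGS[kind])
            value = node[1]
            if kind == "const":  # zigzag so negative constants stay small
                value = 2 * value if value >= 0 else -2 * value - 1
            write_varint(out, value)


def decode(data, pos=0):
    """Iterative decoder: returns (expr, next_pos) without recursion."""
    pending = []  # [kind, children_so_far] for binary nodes awaiting children
    while True:
        tag, pos = read_varint(data, pos)
        if tag not in TAG_NAMES:
            raise ValueError(f"unknown kind tag {tag}")
        kind = TAG_NAMES[tag]
        if kind in BINARY_TAGS:
            pending.append([kind, []])
            continue
        value, pos = read_varint(data, pos)
        if kind == "const":
            value = value // 2 if value % 2 == 0 else -(value + 1) // 2
        node = (kind, value)
        # Attach the finished node, folding up every parent it completes.
        while pending:
            pending[-1][1].append(node)
            if len(pending[-1][1]) < 2:
                break
            parent_kind, (lhs, rhs) = pending.pop()
            node = (parent_kind, lhs, rhs)
        else:
            return node, pos
```

For example, `d0 * -4 + s1` is represented as `("add", ("mul", ("dim", 0), ("const", -4)), ("sym", 1))` and round-trips through `encode` and `decode`.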
Common AffineMaps get special-case encodings, which take less space and are easy
to create without much extra maintenance burden. The ordering of the serialized
enum differs from AffineExprKind: the latter has an expansion point in the
middle (new kinds can be added there), while the serialized encoding needs to
remain stable.
Updated the checked-in mlirbc file, since memref has a default AffineMap; it is
updated now, ahead of the snapshot.
Assisted-by: Antigravity : Gemini
[lldb][test] Increase polling in TestInterruptThreadNames.py (#201554)
This test runs for a very long time on my machine (11s per variation),
and nearly all of this time is spent on the 10s sleep in this function.
There are two issues here:
1. It uses the (now outdated) logic that arm64 means we have a remote
Darwin device. This is no longer true these days as Macs also run on
arm64.
2. The polling interval of 1s is still very long, and the test still
spends nearly all its time waiting on this 1s sleep. Polling in a loop
with a 100ms sleep should still be slow enough.
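A minimal sketch of the short-interval polling pattern described above. `wait_for` is a hypothetical helper, not the function used in TestInterruptThreadNames.py; the 10s timeout and 100ms interval simply echo the numbers in the message.

```python
import time


def wait_for(predicate, timeout=10.0, interval=0.1):
    """Poll `predicate` every `interval` seconds until it is true or `timeout` expires.

    A short interval means the caller returns soon after the condition
    becomes true, instead of sleeping through a long fixed delay.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With a 1s interval, a condition that becomes true just after a check costs almost a full second; with 100ms, the worst-case extra wait per check is 100ms.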
[lldb][test] Assume clang supports -gmodules (#201333)
We currently spend 50ms in most dotest invocations to check if clang
supports `-gmodules`. The expensive part of this check is creating the
clang process to run `clang --help`.
`-gmodules` was added 11 years ago and is present in any compiler that
has even a remote chance of supporting the rest of our test suite. This
patch simply assumes that our compiler supports `-gmodules` if it is clang.
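A minimal sketch of replacing the subprocess probe with a name check. How dotest actually identifies clang is not shown in the message; `compiler_supports_gmodules` and the basename substring test are hypothetical. The point is that no `clang --help` process is spawned.

```python
import os


def compiler_supports_gmodules(compiler_path):
    """Assume any clang supports -gmodules instead of running `clang --help`.

    The substring check on the executable name is an assumption made for
    illustration; it avoids creating a process on every invocation.
    """
    return "clang" in os.path.basename(compiler_path)
```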
[lldb][test] Increase polling frequency in ProcessAttach (#201532)
The test_attach_to_process_by_id_correct_executable_offset subtest
requires us to hit a breakpoint in an attached process. For this we
implement a loop that hits the breakpoint location every 2 seconds.
This patch shortens that interval to 50ms.
The reason is that with a 2s interval, this test waits nearly 2 seconds
for the first breakpoint hit on any fast system. With a 50ms interval,
this subtest passes immediately.
[lldb][test] Make TestInterruptThreadNames not depend on debug info (#201553)
This test only reads the pthread names, which don't depend on any debug
info.
This halves the runtime of this very long test from 22s to 11s.
[AMDGPU] In `LowerDYNAMIC_STACKALLOC`, hoist the `readfirstlane` up one instruction (#201528)
Instead of:
```
$max_size_vgpr = wave_reduction_umax($vgpr_alloca_size)
$sgpr_newsp = readfirstlane($max_size_vgpr + $sgpr_sp)
```
Hoist the readfirstlane up to perform the addition using scalar
registers:
```
$max_size_sgpr = readfirstlane(wave_reduction_umax($vgpr_alloca_size))
$sgpr_newsp = $max_size_sgpr + $sgpr_sp
```
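A minimal lane-level model of why the hoist is safe, in Python for illustration. The wave reduction broadcasts the same maximum to every lane, so reading lane 0 before or after adding the (uniform) stack pointer gives the same value. The function names and 32-bit wraparound are assumptions of the sketch, and it ignores any scaling the real lowering applies to the stack pointer.

```python
MASK32 = 0xFFFFFFFF


def wave_reduction_umax(lanes):
    """Every lane receives the maximum over all lanes (a wave-uniform result)."""
    m = max(lanes)
    return [m] * len(lanes)


def readfirstlane(lanes):
    """Copy lane 0's value into a scalar."""
    return lanes[0]


def new_sp_before(alloca_sizes, sp):
    # Old sequence: vector add of the uniform max and the broadcast SP,
    # then readfirstlane.
    max_size_vgpr = wave_reduction_umax(alloca_sizes)
    return readfirstlane([(v + sp) & MASK32 for v in max_size_vgpr])


def new_sp_after(alloca_sizes, sp):
    # New sequence: readfirstlane first, so the add uses scalar registers only.
    max_size_sgpr = readfirstlane(wave_reduction_umax(alloca_sizes))
    return (max_size_sgpr + sp) & MASK32
```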
[libc++] Drop transitive includes by default (#195509)
This patch removes the unused transitive includes by default.
`_LIBCPP_KEEP_TRANSITIVE_INCLUDES_LLVM23` can be defined to keep the
transitive includes around for an easier transition. The macro will be
removed in LLVM 24.
This patch implements
https://discourse.llvm.org/t/rfc-remove-unused-transitive-includes-from-the-libc-headers/90157
[offload][OpenMP] Fix record replay when no memory is used
Programs that do not use any memory (e.g., no mappings) were failing because
we were trying to execute zero-size transfers.
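A minimal sketch of the fix's idea: skip empty transfers during replay. `replay_transfers`, the `(dst, src, size)` record layout, and the `copy` callback are hypothetical; the actual record-replay code in offload is not shown in the message.

```python
def replay_transfers(transfers, copy):
    """Replay recorded (dst, src, size) transfers, skipping zero-size ones.

    Issuing a zero-size transfer is what made memory-free programs fail,
    so empty records are ignored instead of being passed to `copy`.
    """
    for dst, src, size in transfers:
        if size == 0:
            continue
        copy(dst, src, size)
```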
[mlir] Fix crash in test type converter for 1->N result conversion (#201738)
Use `results.append` instead of `results.assign`, preserving previous
results.
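A small Python analogy for the difference, assuming the `results` vector is shared across several conversions so earlier entries must survive. Slice assignment stands in for `results.assign` (replace the contents) and `list.extend` for `results.append` (add to the end); the helper names are hypothetical.

```python
def convert_with_assign(results, new_types):
    """Mirrors the bug: assign overwrites whatever earlier conversions produced."""
    results[:] = new_types


def convert_with_append(results, new_types):
    """Mirrors the fix: append keeps earlier results and adds the new ones."""
    results.extend(new_types)


def convert_all(types, expand, convert):
    """Convert each type (possibly 1->N) into one shared results list."""
    results = []
    for ty in types:
        convert(results, expand(ty))
    return results
```

With a 1->N rule such as `i64 -> [i32, i32]`, converting `[f32, i64]` via assign leaves only `[i32, i32]`, silently dropping `f32`, while append yields `[f32, i32, i32]`.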
Fixes https://github.com/llvm/llvm-project/issues/201521
py-ruff: updated to 0.15.16
0.15.16
Preview features
[flake8-async] Implement yield-in-context-manager-in-async-generator (ASYNC119)
[pylint] Narrow diagnostic range and exclude cases without exception handlers (PLW0717)
[ruff] Treat yield before break from a terminal loop as terminal (RUF075)
Bug fixes
[eradicate] Avoid flagging ruff:ignore comments as code (ERA001)
[eradicate] Fix ERA001/RUF100 conflict when noqa is on commented-out code
[pyflakes] Avoid removing the format call when it would change behavior (F523)
[pylint] Avoid syntax errors in invalid character replacements in f-strings before Python 3.12 (PLE2510, PLE2512, PLE2513, PLE2514, PLE2515)
[pyupgrade] Avoid converting format calls with more kinds of side effects (UP032)
Rule changes
[16 lines not shown]
[X86] X86FixupInstTuning - fold VPERM2x128 -> VINSERTx128 when shuffling lower xmm half ymm sources (#201618)
VINSERTx128 is never slower than VPERM2x128 and notably quicker on some
targets (btver2, znver1, e-cores, etc.).
Shuffle lowering avoids some VINSERT patterns for AVX targets as it can
affect folding/commutation - but by the time we get to the fixup passes,
these are all done and we can safely convert to VINSERTF128/I128.
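A lane-level model of the two instructions, in Python for illustration, with each ymm value written as a `(lo, hi)` pair of 128-bit halves and `None` for a zeroed half. It follows the documented immediate semantics: VPERM2x128 bits [1:0] and [5:4] select a source half for the low and high results, with bits 3 and 7 zeroing them, while VINSERTx128 replaces the half chosen by immediate bit 0. The examples cover only immediates that pick lower halves; which masks the fixup pass actually folds is not claimed here.

```python
def vperm2x128(src1, src2, imm):
    """Model of VPERM2F128/VPERM2I128 on (lo, hi) 128-bit halves."""
    choices = (src1[0], src1[1], src2[0], src2[1])
    lo = None if imm & 0x08 else choices[imm & 0x3]
    hi = None if imm & 0x80 else choices[(imm >> 4) & 0x3]
    return (lo, hi)


def vinsertx128(src1, xmm, imm):
    """Model of VINSERTF128/VINSERTI128: replace half imm[0] of src1 with xmm."""
    halves = list(src1)
    halves[imm & 1] = xmm
    return tuple(halves)
```

For example, `vperm2x128(a, b, 0x20)` produces `(a.lo, b.lo)`, which is exactly `vinsertx128(a, b.lo, 1)`; likewise `vperm2x128(a, b, 0x02)` equals `vinsertx128(b, a.lo, 1)`.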
There are more variants of the VPERM2 immediate mask that could be folded,
but it's incredibly difficult to hit them, as they are easily commuted.
I hit this while working on #199445.
py-apsw: updated to 3.53.2.0
3.53.2.0
Reflects changes and updates in SQLite extra. The sqlite3_scrub binary has been removed; use VACUUM INTO instead.
[SeparateConstOffsetFromGEP] Decompose xor constant operand when possible (#195830)
It may be desirable to fold constants directly into the addressing mode
when computing an address. While lowering GEPs and looking for a
constant to extract among the indexes, take into account constants that
are xor expressions as well. When some bits of the xor's constant operand
are known to be zero in the base operand, xor and addition behave
identically on those specific (disjoint) bits. Such bits may be extracted
from the xor; they are the ones that can contribute to the final GEP
offset.
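The identity behind the decomposition can be checked exhaustively on small widths. This Python sketch assumes the known-zero bits of the base are given as a mask (the pass would obtain them from known-bits analysis). It splits the xor constant `C` into a part that stays in the xor and a disjoint part that becomes an additive offset: `x ^ C == (x ^ (C & ~KZ)) + (C & KZ)` whenever `x & KZ == 0`. The function names are hypothetical.

```python
def split_xor_constant(c, known_zero_mask):
    """Split xor constant c into (remaining_xor, disjoint_offset)."""
    disjoint = c & known_zero_mask    # bits where the base is known to be zero
    remaining = c & ~known_zero_mask  # bits that must stay in the xor
    return remaining, disjoint


def exhaustive_check(known_zero_mask, width=8):
    """Verify the identity for every base whose known-zero bits are clear."""
    for base in range(1 << width):
        if base & known_zero_mask:
            continue  # base violates the known-zero assumption
        for c in range(1 << width):
            remaining, disjoint = split_xor_constant(c, known_zero_mask)
            if base ^ c != (base ^ remaining) + disjoint:
                return False
    return True
```

For instance, if a GEP index is `(i * 16) ^ 5`, the low four bits of `i * 16` are zero, so the whole constant `5` is disjoint and the index becomes `(i * 16) + 5`, letting `5` fold into the addressing mode.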
Proofs: https://alive2.llvm.org/ce/z/JtmXsu.
Co-authored-by: Sumanth Gundapaneni <sumanth.gundapaneni at amd.com>