[LoopFusion][docs][NFC] Document atomic accesses as a fusion blocker (#201775)
Loops containing atomic accesses are now rejected outright, mirroring
the volatile blocker. Update the eligibility sections to match.
libde265: updated to 1.1.1
1.1.1
The decoding speed has been improved by about 8% on x86 CPUs thanks to more SIMD acceleration and optimized CABAC code. Also the startup time has been improved, which gives a 3% speed improvement when decoding HEIC files with similar-sized tiles.
Build differences
When building shared libraries in Release mode, we now use -fvisibility=hidden by default. You can override this with the new cmake option "FORCE_FULL_VISIBILITY".
Security
CVE TBD (GHSA-ccfw-29x7-rrx3) - Pixel accessor signed integer overflow causes heap OOB read/write
CVE TBD (GHSA-j2qq-x2xq-g9wr) - SAO sequential filter heap buffer overflow via signed integer overflow
haproxy: updated to 3.4.0
3.4.0
- BUG/MINOR: tcpcheck: Check LDAP response to not read more data than available
- BUG/MINOR: ssl-gencert: validate SNI characters to prevent SAN certificate injection
- BUG/MINOR: mux-h1: H2 preface rejection doesn't update stick-table glitches
- BUG/MEDIUM: cpu-topo: Enforce thread-hard-limit on policy
- BUG/MEDIUM: qmux: do not crash on too large record
- BUG/MEDIUM: qmux: do not crash on receiving an invalid first frame
- BUG/MINOR: qmux: reject too large initial record
- Revert "BUG/MEDIUM: dns: fix long loops in additional records parse on name failure"
- BUG/MINOR: qpack: Fix index calculation in debug functions
- BUG/MINOR: qpack: fix potential null-pointer dereference in qpack_dht_insert()
- CLEANUP: qpack: fix copy-paste typo in value Huffman debug string
- BUG/MINOR: qpack: fix sign bit mask in qpack_decode_fs_pfx()
- CLEANUP: qpack: fix copy-paste typo in value Huffman debug string for WLN
- BUG/MINOR: qpack: fix huff_dec() error handling in qpack_decode_fs()
- CLEANUP: qpack: move encoded macros to qpack-t.h to avoid duplication
- BUG/MEDIUM: quic: handle ECONNREFUSED on RX side
[76 lines not shown]
[RISCV][MC] Add experimental `Zvvmtls` and `Zvvmttls` support (#198229)
This patch adds experimental MC-layer support for the
[RISC-V Integrated Matrix Extension](https://github.com/riscv/integrated-matrix-extension/releases/tag/riscv-isa-release-71c48b9-2026-05-17),
specifically the tile load/store extensions `Zvvmtls` and `Zvvmttls`.
This PR:
- Adds the optional tile lambda operand syntax (`L1` through `L64`), and
related asm operand.
- Adds the `vmtl.v`, `vmts.v`, `vmttl.v` and `vmtts.v` instructions to
the MC layer.
- Modifies `parseMaskReg` to return `NoMatch` to allow overloaded
mnemonics to continue matching alternative optional operands, such as
parsing `vmtl.v v8, (a0), a1, L4` as the tile-lambda form instead of
failing by treating `L4` as a malformed mask operand. Real mask
registers missing `.t`, such as `v0`, still produce the existing diagnostic.
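The fallback behaviour described in the last bullet can be illustrated with a small sketch. This is a Python model, not the actual C++ parser: `parse_mask_reg`, `parse_tile_lambda`, `parse_optional_operand`, the `Status` enum and the diagnostic string are all hypothetical names chosen for illustration, and accepting every integer from 1 to 64 as a tile lambda is an assumption.

```python
import re
from enum import Enum


class Status(Enum):
    SUCCESS = "success"
    NO_MATCH = "no_match"  # not this operand kind; let another parser try
    FAILURE = "failure"    # recognised but malformed; report a diagnostic


def parse_mask_reg(tok):
    """Hypothetical stand-in for a mask-register operand parser."""
    if tok == "v0.t":
        return Status.SUCCESS, ("mask", tok)
    if re.fullmatch(r"v\d+", tok):
        # A vector register without `.t` still gets a hard diagnostic.
        return Status.FAILURE, "placeholder diagnostic: missing '.t'"
    # Anything else is simply not a mask operand.
    return Status.NO_MATCH, None


def parse_tile_lambda(tok):
    """Hypothetical parser for the optional L1..L64 operand (range assumed)."""
    m = re.fullmatch(r"L(\d+)", tok)
    if m and 1 <= int(m.group(1)) <= 64:
        return Status.SUCCESS, ("lambda", int(m.group(1)))
    return Status.NO_MATCH, None


def parse_optional_operand(tok):
    """Try each alternative; only NO_MATCH lets the next parser run."""
    for parser in (parse_mask_reg, parse_tile_lambda):
        status, value = parser(tok)
        if status is not Status.NO_MATCH:
            return status, value
    return Status.NO_MATCH, None
```

With this shape, `parse_optional_operand("L4")` reaches the tile-lambda parser, while `parse_optional_operand("v0")` stops at the mask parser with a failure.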
arm: relax coherent DMA ordering barriers from DSB to DMB
Use DMB instead of DSB for the ARM coherent DMA ordering macros dma_*_*().
The previous definitions used DSB, which enforces completion semantics and
is heavier than needed for coherent device DMA ordering. DMB provides ordering
of memory operations without requiring full completion, making it the
appropriate barrier for these coherent-only CPU/device DMA paths.
Tested on Fusion VM, Orion O6, and Thunderx.
Performance improves by approximately 1% on the Fusion VM, and by less
on Orion O6 and Thunderx.
[mlirbc] Add AffineMap serialization support (#191970)
Add binary bytecode encoding for AffineMapAttr, replacing the textual fallback.
AffineMap is encoded as numDims, numSymbols, numResults, followed by the result
expressions. In the general case, each expression (AffineExpr) is encoded as a
recursive/prefix tree: a VarInt kind tag followed by kind-specific data. To
guard a bit more against malformed bytecode, these expressions are read with an
iterative parser.
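As a rough illustration of the layout above, the following Python sketch encodes a map header followed by prefix-ordered expressions, and decodes them with an explicit stack instead of recursion. The kind tags (`DIM`, `SYM`, `CONST`, `ADD`, `MUL`), the tuple representation and all function names are assumptions made for this example; plain Python integers stand in for VarInts, and the real serialized enum and special-case encodings are not modelled.

```python
# Hypothetical kind tags; the actual serialized enum is not shown here.
DIM, SYM, CONST, ADD, MUL = range(5)
LEAVES = {DIM, SYM, CONST}
BINARY = {ADD, MUL}


def encode_expr(expr, out):
    """Append a prefix (pre-order) encoding of `expr` to the list `out`."""
    kind = expr[0]
    out.append(kind)
    if kind in LEAVES:
        out.append(expr[1])          # position or constant value
    else:
        encode_expr(expr[1], out)    # left operand
        encode_expr(expr[2], out)    # right operand


def encode_map(num_dims, num_syms, results):
    out = [num_dims, num_syms, len(results)]
    for expr in results:
        encode_expr(expr, out)
    return out


def decode_expr(data, pos):
    """Decode one expression starting at `pos` without recursion."""
    stack = []  # partially built binary nodes: [kind, operand, ...]
    while True:
        if pos >= len(data):
            raise ValueError("truncated expression")
        kind = data[pos]
        pos += 1
        if kind in BINARY:
            stack.append([kind])
            continue
        if kind not in LEAVES:
            raise ValueError("unknown kind tag")
        if pos >= len(data):
            raise ValueError("truncated leaf")
        node = (kind, data[pos])
        pos += 1
        # Attach the finished node to its parents, closing any that fill up.
        while stack:
            stack[-1].append(node)
            if len(stack[-1]) < 3:
                break                # parent still needs its right operand
            node = tuple(stack.pop())
        else:
            return node, pos         # no open parent left: expression done


def decode_map(data):
    if len(data) < 3:
        raise ValueError("truncated header")
    num_dims, num_syms, num_results = data[:3]
    pos, results = 3, []
    for _ in range(num_results):
        expr, pos = decode_expr(data, pos)
        results.append(expr)
    return num_dims, num_syms, results
```

Because the decoder keeps its pending nodes on a heap-allocated list, deeply nested or malformed input cannot exhaust the call stack; it either decodes or raises `ValueError`.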
Common AffineMaps get a special-case encoding, which requires less space and is
easy to create without much additional maintenance. The ordering of the
serialized enum differs from AffineExprKind, as the latter has an expansion
point in the middle (new kinds can be added there) while the serialized encoding
needs to remain stable.
Updated the checked-in mlirbc file, as memref has a default affine map, so
it is being updated pre snap.
Assisted-by: Antigravity : Gemini
[lldb][test] Increase polling in TestInterruptThreadNames.py (#201554)
This test runs for a very long time on my machine (11s per variation),
and nearly all of this time is spent on the 10s sleep in this function.
There are two issues here:
1. It uses the (now outdated) logic that arm64 means we have a remote
Darwin device. This is no longer true these days as Macs also run on
arm64.
2. The polling duration of 1s is still very long, and the test will
still spend all its time just waiting for this 1s sleep. A 100ms sleep
that we poll in a loop should be slow enough.
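The short-sleep polling described in point 2 might look roughly like the helper below. This is a generic sketch, not the code in the test: `wait_until`, its parameters and the 10s default timeout are assumptions chosen to mirror the numbers mentioned above.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.1):
    """Poll `predicate` every `interval` seconds until it holds or we time out.

    Returns True as soon as the predicate is satisfied, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With a 100ms interval, the caller waits at most about one interval past the moment the condition becomes true, rather than a full second or more.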
[lldb][test] Assume clang supports -gmodules (#201333)
We currently spend 50ms in most dotest invocations to check if clang
supports `-gmodules`. The expensive part of this check is creating the
clang process to run `clang --help`.
`-gmodules` was added 11 years ago and is present in any compiler that
has even a remote chance of supporting the rest of our test suite. This
patch just assumes that our compiler supports `-gmodules` if it is clang.
[lldb][test] Increase polling frequency in ProcessAttach (#201532)
The test_attach_to_process_by_id_correct_executable_offset subtest
requires us to hit a breakpoint in an attached process. For this we
implement a loop that hits the breakpoint location every 2 seconds.
This patch shortens the interval at which we hit this breakpoint to 50ms.
The reason is that, with a 2s interval, this test waits nearly 2 seconds
for the first breakpoint hit even on a fast system. With a 50ms interval,
this subtest passes immediately.
[lldb][test] Make TestInterruptThreadNames not depend on debug info (#201553)
This test only reads the pthread names, which don't depend on any debug
info.
This halves the runtime of this very long test from 22s to 11s.
[AMDGPU] In `LowerDYNAMIC_STACKALLOC`, hoist the `readfirstlane` up one instruction (#201528)
Instead of:
```
$max_size_vgpr = wave_reduction_umax($vgpr_alloca_size)
$sgpr_newsp = readfirstlane($max_size_vgpr + $sgpr_sp)
```
Hoist the readfirstlane up to perform the addition using scalar
registers:
```
$max_size_sgpr = readfirstlane(wave_reduction_umax($vgpr_alloca_size))
$sgpr_newsp = $max_size_sgpr + $sgpr_sp
```
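The two sequences compute the same new stack pointer because the reduced maximum is uniform across lanes. A toy Python model of this is sketched below; it assumes all lanes are active, models a VGPR as a list with one value per lane, and ignores register widths and any scaling the real lowering applies. The function names are illustrative only.

```python
def wave_reduction_umax(vgpr):
    """Model: every lane receives the maximum over all lanes."""
    m = max(vgpr)
    return [m] * len(vgpr)


def readfirstlane(vgpr):
    """Model: take the value of the first lane (all lanes assumed active)."""
    return vgpr[0]


def new_sp_before(vgpr_alloca_size, sgpr_sp):
    # Vector add on every lane, then move the result to a scalar.
    max_size_vgpr = wave_reduction_umax(vgpr_alloca_size)
    return readfirstlane([m + sgpr_sp for m in max_size_vgpr])


def new_sp_after(vgpr_alloca_size, sgpr_sp):
    # Move the uniform maximum to a scalar first, then do one scalar add.
    max_size_sgpr = readfirstlane(wave_reduction_umax(vgpr_alloca_size))
    return max_size_sgpr + sgpr_sp
```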
[libc++] Drop transitive includes by default (#195509)
This patch removes the unused transitive includes by default.
`_LIBCPP_KEEP_TRANSITIVE_INCLUDES_LLVM23` can be defined to keep the
transitive includes around for an easier transition. The macro will be
removed in LLVM 24.
This patch implements
https://discourse.llvm.org/t/rfc-remove-unused-transitive-includes-from-the-libc-headers/90157
[offload][OpenMP] Fix record replay when no memory is used
Programs that do not use any memory (e.g., no mappings) were failing because
we were trying to execute zero-size transfers.
[mlir] Fix crash in test type converter for 1->N result conversion (#201738)
Use `results.append` instead of `results.assign`, preserving previous
results.
Fixes https://github.com/llvm/llvm-project/issues/201521
py-ruff: updated to 0.15.16
0.15.16
Preview features
[flake8-async] Implement yield-in-context-manager-in-async-generator (ASYNC119)
[pylint] Narrow diagnostic range and exclude cases without exception handlers (PLW0717)
[ruff] Treat yield before break from a terminal loop as terminal (RUF075)
Bug fixes
[eradicate] Avoid flagging ruff:ignore comments as code (ERA001)
[eradicate] Fix ERA001/RUF100 conflict when noqa is on commented-out code
[pyflakes] Avoid removing the format call when it would change behavior (F523)
[pylint] Avoid syntax errors in invalid character replacements in f-strings before Python 3.12 (PLE2510, PLE2512, PLE2513, PLE2514, PLE2515)
[pyupgrade] Avoid converting format calls with more kinds of side effects (UP032)
Rule changes
[16 lines not shown]
[X86] X86FixupInstTuning - fold VPERM2x128 -> VINSERTx128 when shuffling lower xmm half ymm sources (#201618)
VINSERTx128 is never slower than VPERM2x128 and notably quicker on some
targets (btver2, znver1, e-cores, etc.).
Shuffle lowering avoids some VINSERT patterns for AVX targets as it can
affect folding/commutation - but by the time we get to the fixup passes,
these are all done and we can safely convert to VINSERTF128/I128.
There are more variants of the VPERM2 immediate mask that could be folded,
but it's incredibly difficult to hit them as it's easily commutable.
I hit this while working on #199445.
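To see why the fold is sound for the "lower half of both sources" case, the sketch below models a ymm register as a pair of 128-bit lanes `(low, high)` and compares the two instructions. It follows the documented immediate semantics of VPERM2F128/VPERM2I128 and VINSERTF128/VINSERTI128, but the function names are illustrative, and which immediates the patch actually matches is not shown here; the `0x20` and `0x02` cases are used as examples only.

```python
def vperm2x128(src1, src2, imm):
    """Model VPERM2x128: each destination lane picks one of four source lanes."""
    lanes = [src1[0], src1[1], src2[0], src2[1]]

    def pick(ctrl):
        if ctrl & 0x8:           # bit 3 of each nibble zeroes the lane
            return 0
        return lanes[ctrl & 0x3]

    return (pick(imm & 0xF), pick((imm >> 4) & 0xF))


def vinsertx128(src1, xmm, imm):
    """Model VINSERTx128: copy src1 and overwrite lane imm[0] with xmm."""
    lanes = list(src1)
    lanes[imm & 1] = xmm
    return tuple(lanes)


a = ("a_lo", "a_hi")
b = ("b_lo", "b_hi")

# imm 0x20: low lane from a's low half, high lane from b's low half.
perm = vperm2x128(a, b, 0x20)
ins = vinsertx128(a, b[0], 1)

# imm 0x02 is the commuted form: low from b, high from a.
perm_commuted = vperm2x128(a, b, 0x02)
ins_commuted = vinsertx128(b, a[0], 1)
```

In both cases only the low 128 bits of the second operand are read, which is what allows it to be treated as an xmm source.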