[DAGCombiner] Reassociate chains of vector reductions (#206471)
`DAGCombiner::reassociateReduction` already folds a single
`add(vecreduce(x), vecreduce(y)) -> vecreduce(add(x, y))`, and the
balanced-tree form `add(add(vecreduce(a), b), add(vecreduce(c), d))`.
It does not, however, handle a linear chain of reductions like the one
SLP emits for x264's SAD:
```
add(reduce(X0), add(reduce(X1), add(reduce(X2), acc)))
```
Only the innermost pair can ever be merged; the cascade breaks and every
reduction survives to lowering, giving one `vredsum` (or one `uadalp` step,
etc.) per term.
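The arithmetic behind merging such a chain can be checked with a small sketch. It assumes plain integer adds and equal-length vectors. The helper names (`vecreduce_add`, `chain`, `reassociated`) are made up for illustration and are not DAGCombiner APIs. It only shows that the linear chain and a single reduction over the element-wise sum agree, not how the combine itself is implemented.

```python
def vecreduce_add(v):
    """Model of an integer add reduction over one vector."""
    return sum(v)


def chain(vectors, acc):
    """add(reduce(X0), add(reduce(X1), add(reduce(X2), acc))), innermost first."""
    total = acc
    for v in reversed(vectors):
        total = vecreduce_add(v) + total
    return total


def reassociated(vectors, acc):
    """One reduction over the element-wise sum of all vectors, plus acc."""
    summed = [sum(lanes) for lanes in zip(*vectors)]
    return vecreduce_add(summed) + acc


X0 = [1, 2, 3, 4]
X1 = [10, 20, 30, 40]
X2 = [100, 200, 300, 400]
assert chain([X0, X1, X2], 5) == reassociated([X0, X1, X2], 5) == 1115
```

In this model the chain performs one reduction per term, while the reassociated form performs element-wise adds followed by a single reduction.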
This PR adds a third form to `reassociateReduction`:
```
[17 lines not shown]
[clang][ssaf] Add `MultiArchSharedLibrary` data structure (#206854)
This change introduces the `MultiArchSharedLibrary` data structure, which wraps per-architecture `LUSummaryEncoding` members. This is the SSAF analogue of a fat shared library. The overall design mirrors the existing `MultiArchStaticLibrary` design: each member identifies the same logical library built for a different target triple. Support for constructing and consuming this object will be added in a future PR.
rdar://181164537
[RISCV] Cost legal interleaved memory ops correctly for code size (#207162)
This doesn't yet handle interleaved memory ops with a factor > 8 or with
a gap mask; those still need to be handled below.
[ELF] Parallelize demoteSymbolsAndComputeIsPreemptible (#207310)
Each symbol's demotion and isPreemptible bit are independent of every other symbol's.
Linking clang release is 1.02x as fast on an x86-64 machine.
lit: improve long path support on Windows (#207250)
This pull request improves Windows path handling in the
`llvm/utils/lit/lit` utilities by introducing and applying an `extended`
function to correctly format file paths for Windows APIs, especially for
long paths and UNC paths. The changes ensure that file operations such
as removal and redirection work reliably on Windows systems.
**Windows path handling improvements:**
* Added an `extended` function in both `InprocBuiltins.py` and
`ShellEnvironment.py` to convert paths to the extended-length format
required by Windows, handling both regular and UNC paths.
* Applied the `extended` function to file removal operations in
`InprocBuiltins.py`, ensuring paths are properly formatted before
deletion, which helps avoid issues with long or special Windows paths.
* Used the `extended` function for redirected file paths in
`ShellEnvironment.py`, ensuring that redirections to files handle
Windows path limitations correctly.
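A minimal sketch of what such a helper might look like is shown here. It is not the code from the patch. It assumes the input is already an absolute path and uses only string manipulation. The prefixes are the documented Windows extended-length forms: `\\?\` for drive paths and `\\?\UNC\` for UNC paths.

```python
def extended(path: str) -> str:
    """Return `path` in Windows extended-length form.

    Sketch only: assumes `path` is already absolute.
    """
    path = path.replace("/", "\\")  # extended paths do not accept forward slashes
    if path.startswith("\\\\?\\"):
        return path  # already in extended form
    if path.startswith("\\\\"):
        # UNC path: \\server\share\x becomes \\?\UNC\server\share\x
        return "\\\\?\\UNC\\" + path[2:]
    return "\\\\?\\" + path


assert extended(r"C:\build\out.txt") == r"\\?\C:\build\out.txt"
assert extended(r"\\server\share\out.txt") == r"\\?\UNC\server\share\out.txt"
```

Applying the conversion just before the file-system call (removal or opening a redirect target) keeps the rest of the code working with ordinary paths.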
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull BPF fixes from Daniel Borkmann:
- Initialize task local storage before fork bails out to free the task
(Jann Horn)
- Fix insn_aux_data leak on verifier error path (KaFai Wan)
- Reject BPF inode storage map creation when BPF LSM is uninitialized
(Matt Bobrowski)
- Mask pseudo pointer values in verifier logs when pointer leaks are
not allowed (Nuoqi Gui)
- Harden BPF JIT against spraying via IBPB flush (Pawan Gupta)
- Reject a skb-modifying SK_SKB stream parser since the latter is only
meant to measure the next message (Sechang Lim)
[21 lines not shown]
[RISCV] Fix multiline RUN line in fixed-vectors-lmul-max.ll. NFC (#207309)
llvm-lit failed to parse the RUN line because we were missing a \ and a
second RUN on the line below. The codegen has changed in the meantime but
because it never parsed, llvm-lit always treated this test as passing.
[libc++] Use _LIBCPP_NO_UNIQUE_ADDRESS for the new vector layout (#207149)
We use `_LIBCPP_NO_UNIQUE_ADDRESS`, since a plain
`[[no_unique_address]]` doesn't work on Windows.
[LegalizeType] Fix VECTOR_DEINTERLEAVE widening with incorrect insert_subvector (#207245)
Partially addresses #207136
There are really two parts to the associated issue: (1) incorrect
type-widening logic that emits `insert_subvector` with indices that are
not a multiple of the sub-vector's minimum number of elements, and (2)
incorrect RISC-V lowering logic for fixed vectors.
This PR addresses the first part. It turns out that to get a
widened, packed concat vector we don't need any `insert_subvector`
that involves widened operands; a `concat_vectors` of the
_original_ (narrow) operands (before adjusting to the size of the
desired widened concat vector) is enough.
Merge tag 'vfio-v7.2-rc2' of https://github.com/awilliam/linux-vfio
Pull VFIO fixes from Alex Williamson:
"Mostly straightforward fixes here, inconsistent runtime PM handling
due to global device policies, bitfield races, unwind path gaps,
teardown ordering, and a misplaced library flag.
- Fix racy bitfield updates in vfio-pci-core and the mlx5 vfio-pci
variant driver with a binary split between setup/release and
runtime modified flags. These were noted across several Sashiko
reviews as pre-existing issues (Alex Williamson)
- Fix runtime PM inconsistency where the vfio-pci driver module_init
could modify the idle PM policy of existing devices through globals
managed in vfio-pci-core, leading to unbalanced runtime PM
operations (Alex Williamson)
- Restore mutability of writable vfio-pci module options by further
pulling policy globals out of vfio-pci-core, to instead be latched
[23 lines not shown]
[AArch64] Lower cttz(bitcast <Nxi1> to iN) with shrn-based compressed movemask (#199081)
The existing lowering in vectorToScalarBitmask() creates a 1 bit per
lane movemask using a powers of 2 reduction (and+addv with a constant
pool entry).
This patch adds a DAG combine on ISD::CTTZ that recognizes cttz(bitcast
<N x i1> to iN) and produces a compressed movemask with shrn (for i8
lanes) or xtn (for wider lanes) then runs scalar cttz on a 64- or
128-bit value. Dividing by bits per lane gives the lane index.
Supports lane counts {2, 4, 8, 16, 32} (one or two NEON registers).
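The index arithmetic for the i8 case can be modelled in a few lines. This is only a simulation of the idea under stated assumptions, not the DAG combine itself: 16 byte lanes that are each 0x00 or 0xFF (a comparison result), a narrowing shift right by 4 on 16-bit pairs, and made-up helper names (`shrn4_mask`, `cttz`, `first_set_lane`).

```python
def shrn4_mask(lanes):
    """Model `shrn #4` on 8 x 16-bit elements: 16 bytes in, 64 bits out.

    Each input lane (0x00 or 0xFF) ends up as one 4-bit nibble of the result.
    """
    assert len(lanes) == 16
    out = 0
    for i in range(0, 16, 2):
        half = lanes[i] | (lanes[i + 1] << 8)  # little-endian 16-bit element
        narrowed = (half >> 4) & 0xFF          # shift right, keep the low byte
        out |= narrowed << (8 * (i // 2))
    return out


def cttz(x, width):
    """Count trailing zeros; returns `width` when x is zero."""
    if x == 0:
        return width
    return (x & -x).bit_length() - 1


def first_set_lane(lanes):
    """Index of the first all-ones lane, or 16 if none is set."""
    return cttz(shrn4_mask(lanes), 64) // 4  # 4 mask bits per lane


lanes = [0x00] * 16
lanes[5] = lanes[9] = 0xFF
assert first_set_lane(lanes) == 5
```

With no lane set, the scalar count is 64 and the division yields 16, which equals the lane count.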
For the example in the issue (`<16 x i8> -> i16`):
Before:
```asm
adrp x8, .LCPI0_0
cmlt v0.16b, v0.16b, #0
[34 lines not shown]
[PGO][HIP][NFC] Fix hipModuleGetGlobal -Wunused-function warning (#207293)
The functions trigger the warning on Windows (without elf.h), which is
fatal under -Werror.
Fix by adding [[maybe_unused]]. Alternatively, the functions could be moved
inside the existing __has_include(<elf.h>) block; however, that would trigger
-Wunused-but-set-global on pHipModuleGetGlobal.
Current fix is minimal and can be removed once hipModuleGetGlobal is
supported without elf.h.
[clang] fix redeclarations of the injected class name
The declaration used to represent an injected class name should never
be part of any redeclaration chain.
This is a regression since Clang 22, and this will be backported, so no release notes.
Fixes #202320
[clangd] Invalidate preamble when new module imports are added (#199460)
When using `SkipPreambleBuild`, adding a new `import` statement to a
file did not invalidate the existing preamble because
`isPreambleCompatible` only checked whether existing prerequisite modules
were up-to-date, not whether the set of required modules itself had
changed.
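The shape of the missing check can be sketched as follows. This is an illustration in Python with hypothetical names (`is_preamble_compatible`, `built_with`, `required_now`, `is_up_to_date`); clangd's real implementation is C++ and its actual interfaces are not shown here.

```python
def is_preamble_compatible(built_with, required_now, is_up_to_date):
    """Return True if a cached preamble can be reused.

    built_with:    module names the preamble was built against
    required_now:  module names the current file contents import
    is_up_to_date: callable(name) -> bool for a single module
    """
    # The set of required modules itself must be unchanged.
    if set(built_with) != set(required_now):
        return False
    # The previously required modules must still be up to date.
    return all(is_up_to_date(name) for name in built_with)


# Adding `import B;` changes the required set, so the preamble is stale
# even though module A is still up to date.
assert not is_preamble_compatible(["A"], ["A", "B"], lambda name: True)
assert is_preamble_compatible(["A"], ["A"], lambda name: True)
```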
Fixes: #199389
Partially addresses: #126350
Do not panic when frame_length > ETHER_MAX_LEN, reset the chip instead
There is no need to panic when an RX FIFO desync occurs or a garbage frame
arrives. We can recover by resetting the chip, so do that. It's the
same recovery path the driver already used for a bad avail marker.
Do not unload bounce buffer dmamap on error during DMA read/write
Discovered when hacking on jzmmc.
The two functions, sdmmc_mem_single_segment_dma_write_block and
sdmmc_mem_single_segment_dma_read_block, do not own the bounce
buffer dmamap and have no business unloading it.
This caused a "bus_dmamap_sync: bad offset" panic during DMA on
non-coherent CPU cores.
Note that this particular code path (bounce buffers) is generally not
well exercised on mainstream platforms, which is why the bug went
unnoticed.