[mlir-c] Reapply Add ConversionTarget dynamic legality C API (#207104) (#207253)
Fixes the LeakSanitizer failure from #206161 (reverted in #207104).
`mlirFreezeRewritePattern` moves the contents out of the `RewritePatternSet`
but does not free the container itself (it is passed by value in the C API), so
the allocation from `mlirRewritePatternSetCreate` was never freed. Add
`mlirRewritePatternSetDestroy(patterns)` after freezing.
[libc++] Base string's alignment on __STDCPP_DEFAULT_NEW_ALIGNMENT__ (#171785)
This allows users to influence how much we overalign `string` allocations
and to tune it to the new/delete implementation via
`-fnew-alignment`. If `__STDCPP_DEFAULT_NEW_ALIGNMENT__` is unavailable
or we're not using `std::allocator`, we default to an alignment of
`sizeof(void*)`.
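As a rough illustration, the selection rule can be modeled in a few lines of Python. This is a hypothetical sketch, not libc++ code: the function names are invented, the macro is modeled as a plain parameter (`None` when it is unavailable), and the rounding helper only shows why a smaller alignment means less padding per allocation.

```python
# Hypothetical model of the alignment choice; string_alignment and
# round_up_allocation are illustrative names, not libc++ APIs.


def string_alignment(default_new_alignment, uses_std_allocator, pointer_size=8):
    """Pick the alignment used to round string allocations.

    default_new_alignment models __STDCPP_DEFAULT_NEW_ALIGNMENT__ (None if
    the macro is unavailable); pointer_size models sizeof(void*).
    """
    if default_new_alignment is None or not uses_std_allocator:
        return pointer_size
    return default_new_alignment


def round_up_allocation(size, alignment):
    """Round a requested size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment
```

For example, a 17-byte request rounds to 32 bytes with a 16-byte alignment but only to 24 bytes with an 8-byte alignment, which is the kind of difference `-fnew-alignment` lets users tune.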
[ELF] Precompute orphan output section names in parallel. NFC (#207321)
addOrphanSections computes getOutputSectionName serially for every live
orphan section. Without --emit-relocs/-r, the name is a pure function of
the section: precompute the names with a parallelFor.
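The precompute-then-consume structure can be sketched in Python with `concurrent.futures`. `output_section_name` is a simplified, hypothetical stand-in for `getOutputSectionName` (it only strips a few well-known prefixes), and `add_orphan_sections` is not lld's code; the point is that the pure per-section work runs in parallel while placement stays serial and in input order.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for getOutputSectionName: a pure function of the
# input section name, simplified to a few common prefixes.
PREFIXES = (".text.", ".data.", ".rodata.", ".bss.")


def output_section_name(section):
    for prefix in PREFIXES:
        if section.startswith(prefix):
            return prefix[:-1]  # ".text.foo" -> ".text"
    return section


def add_orphan_sections(orphans):
    # Phase 1 (parallel): compute every name up front. pool.map preserves
    # input order, so names[i] belongs to orphans[i].
    with ThreadPoolExecutor() as pool:
        names = list(pool.map(output_section_name, orphans))
    # Phase 2 (serial): place sections using the precomputed names.
    placed = {}
    for section, name in zip(orphans, names):
        placed.setdefault(name, []).append(section)
    return placed
```

Because of Python's GIL this sketch will not actually speed up CPU-bound work; it only shows the shape of the transformation, which lld implements with `parallelFor`.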
[DAGCombiner] Reassociate chains of vector reductions (#206471)
`DAGCombiner::reassociateReduction` already folds a single
`add(vecreduce(x), vecreduce(y)) -> vecreduce(add(x, y))`, and the
balanced-tree form `add(add(vecreduce(a), b), add(vecreduce(c), d))`.
It does not, however, handle a linear chain of reductions like the one
SLP emits for x264's SAD:
```
add(reduce(X0), add(reduce(X1), add(reduce(X2), acc)))
```
Only the innermost pair can ever be merged; the cascade breaks and every
reduction survives to lowering, giving one `vredsum` (or one `uadalp` step,
etc.) per term.
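To make the linear-chain case concrete, here is a toy Python model over tuple-encoded expression trees. The encoding and all function names are hypothetical; this is not the DAGCombiner code or the exact pattern the PR matches. It walks the right-leaning chain, collects the vector operand of each `reduce`, and rebuilds the chain as one `reduce` of a vector-add tree plus the remaining accumulator.

```python
# Expressions: ("add", x, y) scalar add, ("vadd", x, y) lane-wise add,
# ("reduce", v) horizontal sum; lists are vector leaves, ints scalar leaves.


def evaluate(expr):
    if not isinstance(expr, tuple):
        return expr  # scalar or vector leaf
    op = expr[0]
    if op == "reduce":
        return sum(evaluate(expr[1]))
    if op == "vadd":
        return [a + b for a, b in zip(evaluate(expr[1]), evaluate(expr[2]))]
    return evaluate(expr[1]) + evaluate(expr[2])  # scalar "add"


def reassociate_chain(expr):
    """add(reduce(X0), add(reduce(X1), ... acc)) -> add(reduce(X0+X1+...), acc)."""
    vectors = []
    node = expr
    while (isinstance(node, tuple) and node[0] == "add"
           and isinstance(node[1], tuple) and node[1][0] == "reduce"):
        vectors.append(node[1][1])
        node = node[2]  # continue down the chain
    if len(vectors) < 2:
        return expr  # nothing to merge
    combined = vectors[0]
    for vec in vectors[1:]:
        combined = ("vadd", combined, vec)
    return ("add", ("reduce", combined), node)


def count_reduces(expr):
    if not isinstance(expr, tuple):
        return 0
    return (expr[0] == "reduce") + sum(count_reduces(e) for e in expr[1:])
```

The rewrite is only valid because integer addition is associative and commutative; a floating-point version would additionally need reassociation to be permitted.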
This PR adds a third form to `reassociateReduction`:
```
[17 lines not shown]
[clang][ssaf] Add `MultiArchSharedLibrary` data structure (#206854)
This change introduces the `MultiArchSharedLibrary` data structure, which wraps per-architecture `LUSummaryEncoding` members. This is the SSAF analogue of a fat shared library. The overall design mirrors the existing `MultiArchStaticLibrary` design: each member identifies the same logical library built for a different target triple. Support for constructing and consuming this object will be added in a future PR.
rdar://181164537
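A hypothetical Python sketch of the shape described above: one logical library holding one member per target triple. The class layout and method names are illustrative assumptions, not the SSAF API, and `summary` merely stands in for an `LUSummaryEncoding`.

```python
from dataclasses import dataclass, field


@dataclass
class MultiArchSharedLibrary:
    """Hypothetical sketch: one logical library, one member per target triple."""

    name: str
    members: dict = field(default_factory=dict)  # target triple -> summary

    def add_member(self, triple, summary):
        # Each triple may appear once: every member is the same logical
        # library built for a different target.
        if triple in self.members:
            raise ValueError(f"duplicate member for triple {triple!r}")
        self.members[triple] = summary

    def member_for(self, triple):
        # Returns None when the library was not built for `triple`.
        return self.members.get(triple)
```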
[RISCV] Cost legal interleaved memory ops correctly for code size (#207162)
This doesn't yet handle interleaved memory ops with a factor > 8 or with
a gap mask; those still need to be handled below.
[ELF] Parallelize demoteSymbolsAndComputeIsPreemptible (#207310)
Each symbol's demotion and isPreemptible bit is independent.
Linking clang release is 1.02x as fast on an x86-64 machine.
lit: improve long path support on Windows (#207250)
This pull request improves Windows path handling in the
`llvm/utils/lit/lit` utilities by introducing and applying an `extended`
function to correctly format file paths for Windows APIs, especially for
long paths and UNC paths. The changes ensure that file operations such
as removal and redirection work reliably on Windows systems.
**Windows path handling improvements:**
* Added an `extended` function in both `InprocBuiltins.py` and
`ShellEnvironment.py` to convert paths to the extended-length format
required by Windows, handling both regular and UNC paths.
* Applied the `extended` function to file removal operations in
`InprocBuiltins.py`, ensuring paths are properly formatted before
deletion, which helps avoid issues with long or special Windows paths.
* Used the `extended` function for redirected file paths in
`ShellEnvironment.py`, ensuring that redirections to files handle
Windows path limitations correctly.
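A minimal sketch of what such an `extended` helper might look like, written against Python's `ntpath` module so it behaves the same on any host. This is an assumption about the approach, not the code from the PR: it expects absolute input paths, normalizes them first (the `\\?\` prefix disables Win32 path parsing, so `..` and forward slashes must be resolved before adding it), and maps UNC paths `\\server\share\...` to `\\?\UNC\server\share\...`.

```python
import ntpath

EXTENDED_PREFIX = "\\\\?\\"            # \\?\
UNC_EXTENDED_PREFIX = "\\\\?\\UNC\\"   # \\?\UNC\


def extended(path):
    """Return `path` in Windows extended-length form (hypothetical sketch)."""
    if path.startswith(EXTENDED_PREFIX):
        return path  # already extended; leave untouched
    # Resolve "..", "." and forward slashes before adding the prefix,
    # because Windows will not normalize an extended-length path.
    path = ntpath.abspath(path)
    if path.startswith("\\\\"):
        # UNC: \\server\share\dir -> \\?\UNC\server\share\dir
        return UNC_EXTENDED_PREFIX + path[2:]
    return EXTENDED_PREFIX + path
```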
[RISCV] Fix multiline RUN line in fixed-vectors-lmul-max.ll. NFC (#207309)
llvm-lit failed to parse the RUN line because we were missing a `\` and a
second `RUN:` on the line below. The codegen has changed in the meantime,
but because the RUN line never parsed, llvm-lit always treated this test
as passing.
[libc++] Use _LIBCPP_NO_UNIQUE_ADDRESS for the new vector layout (#207149)
We use `_LIBCPP_NO_UNIQUE_ADDRESS`, since a plain
`[[no_unique_address]]` doesn't work on Windows.
[LegalizeType] Fix VECTOR_DEINTERLEAVE widening with incorrect insert_subvector (#207245)
Partially addresses #207136.
There are really two parts to the associated issue: (1) incorrect type-widening
logic that creates `insert_subvector` nodes with indices that are not a
multiple of the subvector's minimum number of elements, and (2) incorrect
RISC-V lowering logic for fixed vectors.
This PR addresses the first part. It turns out that to produce a widened,
packed concat vector, we don't need any `insert_subvector` that involves
widened operands; a `concat_vectors` on the _original_ (narrow) operands,
before adjusting to the size of the desired widened concat vector, is enough.
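A toy list-based model of why packing matters (the helpers are hypothetical, not SelectionDAG code): `VECTOR_DEINTERLEAVE` splits the even and odd lanes of the concatenation of its operands, so any undef gap between the two payloads shifts which lanes land in each result.

```python
UNDEF = None  # placeholder for undefined padding lanes


def widen(vec, width):
    """Pad `vec` with undef lanes up to `width` lanes (models type widening)."""
    return vec + [UNDEF] * (width - len(vec))


def deinterleave(vec):
    """Even lanes, then odd lanes, of an already-concatenated vector."""
    return vec[0::2], vec[1::2]


def concat_widened(a, b, widened_width):
    # Concatenating operands that were widened first leaves an undef gap
    # between the two payloads.
    return widen(a, widened_width) + widen(b, widened_width)


def concat_original(a, b, widened_width):
    # Concatenating the original narrow operands keeps the payload packed;
    # only the tail is padded to the widened result size.
    return widen(a + b, 2 * widened_width)
```

With `a = [1, 2, 3]`, `b = [4, 5, 6]` and a widened width of 4, the packed form deinterleaves to `[1, 3, 5, None]` and `[2, 4, 6, None]`, while the gapped form mixes payload and padding lanes.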
[AArch64] Lower cttz(bitcast <Nxi1> to iN) with shrn-based compressed movemask (#199081)
The existing lowering in vectorToScalarBitmask() creates a 1-bit-per-lane
movemask using a powers-of-2 reduction (and+addv with a constant
pool entry).
This patch adds a DAG combine on ISD::CTTZ that recognizes cttz(bitcast
<N x i1> to iN) and produces a compressed movemask with shrn (for i8
lanes) or xtn (for wider lanes), then runs a scalar cttz on the 64- or
128-bit value. Dividing by the number of bits per lane gives the lane index.
Supports lane counts {2, 4, 8, 16, 32} (one or two NEON registers).
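The arithmetic behind the lane-index computation can be checked with a small Python model. This is a hypothetical sketch of the values involved, not the DAG combine or the NEON code: `compressed_movemask` assumes each true lane ends up as an all-ones field of `bits_per_lane` bits (for example 4 bits per i8 lane after a shift-right-by-4 narrowing, or 8 bits per i16 lane after `xtn`).

```python
def cttz(x, width):
    """Count trailing zeros of a width-bit value; cttz(0) == width."""
    if x == 0:
        return width
    return (x & -x).bit_length() - 1  # index of the lowest set bit


def bitmask(lanes):
    """One bit per lane, as produced by bitcast <N x i1> to iN."""
    return sum(1 << i for i, on in enumerate(lanes) if on)


def compressed_movemask(lanes, bits_per_lane):
    """Each true lane contributes an all-ones field of bits_per_lane bits."""
    lane_field = (1 << bits_per_lane) - 1
    return sum(lane_field << (i * bits_per_lane)
               for i, on in enumerate(lanes) if on)


def first_set_lane(lanes, bits_per_lane):
    width = len(lanes) * bits_per_lane  # 64 or 128 bits in the real lowering
    return cttz(compressed_movemask(lanes, bits_per_lane), width) // bits_per_lane
```

The all-false case also works out: cttz of an all-zero 64-bit mask is 64, and 64 / 4 = 16, matching cttz of a zero i16.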
For the example in the issue (`<16 x i8> -> i16`):
Before:
```asm
adrp x8, .LCPI0_0
cmlt v0.16b, v0.16b, #0
[34 lines not shown]
[PGO][HIP][NFC] Fix hipModuleGetGlobal -Wunused-function warning (#207293)
The functions trigger the warning on Windows (where elf.h is unavailable),
and the warning is fatal under -Werror.
Fix by adding [[maybe_unused]]. Alternatively, they could be moved inside
the existing __has_include(<elf.h>) block; however, that would trigger
-Wunused-but-set-global on pHipModuleGetGlobal.
Current fix is minimal and can be removed once hipModuleGetGlobal is
supported without elf.h.
[clang] fix redeclarations of the injected class name
The declaration used to represent an injected class name should never
be part of any redeclaration chain.
This is a regression introduced in Clang 22. The fix will be backported, so there are no release notes.
Fixes #202320
[clangd] Invalidate preamble when new module imports are added (#199460)
When using `SkipPreambleBuild`, adding a new `import` statement to a file did
not invalidate the existing preamble, because `isPreambleCompatible` only
checked whether existing prerequisite modules were up-to-date, not whether
the set of required modules itself had changed.
Fixes: #199389
Partially addresses: #126350
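A hypothetical Python model of the corrected check (the function and parameter names are illustrative, not clangd's `isPreambleCompatible` signature) shows the two conditions: the set of required modules must be unchanged, and each prerequisite module must still be up-to-date.

```python
def is_preamble_compatible(old_required, new_required, is_up_to_date):
    """Hypothetical model of a preamble-compatibility check for modules.

    old_required: modules the existing preamble was built against.
    new_required: modules the edited file now imports.
    is_up_to_date: callback reporting whether a built module is still fresh.
    """
    # The missing check: a changed set of imports (e.g. a new `import`)
    # must invalidate the preamble even if every old module is fresh.
    if set(new_required) != set(old_required):
        return False
    # The pre-existing check: every prerequisite module is up-to-date.
    return all(is_up_to_date(module) for module in old_required)
```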
[flang-rt] Use posix_memalign instead of std::aligned_alloc (#207248)
MallocWrapper called std::aligned_alloc for over-aligned requests, but
that C11 function is only available on macOS 10.15 and newer. flang-rt
builds with a Darwin deployment target of 10.7 (set in
AddFlangRT.cmake), so the build failed under
-Werror=unguarded-availability-new.
Use posix_memalign instead, as it is available on all supported POSIX
targets.