clang: Use correct triple when constructing offload bundler command
Use the toolchain triple for the particular input instead of that of the
top-level toolchain. NFC for now, but avoids mismatched triples in
a future change.
[X86] combineMinMaxReduction - match any minmax reduction to a legal scalar (#195261)
Further relaxation of the VECREDUCE minmax folds - recognise any
>=128-bit vector that reduces to a legal scalar.
Adds custom ops for more VECREDUCE types (including vXi32/vXi64 types).
Remaining issues are sub-128-bit vector reductions (unnecessary padding
with neutral elements) and handling of i64 types on 32-bit targets - I'm
working on fixes for both of these.
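As a rough illustration of the reduction shape involved (not the X86 lowering itself), a min/max reduction over a power-of-two vector can be computed by repeatedly folding the upper half onto the lower half. A shorter vector can first be padded with a neutral element, which is the padding the message above describes as unnecessary. The function names below are hypothetical.

```python
def pad_with_neutral(values, width, neutral):
    """Pad values up to `width` lanes with an element that cannot change the result."""
    return list(values) + [neutral] * (width - len(values))


def halving_reduce(values, op):
    """Reduce a power-of-two-length list by folding the upper half onto the lower half."""
    lanes = list(values)
    assert lanes and len(lanes) & (len(lanes) - 1) == 0, "length must be a power of two"
    while len(lanes) > 1:
        half = len(lanes) // 2
        lanes = [op(lanes[i], lanes[i + half]) for i in range(half)]
    return lanes[0]


INT32_MAX = 2**31 - 1  # neutral element for a signed 32-bit min reduction
padded = pad_with_neutral([7, -3, 12], 4, INT32_MAX)
smallest = halving_reduce(padded, min)
```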
ARM: Support OR and XOR in targetShrinkDemandedConstant (#165106)
Also adjust the handling of AND, OR, and XOR, including leaving the
constant unchanged when it is already legal. This prevents overcorrecting
the mask, especially when it is already valid.
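The general reason a demanded-bits rewrite of this kind is sound can be checked with a small sketch: for a bitwise AND, OR, or XOR, constant bits outside the demanded mask cannot affect any demanded result bit. The helper below is hypothetical and checks this exhaustively for 8-bit values. It does not model which constants ARM can actually encode.

```python
import operator


def same_under_demanded(op, old_const, new_const, demanded, bits=8):
    """Return True if swapping the constant never changes a demanded result bit."""
    for x in range(1 << bits):
        if (op(x, old_const) ^ op(x, new_const)) & demanded:
            return False
    return True


demanded = 0x0F  # only the low nibble of the result is used
# 0xF3 and 0x03 agree on every demanded bit, so either constant may be used.
and_ok = same_under_demanded(operator.and_, 0xF3, 0x03, demanded)
or_ok = same_under_demanded(operator.or_, 0xF3, 0x03, demanded)
xor_ok = same_under_demanded(operator.xor, 0xF3, 0x03, demanded)
# 0x02 differs from 0x03 in a demanded bit, so this swap is not allowed.
bad_swap = same_under_demanded(operator.and_, 0x03, 0x02, demanded)
```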
[AArch64] Use add_like for SABA and UABA (#194421)
Similar to the other patches that have made use of add_like recently, this
adds add_like to the SABA and UABA patterns.
[AMDGPU] Enable runtime loop unrolling (#194924)
Enable auto runtime unrolling for AMDGPU by setting `UP.Runtime = true`
in `getUnrollingPreferences`, with `PartialThreshold = Threshold / 4` to
limit code-size growth.
Benchmarked on **MI350X (gfx950)** and **MI300X (gfx942)** using
Composable Kernel, xpu-perf, and llama.cpp. Results showed some
improvements and no real regressions.
AI Disclaimer: Cursor was used to evaluate the change and run
benchmarking experiments.
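For readers unfamiliar with the transform, runtime unrolling applies to loops whose trip count is known only at run time: the body is replicated by some factor and a remainder loop handles the leftover iterations. The sketch below shows that shape by hand in Python. It is an illustration of the idea, not the code LLVM emits, and the function name and factor are assumptions.

```python
def sum_runtime_unrolled(data, factor=4):
    """Sum `data` using a main loop that steps by `factor` plus a remainder loop."""
    n = len(data)              # trip count is only known at run time
    main_end = n - n % factor  # largest multiple of `factor` not exceeding n
    total = 0
    i = 0
    while i < main_end:
        # Stands in for `factor` back-to-back copies of the loop body.
        for k in range(factor):
            total += data[i + k]
        i += factor
    while i < n:               # remainder loop for the leftover iterations
        total += data[i]
        i += 1
    return total
```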
[LoopVectorize] Reland: Add metadata to distinguish vectorized loop body from scalar remainder (#194912)
Reland of #190258, reverted in #194901 due to a null-pointer dereference
when the scalar preheader is absent. Fixed by guarding the metadata
emission with a null check on `ScalarPH`. Also adds a dedicated test
(`vectorize-loop-kind-metadata.ll`) verifying the metadata is emitted
only when remarks are enabled.
---
Add two new loop metadata attributes — `llvm.loop.vectorize.body` and
`llvm.loop.vectorize.epilogue` — that the loop vectorizer sets on the
generated vector loop and epilogue loop respectively. The metadata is
only emitted when optimization remarks are enabled (`ORE->enabled()`),
so it has zero cost in normal compilation.
These enable downstream passes (LoopUnroll, WarnMissedTransforms) to
produce more precise optimization remarks. Instead of the generic "loop
not unrolled" warning on a source line that was vectorized, the unroller
[25 lines not shown]
Add middleware support for LIO ALUA HA
Wire up the middleware side of LIO ALUA high-availability: load
lio_ha.ko with per-node addresses on service start, manage the
4-row ALUA state table (MASTER/BACKUP × synced/not-synced) across
failover events, clean up STANDBY configfs on pool export, and
add pre-flight validation that targets have static initiator ACLs
before ALUA can be enabled.
[libc][docs] Add pwd.h POSIX header documentation (#186292)
Add YAML metadata for `pwd.h` listing all POSIX-mandated functions
(`endpwent`, `getpwent`, `getpwnam`, `getpwnam_r`, `getpwuid`,
`getpwuid_r`, `setpwent`). This header defines no macros per POSIX.
Add `pwd` to `index.rst` and `CMakeLists.txt` `docgen_list`.
Verified with `python3 docgen.py pwd.h` — generates valid RST with
correct POSIX links.
Partial fix for #122006
Co-authored-by: Jeff Bailey <jbailey at raspberryginger.com>
[Loads] Fix crash on mixed-address-space pointers in no-AA store check (#195256)
Fix crash on mixed-address-space pointers in no-AA store check.
`areNonOverlapSameBaseLoadAndStore` built `ConstantRanges` from `APInts`
sized by the load and store pointer index widths. When those widths
differ (AMDGPU's AS=0 vs AS=5), `ConstantRange::intersectWith` asserts.
Adds an early return mirroring the `BasicAA` path.
This can happen when `FindAvailableLoadedValue` is called without
`BatchAAResults`. The path with `BatchAAResults` already handles it.
The crash was observed with #190607, which was therefore reverted in #195135.
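The failure mode can be sketched outside LLVM: a range type that refuses to intersect ranges of different bit widths, and a caller that returns early instead of reaching that check. The class and function names are hypothetical, and the sketch assumes the early return gives the conservative answer ("cannot prove the accesses are disjoint").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OffsetRange:
    lower: int      # inclusive start offset
    upper: int      # exclusive end offset
    bit_width: int  # index width of the pointer the offsets belong to

    def intersects(self, other):
        # Models an operation that is only defined for equal widths.
        if self.bit_width != other.bit_width:
            raise ValueError("bit widths differ")
        return max(self.lower, other.lower) < min(self.upper, other.upper)


def known_non_overlapping(load_range, store_range):
    """Return True only when the two accesses are provably disjoint."""
    if load_range.bit_width != store_range.bit_width:
        return False  # early return: widths differ, so stay conservative
    return not load_range.intersects(store_range)
```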
Attributor: Avoid double map lookup in updateAttrMap (#182666)
This leaves the map entry behind in the unchanged case, but that
does not seem to matter. The newly inserted entry could be erased
in that case, but that does not seem to make a difference
either.
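The trade-off can be shown with a dictionary analogue: a single insert-or-fetch replaces a separate membership test and insert, at the cost of leaving an empty entry behind when nothing changes. This is a sketch of the pattern only. The function below is hypothetical and does not reproduce the real `updateAttrMap` logic.

```python
def update_attr_map(attr_map, key, new_attrs):
    """Merge new_attrs into attr_map[key] with one lookup; return True if it changed."""
    # setdefault fetches the existing set or inserts an empty one in a single call.
    entry = attr_map.setdefault(key, set())
    before = len(entry)
    entry |= set(new_attrs)  # in-place update of the set stored in the map
    return len(entry) != before


attrs = {}
unchanged = update_attr_map(attrs, "f", [])         # nothing to add
left_behind = "f" in attrs                          # the empty entry remains
changed = update_attr_map(attrs, "f", ["nounwind"])
```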
[flang][OpenMP] Check conflicts between predetermined/explicit DSA (#194961)
Improve checks for loop iteration variables with predetermined DSA
appearing in DSA clauses. Show the location of the variable both in the
offending clause and in the loop.
Make the checks a bit more accurate as well: only allow the LINEAR clause
on a SIMD construct with a single affected loop.