[LoopVectorize] Reland: Add metadata to distinguish vectorized loop body from scalar remainder (#194912)
Reland of #190258, reverted in #194901 due to a null-pointer dereference
when the scalar preheader is absent. Fixed by guarding the metadata
emission with a null check on `ScalarPH`. Also adds a dedicated test
(`vectorize-loop-kind-metadata.ll`) verifying the metadata is emitted
only when remarks are enabled.
---
Add two new loop metadata attributes — `llvm.loop.vectorize.body` and
`llvm.loop.vectorize.epilogue` — that the loop vectorizer sets on the
generated vector loop and epilogue loop respectively. The metadata is
only emitted when optimization remarks are enabled (`ORE->enabled()`),
so it has zero cost in normal compilation.
These enable downstream passes (LoopUnroll, WarnMissedTransforms) to
produce more precise optimization remarks. Instead of the generic "loop
not unrolled" warning on a source line that was vectorized, the unroller
[25 lines not shown]
Add middleware support for LIO ALUA HA
Wire up the middleware side of LIO ALUA high-availability: load
lio_ha.ko with per-node addresses on service start, manage the
4-row ALUA state table (MASTER/BACKUP × synced/not-synced) across
failover events, clean up STANDBY configfs on pool export, and
add pre-flight validation that targets have static initiator ACLs
before ALUA can be enabled.
[libc][docs] Add pwd.h POSIX header documentation (#186292)
Add YAML metadata for `pwd.h` listing all POSIX-mandated functions
(`endpwent`, `getpwent`, `getpwnam`, `getpwnam_r`, `getpwuid`,
`getpwuid_r`, `setpwent`). This header defines no macros per POSIX.
Add `pwd` to `index.rst` and `CMakeLists.txt` `docgen_list`.
Verified with `python3 docgen.py pwd.h` — generates valid RST with
correct POSIX links.
Partial fix for #122006
Co-authored-by: Jeff Bailey <jbailey at raspberryginger.com>
[Loads] Fix crash on mixed-address-space pointers in no-AA store check (#195256)
Fix crash on mixed-address-space pointers in no-AA store check.
`areNonOverlapSameBaseLoadAndStore` built `ConstantRanges` from `APInts`
sized by the load and store pointer index widths. When those widths
differ (AMDGPU's AS=0 vs AS=5), `ConstantRange::intersectWith` asserts.
Add an early return that mirrors the `BasicAA` path.
This can happen when `FindAvailableLoadedValue` is called without
`BatchAAResults`. The path with `BatchAAResults` already handles it.
This crash was observed in #190607, which was therefore reverted in #195135.
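The width mismatch can be illustrated with a small Python model. This is only a sketch: `Range`, `intersect`, and `non_overlapping_same_base` are hypothetical stand-ins for `ConstantRange`, `ConstantRange::intersectWith`, and `areNonOverlapSameBaseLoadAndStore`. The 64-bit and 32-bit widths, and the choice to return `False` on a mismatch, are assumptions made for illustration and are not taken from the patch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Half-open offset range [lo, hi), tagged with a bit width."""
    bits: int
    lo: int
    hi: int


def intersect(a: Range, b: Range) -> Range:
    # Models the precondition that both ranges share one bit width.
    assert a.bits == b.bits, "ranges must have the same bit width"
    lo, hi = max(a.lo, b.lo), min(a.hi, b.hi)
    return Range(a.bits, lo, max(lo, hi))


def non_overlapping_same_base(load: Range, store: Range) -> bool:
    # Early return (assumed behavior): ranges of different index widths
    # cannot be compared, so conservatively answer "may overlap".
    if load.bits != store.bits:
        return False
    overlap = intersect(load, store)
    return overlap.lo == overlap.hi  # empty intersection: no overlap


flat = Range(bits=64, lo=0, hi=4)      # assumed 64-bit index width
private = Range(bits=32, lo=8, hi=12)  # assumed 32-bit index width
print(non_overlapping_same_base(flat, private))
```

Without the width check, the call on the last line would reach the assertion in `intersect`, which is the failure mode the commit describes.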
Attributor: Avoid double map lookup in updateAttrMap (#182666)
This leaves the map entry behind in the unchanged case, but that
does not seem to matter. The newly inserted entry could be erased
when that happens, but that does not seem to make a difference
either.
[flang][OpenMP] Check conflicts between predetermined/explicit DSA (#194961)
Improve checks for loop iteration variables with predetermined DSA
appearing in DSA clauses. Show the location of the variable both in the
offending clause and in the loop.
Make the checks a bit more accurate as well: allow the LINEAR clause
only on a SIMD construct with a single affected loop.
[AArch64][llvm] Tighten SYSP; don't disassemble invalid encodings
Tighten SYSP aliases, so that invalid encodings are disassembled
to `<unknown>`. The operand fields are constrained as follows:
```
Cn is a 4-bit unsigned immediate, in the range 8 to 9
Cm is a 4-bit unsigned immediate, in the range 0 to 7
op1 is a 3-bit unsigned immediate, in the range 0 to 6
op2 is a 3-bit unsigned immediate, in the range 0 to 7
```
Ensure we check this when disassembling, and also constrain
tablegen for compile-time errors of invalid encodings.
Also adjust the test cases in `armv9-sysp-diagnostics.s` and
`llvm/test/MC/AArch64/armv9a-sysp.s`, which were invalid,
and add a few invalid (out-of-range) SYSP-alikes to
test that `<unknown>` is printed.
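The range constraints quoted above can be sketched as a simple validity check. This is an illustrative Python model, not the actual decoder or tablegen change: `is_valid_sysp_operands` and `render` are hypothetical names, and the printed instruction format is simplified (the optional register pair is omitted).

```python
def is_valid_sysp_operands(op1: int, cn: int, cm: int, op2: int) -> bool:
    """Return True when every field lies inside the ranges quoted above."""
    return (
        0 <= op1 <= 6      # op1: 3-bit field, range 0 to 6
        and 8 <= cn <= 9   # Cn: 4-bit field, range 8 to 9
        and 0 <= cm <= 7   # Cm: 4-bit field, range 0 to 7
        and 0 <= op2 <= 7  # op2: 3-bit field, range 0 to 7
    )


def render(op1: int, cn: int, cm: int, op2: int) -> str:
    # Out-of-range encodings are printed as <unknown> instead of as SYSP.
    if not is_valid_sysp_operands(op1, cn, cm, op2):
        return "<unknown>"
    return f"sysp #{op1}, c{cn}, c{cm}, #{op2}"


print(render(0, 8, 0, 0))
print(render(7, 8, 0, 0))
```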
[VPlan] Compute the cost for vector icmp and fcmp (#193268)
Currently we don't account for the costs of vector compares, so the
cost of VPlans that contain them is underestimated. This patch adds
the cost of vector compares to VPInstruction::computeCost. We also need
to recognise BranchOnTwoConds as using only the first lane, otherwise
compares that are used by it are treated as vector compares.
[flang][OpenMP] Implement better GetOmpObjectList, NFC (#195171)
The current implementation lists all clauses that contain an
OmpObjectList, together with the means of extracting it. For a clause
that is not listed, it returns nullptr.
The new implementation traverses an AST node until it finds an
OmpObjectList, and when one isn't found, returns nullptr. This is
actually simpler and is independent of any changes to the AST.
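The traversal-based approach can be sketched in Python as follows. This is a minimal model, not the flang implementation: the `Clause` and `Modified` node types and the function name `find_object_list` are hypothetical, and dataclass field introspection stands in for the parse-tree walker.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Optional


@dataclass
class OmpObjectList:
    names: list


@dataclass
class Modified:  # hypothetical wrapper node carrying a modifier
    modifier: str
    objects: OmpObjectList


@dataclass
class Clause:  # hypothetical clause node with an arbitrary payload
    kind: str
    payload: Any


def find_object_list(node: Any) -> Optional[OmpObjectList]:
    """Depth-first search for the first OmpObjectList under `node`."""
    if isinstance(node, OmpObjectList):
        return node
    if is_dataclass(node) and not isinstance(node, type):
        children = [getattr(node, f.name) for f in fields(node)]
    elif isinstance(node, (list, tuple)):
        children = list(node)
    else:
        return None  # leaf value such as a string or None
    for child in children:
        found = find_object_list(child)
        if found is not None:
            return found
    return None


linear = Clause("linear", Modified("val", OmpObjectList(["i"])))
print(find_object_list(linear))
print(find_object_list(Clause("nowait", None)))
```

Because the search is driven by the shape of the node rather than by a per-clause table, a newly added clause type needs no extra entry.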
clang: Fix broken --offload-arch arguments in tests (#195259)
These need to use two dashes. A single dash is interpreted as -o. Fix
this in all tests using the single dash, except one which appears
to be testing the behavior of warning on the misspelled argument.
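The misparse can be illustrated with a toy option matcher. This is a hypothetical Python model, not clang's option parser: `parse_arg` and its two-entry option table are assumptions used only to show how a joined `-o<file>` form can swallow a single-dash spelling.

```python
def parse_arg(arg: str):
    """Classify one driver argument against a toy two-entry option table."""
    prefix = "--offload-arch="
    if arg.startswith(prefix):
        return ("offload-arch", arg[len(prefix):])
    if arg.startswith("-o") and len(arg) > 2:
        # Joined form: everything after "-o" is taken as the output name.
        return ("output", arg[2:])
    return ("unknown", arg)


print(parse_arg("--offload-arch=gfx906"))
print(parse_arg("-offload-arch=gfx906"))
```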