[lldb-dap] Add more logs when running in server mode (#195249)
Had a recent failure in server mode but could not find the reason for
the failure.
```
[09:26:05.247] lldb-dap.cpp:552 started with connection listeners connection://[127.0.0.1]:34035
[09:26:05.258] lldb-dap.cpp:587 (conn0) client connected
[09:26:05.263] lldb-dap.cpp:630 server shutting down, disconnecting remaining clients
[09:26:05.263] (conn0) <-- {"event":"terminated","seq":1,"type":"event"}
[09:26:05.263] DAP.cpp:1005 (conn0) transport closed
[09:26:05.272] lldb-dap.cpp:609 (conn0) client disconnected
```
clang: Use correct triple when constructing offload bundler command
Use the toolchain triple for the particular input instead of the top
level toolchain. NFC for now, but avoids mismatched triples in
a future change.
[ARM] Remove unneeded AND node, now that ARM does not do UndefinedBooleanContent anymore (NFC) (#194253)
It has ZeroOrOneBooleanContent now, so it is optimized away in all cases
anyway.
[RISCV] Optimize scalable multiply reductions lowerings (#193528)
Now that we can correctly lower a scalable multiply reduction, let's
improve the lowering for two common cases.
1) If we have exact VLEN, just convert to the fixed vector
form and lower so that we get shuffles instead of the
slow painful loop.
2) Handle the high LMUL case by splitting down to m1 via
a reduce tree. It's only at the final stage that we need
to use the loop to handle the unknown number of elements.
For context, this is mostly for completeness. I don't plan on going any
further to improve the code quality here. There's more we can do (e.g.
exploiting minimum element count to use shuffles for the first couple
stages), but that can be future work.
Code written by Claude with heavy guidance and review by me.
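
A scalar model of the second case may make the reduce tree easier to picture. This is only a sketch in plain C++, not the RISC-V lowering itself: `mul_reduce_tree` and `m1_elems` are hypothetical names, the vector stands in for a high-LMUL register group, and the group is assumed to be a power-of-two multiple of the m1 size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model: multiply the upper half into the lower half
// until only one "m1-sized" chunk remains, then finish with a plain loop.
// Assumes v.size() is m1_elems times a power of two (as LMUL values are).
std::uint32_t mul_reduce_tree(std::vector<std::uint32_t> v, std::size_t m1_elems) {
  assert(m1_elems > 0 && v.size() % m1_elems == 0);
  const std::size_t ratio = v.size() / m1_elems;
  assert(ratio != 0 && (ratio & (ratio - 1)) == 0);

  // Tree stage: each step halves the number of live elements.
  while (v.size() > m1_elems) {
    const std::size_t half = v.size() / 2;
    for (std::size_t i = 0; i < half; ++i)
      v[i] *= v[i + half];
    v.resize(half);
  }

  // Final stage: stands in for the loop over the remaining elements.
  std::uint32_t acc = 1;
  for (std::uint32_t x : v)
    acc *= x;
  return acc;
}
```

With eight elements and `m1_elems = 2`, the tree runs two halving steps and the final loop sees only two elements.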
[LoopUnroll] Fix freqs for unconditional latches: N>2, fast (#182404)
This patch extends PR #179520 to the N > 2 case, where N is the number
of remaining conditional latches. Its strategy is to apply the original
loop's probability to all N latches and then, as needed, adjust as few
of them as possible.
[clang][NFC] Mark CWG2780 as implemented and add a test (#195127)
[CWG2780](https://wg21.link/cwg2780) allows `reinterpret_cast`ing to a
reference-to-function type. Clang already supports this:
https://godbolt.org/z/c37hsvKnn
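
For illustration, a minimal sketch of the kind of cast CWG2780 covers. The names `add_one`, `Opaque`, `Original`, and `round_trip` are hypothetical and are not taken from the added test. The function is only called after casting back to its original type, so the call itself stays well-defined.

```cpp
int add_one(int x) { return x + 1; }

using Opaque = void();      // an unrelated function type
using Original = int(int);  // the real type of add_one

int round_trip(int v) {
  // reinterpret_cast to a reference-to-function type (the CWG2780 case).
  Opaque &opaque = reinterpret_cast<Opaque &>(add_one);
  // Cast back to the original type before calling through the reference.
  Original &restored = reinterpret_cast<Original &>(opaque);
  return restored(v);
}
```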
[X86] combineMinMaxReduction - match any minmax reduction to a legal scalar (#195261)
Further relaxation of the VECREDUCE minmax folds - recognise any
>=128-bit vector that reduces to a legal scalar.
Adds custom ops for more VECREDUCE types (inc vXi32/vXi64 types)
Remaining issues are sub-128-bit vector reductions (unnecessary padding
with neutral elements) and handling of i64 types on 32-bit targets - I'm
working on fixes for both of these.
ARM: Support OR and XOR in targetShrinkDemandedConstant (#165106)
Also adjust the handling of AND, OR, and XOR, including leaving the
constant unchanged when it is already legal. This prevents
overcorrecting a mask that is already valid.
[AArch64] Use add_like for SABA and UABA (#194421)
Similar to the other patches that have made use of add_like recently, this
adds add_like to the SABA and UABA patterns.
[AMDGPU] Enable runtime loop unrolling (#194924)
Enable auto runtime unrolling for AMDGPU by setting `UP.Runtime = true`
in `getUnrollingPreferences`, with `PartialThreshold = Threshold / 4` to
limit code-size growth.
Benchmarked on **MI350X (gfx950)** and **MI300X (gfx942)** using
Composable Kernel, xpu-perf, and llama.cpp. Results showed some
improvements and no real regressions.
AI Disclaimer: Cursor was used to evaluate the change and run
benchmarking experiments.
[LoopVectorize] Reland: Add metadata to distinguish vectorized loop body from scalar remainder (#194912)
Reland of #190258, reverted in #194901 due to a null-pointer dereference
when the scalar preheader is absent. Fixed by guarding the metadata
emission with a null check on `ScalarPH`. Also adds a dedicated test
(`vectorize-loop-kind-metadata.ll`) verifying the metadata is emitted
only when remarks are enabled.
---
Add two new loop metadata attributes — `llvm.loop.vectorize.body` and
`llvm.loop.vectorize.epilogue` — that the loop vectorizer sets on the
generated vector loop and epilogue loop respectively. The metadata is
only emitted when optimization remarks are enabled (`ORE->enabled()`),
so it has zero cost in normal compilation.
These enable downstream passes (LoopUnroll, WarnMissedTransforms) to
produce more precise optimization remarks. Instead of the generic "loop
not unrolled" warning on a source line that was vectorized, the unroller
[25 lines not shown]
[libc][docs] Add pwd.h POSIX header documentation (#186292)
Add YAML metadata for `pwd.h` listing all POSIX-mandated functions
(`endpwent`, `getpwent`, `getpwnam`, `getpwnam_r`, `getpwuid`,
`getpwuid_r`, `setpwent`). This header defines no macros per POSIX.
Add `pwd` to `index.rst` and `CMakeLists.txt` `docgen_list`.
Verified with `python3 docgen.py pwd.h` — generates valid RST with
correct POSIX links.
Partial fix for #122006
Co-authored-by: Jeff Bailey <jbailey at raspberryginger.com>
[Loads] Fix crash on mixed-address-space pointers in no-AA store check (#195256)
Fix crash on mixed-address-space pointers in no-AA store check.
`areNonOverlapSameBaseLoadAndStore` built `ConstantRanges` from `APInts`
sized by the load and store pointer index widths. When those widths
differ (AMDGPU's AS=0 vs AS=5), `ConstantRange::intersectWith` asserts.
Adds an early return mirroring the `BasicAA` path.
This can happen when `FindAvailableLoadedValue` is called without
`BatchAAResults`. The path with `BatchAAResults` already handles it.
This crash was observed in #190607, so it was reverted in #195135.
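
The shape of the guard can be sketched with a toy model. This is not LLVM code: `AccessRange` and `provably_disjoint` are hypothetical stand-ins, and returning "cannot prove non-overlap" on a width mismatch is an assumption about what the conservative early return does.

```cpp
#include <cstdint>

// Hypothetical stand-in for a byte range whose offsets are expressed
// in a pointer index type of a given bit width. Covers [begin, end).
struct AccessRange {
  unsigned index_bits;
  std::uint64_t begin;
  std::uint64_t end;
};

// Returns true only when the two accesses are provably non-overlapping.
bool provably_disjoint(const AccessRange &load, const AccessRange &store) {
  // Ranges of different bit widths cannot be compared directly, so bail
  // out conservatively instead of intersecting them.
  if (load.index_bits != store.index_bits)
    return false;
  return load.end <= store.begin || store.end <= load.begin;
}
```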
Attributor: Avoid double map lookup in updateAttrMap (#182666)
This will leave behind the map entry in the unchanged case,
but this seems to not matter. Could erase the newly inserted
entry if that happens, but that also doesn't seem to make a
difference.
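
A rough sketch of the single-lookup pattern, using `std::unordered_map::try_emplace` in place of LLVM's map types. `update_cached` and its parameters are hypothetical and do not reflect the real `updateAttrMap` signature.

```cpp
#include <string>
#include <unordered_map>

// Returns true if the cached value for `key` changed.
bool update_cached(std::unordered_map<int, std::string> &cache, int key,
                   const std::string &candidate) {
  // One lookup: finds the existing entry or inserts a default-constructed one.
  auto [it, inserted] = cache.try_emplace(key);
  (void)inserted;
  // Unchanged case: a freshly inserted (empty) entry is left behind.
  if (it->second == candidate)
    return false;
  it->second = candidate;
  return true;
}
```

The tradeoff mirrors the one described above: the unchanged path may leave an extra default entry in the map, in exchange for not hashing the key twice.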
[flang][OpenMP] Check conflicts between predetermined/explicit DSA (#194961)
Improve checks for loop iteration variables with predetermined DSA
appearing in DSA clauses. Show both the location of the variable in the
offending clause, and in the loop.
Make the checks a bit more accurate as well: only allow LINEAR clause on
SIMD construct with a single affected loop.