LLVM/project a948225 llvm/docs AMDGPUUsage.rst, llvm/docs/AMDGPU DeveloperGuideline.rst

[NFC][AMDGPU][Doc] Add developer guideline

This guideline covers topics beyond those in the existing LLVM guidelines.
Delta      File
+405 -0    llvm/docs/AMDGPU/DeveloperGuideline.rst
+1 -0      llvm/docs/AMDGPUUsage.rst
+406 -0    2 files

LLVM/project 899ab68 llvm/lib/Target/RISCV RISCVCodeGenPrepare.cpp, llvm/test/CodeGen/RISCV/rvv vreductions-int.ll

[RISCV] Optimize scalable multiply reductions lowerings (#193528)

Now that we can correctly lower a scalable multiply reduction, let's
improve the lowering for two common cases.
1) If we have exact VLEN, just convert to the fixed vector
   form and lower so that we get shuffles instead of the
   slow painful loop.
2) Handle the high LMUL case by splitting down to m1 via
   a reduce tree.  It's only at the final stage that we need
   to use the loop to handle the unknown number of elements.
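The two-stage strategy in item 2 can be modelled in a few lines. The Python sketch below is only a behavioural model under stated assumptions, not the RISCVCodeGenPrepare code:

- A list stands in for an LMUL>1 register group.
- The hypothetical parameter `m1_elems` stands in for the number of elements in one m1 register.
- The tree stage multiplies m1-sized slices element by element.
- A plain loop handles the final m1 slice, whose element count is not known at compile time in the scalable case.

```python
import math


def mul_reduce_tree_then_loop(values, m1_elems):
    """Behavioural model of a multiply reduction over a high-LMUL vector.

    values:   vector elements; assumed non-empty and a multiple of m1_elems long
    m1_elems: elements per m1 register (hypothetical sketch parameter)
    """
    assert values and len(values) % m1_elems == 0
    # Split the register group into m1-sized slices.
    parts = [values[i:i + m1_elems] for i in range(0, len(values), m1_elems)]
    # Reduce tree: multiply slices pairwise, lane by lane, until one slice is left.
    while len(parts) > 1:
        paired = [[a * b for a, b in zip(parts[i], parts[i + 1])]
                  for i in range(0, len(parts) - 1, 2)]
        if len(parts) % 2:
            paired.append(parts[-1])  # odd slice is carried to the next level
        parts = paired
    # Final stage: a sequential loop over the remaining m1 slice.
    acc = 1
    for x in parts[0]:
        acc *= x
    return acc


# Same result as multiplying every element directly.
print(mul_reduce_tree_then_loop(list(range(1, 17)), 4) == math.prod(range(1, 17)))  # True
```

Reassociating the products this way is legal because integer multiplication is associative and commutative, and that still holds under wrap-around modulo 2^SEW.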

For context, this is mostly for completeness. I don't plan on going any
further to improve the code quality here. There's more we can do (e.g.
exploiting minimum element count to use shuffles for the first couple
stages), but that can be future work.

Code written by Claude with heavy guidance and review by me.
Delta      File
+499 -28   llvm/test/CodeGen/RISCV/rvv/vreductions-int.ll
+66 -24    llvm/lib/Target/RISCV/RISCVCodeGenPrepare.cpp
+565 -52   2 files

LLVM/project d1cea91 llvm/lib/Transforms/Utils LoopUnroll.cpp, llvm/test/Transforms/LoopUnroll/branch-weights-freq unroll-complete.ll unroll-epilog.ll

[LoopUnroll] Fix freqs for unconditional latches: N>2, fast (#182404)

This patch extends PR #179520 to the N > 2 case, where N is the number
of remaining conditional latches. Its strategy is to apply the original
loop's probability to all N latches and then, as needed, adjust as few
of them as possible.
Delta      File
+480 -0    llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-complete.ll
+168 -7    llvm/lib/Transforms/Utils/LoopUnroll.cpp
+68 -0     llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-epilog.ll
+66 -0     llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial-unconditional-latch.ll
+782 -7    4 files

LLVM/project de99eef clang/lib/Driver/ToolChains HIPUtility.cpp

clang: Use correct triple when constructing offload bundler command

Use the toolchain triple for the particular input instead of the
top-level toolchain. NFC for now, but avoids mismatched triples in
a future change.
Delta      File
+5 -3      clang/lib/Driver/ToolChains/HIPUtility.cpp
+5 -3      1 file

LLVM/project 67bf019 clang/lib/Driver/ToolChains HIPUtility.cpp

clang: Use correct triple when constructing offload bundler command

Use the toolchain triple for the particular input instead of the
top-level toolchain. NFC for now, but avoids mismatched triples in
a future change.
Delta      File
+5 -3      clang/lib/Driver/ToolChains/HIPUtility.cpp
+5 -3      1 file

LLVM/project 3b2d2ea lldb/test/Shell/Commands command-disassemble-aarch64-color.s command-disassemble-aarch64-extensions.s, llvm/lib/Target/AArch64 AArch64InstrFormats.td

fixup! Update diff because SYSP definition has changed
Delta      File
+126 -114  llvm/test/MC/AArch64/armv9a-sysp.s
+19 -21    llvm/test/MC/AArch64/armv9-sysp-diagnostics.s
+7 -32     llvm/lib/Target/AArch64/AArch64InstrFormats.td
+2 -11     llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+2 -2      lldb/test/Shell/Commands/command-disassemble-aarch64-color.s
+2 -2      lldb/test/Shell/Commands/command-disassemble-aarch64-extensions.s
+158 -182  2 files not shown
+159 -187  8 files

LLVM/project 1adabd6 clang/www cxx_dr_status.html

[clang][NFC] Update the DR list (#195213)

Just ran `make_cxx_dr_status` to bring in the latest DRs.
Delta      File
+55 -6     clang/www/cxx_dr_status.html
+55 -6     1 file

LLVM/project c46ed70 clang/test/CXX/drs cwg27xx.cpp, clang/www cxx_dr_status.html

[clang][NFC] Mark CWG2780 as implemented and add a test (#195127)

[CWG2780](https://wg21.link/cwg2780) allows `reinterpret_cast`ing to a
reference-to-function type. Clang already supports this:
https://godbolt.org/z/c37hsvKnn
Delta      File
+10 -0     clang/test/CXX/drs/cwg27xx.cpp
+1 -1      clang/www/cxx_dr_status.html
+11 -1     2 files

LLVM/project b2f0db3 llvm/lib/Target/ARM ARMISelLowering.cpp

[ARM] Remove redundant setOperationAction (NFC) (#194241)

We already set it earlier in the block.
Delta      File
+0 -3      llvm/lib/Target/ARM/ARMISelLowering.cpp
+0 -3      1 file

LLVM/project a3afdb1 clang/lib/Driver/ToolChains HIPUtility.cpp

clang: Use correct triple when constructing offload bundler command

Use the toolchain triple for the particular input instead of the
top-level toolchain. NFC for now, but avoids mismatched triples in
a future change.
Delta      File
+3 -3      clang/lib/Driver/ToolChains/HIPUtility.cpp
+3 -3      1 file

LLVM/project aee55a8 llvm/test/CodeGen/X86 vector-reduce-umin.ll vector-reduce-umax.ll

[X86] combineMinMaxReduction - match any minmax reduction to a legal scalar (#195261)

Further relaxation of the VECREDUCE minmax folds - recognise any
>=128-bit vector that reduces to a legal scalar.

Adds custom ops for more VECREDUCE types (including vXi32/vXi64 types).

Remaining issues are sub-128-bit vector reductions (unnecessary padding
with neutral elements) and handling of i64 types on 32-bit targets - I'm
working on fixes for both of these.
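The padding mentioned above is correct but wasteful. Each min/max reduction has a neutral element that cannot change the result, so a short vector can be widened with copies of it before reducing.

The Python sketch below models that padding under stated assumptions. It is not the X86 DAG combine:

- A short vector is padded to a 128-bit width.
- The padded vector is then reduced by repeatedly combining its low and high halves, in the style of a shuffle-based lowering.
- Unsigned lanes are modelled as non-negative integers and signed lanes as ordinary Python integers.

```python
def neutral_element(op, bits):
    """Value that leaves a min/max reduction unchanged for a given lane width."""
    if op == "umin":
        return (1 << bits) - 1          # all ones: largest unsigned value
    if op == "umax":
        return 0                        # smallest unsigned value
    if op == "smin":
        return (1 << (bits - 1)) - 1    # signed maximum for this width
    if op == "smax":
        return -(1 << (bits - 1))       # signed minimum for this width
    raise ValueError(f"unknown reduction: {op}")


def padded_minmax_reduce(values, op, elem_bits, vec_bits=128):
    """Pad values to vec_bits with the neutral element, then reduce by halving."""
    lanes = vec_bits // elem_bits
    assert 0 < len(values) <= lanes
    vec = values + [neutral_element(op, elem_bits)] * (lanes - len(values))
    pick = min if op in ("umin", "smin") else max
    # Each step combines the low and high halves lane by lane (shuffle + min/max).
    while len(vec) > 1:
        half = len(vec) // 2
        vec = [pick(a, b) for a, b in zip(vec[:half], vec[half:])]
    return vec[0]


print(padded_minmax_reduce([5, 3], "umin", 32))   # 3
print(padded_minmax_reduce([-7, 2], "smax", 32))  # 2
```

In the first call, two of the four 32-bit lanes are pure padding. That is the kind of extra work the commit message describes as unnecessary.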
Delta          File
+532 -602      llvm/test/CodeGen/X86/vector-reduce-umin.ll
+521 -588      llvm/test/CodeGen/X86/vector-reduce-umax.ll
+340 -398      llvm/test/CodeGen/X86/vector-reduce-smin.ll
+333 -389      llvm/test/CodeGen/X86/vector-reduce-smax.ll
+245 -311      llvm/test/CodeGen/X86/horizontal-reduce-umin.ll
+245 -310      llvm/test/CodeGen/X86/horizontal-reduce-umax.ll
+2,216 -2,598  5 files not shown
+2,420 -2,882  11 files

LLVM/project 83ecdb7 llvm/lib/Target/ARM ARMISelLowering.cpp, llvm/test/CodeGen/ARM sdiv-pow2-thumb-size.ll sdiv-pow2-arm-size.ll

ARM: Support OR and XOR in targetShrinkDemandedConstant (#165106)

Also makes some changes for AND, OR, and XOR, including leaving the
constant unchanged if it is already legal. This prevents overcorrecting
the mask, especially when it is already valid.
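AND, OR, and XOR are bitwise operations, so the constant's bits at positions the user does not demand can be chosen freely. The Python sketch below illustrates this under stated assumptions:

- Legality uses the A32 rule for data-processing immediates: an 8-bit value rotated right by an even amount.
- Other encodings, such as MVN/BIC forms and Thumb-2 immediates, are ignored.
- `shrink_demanded_constant` is a hypothetical name, not the LLVM hook.

Mirroring the commit, a constant that is already legal is returned unchanged.

```python
MASK32 = 0xFFFFFFFF


def is_a32_modified_imm(value):
    """True if value is an 8-bit constant rotated right by an even amount (A32 rule)."""
    value &= MASK32
    for rot in range(0, 32, 2):
        # Rotating left by rot undoes a rotate-right by rot.
        undone = ((value << rot) | (value >> (32 - rot))) & MASK32
        if undone <= 0xFF:
            return True
    return False


def shrink_demanded_constant(const, demanded):
    """Pick a legal constant that agrees with const on every demanded bit, if possible."""
    const &= MASK32
    demanded &= MASK32
    if is_a32_modified_imm(const):
        return const  # already legal: do not touch it
    # Undemanded bits may be cleared or set without affecting demanded result bits.
    for candidate in (const & demanded, const | (~demanded & MASK32)):
        if is_a32_modified_imm(candidate):
            return candidate
    return const  # no legal alternative found


print(hex(shrink_demanded_constant(0x101, 0xFF)))  # 0x1
print(hex(shrink_demanded_constant(0xFF, 0x0F)))   # 0xff (already legal, left alone)
```

The second call shows the "don't change a legal constant" rule. Clearing the undemanded bits of `0xFF` would give `0x0F`, which is no more encodable and would only perturb the mask.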
Delta      File
+135 -64   llvm/lib/Target/ARM/ARMISelLowering.cpp
+110 -39   llvm/test/CodeGen/ARM/sdiv-pow2-thumb-size.ll
+52 -32    llvm/test/CodeGen/ARM/sdiv-pow2-arm-size.ll
+3 -7      llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll
+2 -4      llvm/test/CodeGen/ARM/urem-seteq-illegal-types.ll
+2 -2      llvm/test/CodeGen/ARM/fpenv.ll
+304 -148  4 files not shown
+309 -155  10 files

LLVM/project 7df6714 llvm/lib/Target/AArch64 AArch64InstrInfo.td, llvm/test/CodeGen/AArch64 neon-saba.ll arm64-neon-aba-abd.ll

[AArch64] Use add_like for SABD and UABA (#194421)

Similar to the other patches that have recently made use of add_like,
this adds add_like to the SABA and UABA patterns.
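Roughly, `add_like` lets a pattern written for `add` also match an `or` that is known to behave like one. A typical case is when the operands share no set bits: then no carries occur and `a | b == a + b`.

The Python sketch below checks that identity for the accumulate step of an absolute-difference-accumulate. `abd` and `aba_via_or` are hypothetical helpers that model a single lane with unbounded integers. They are not the AArch64 instruction semantics.

```python
def abd(a, b):
    """Absolute difference of one lane (unbounded-integer model)."""
    return abs(a - b)


def aba_via_or(acc, a, b):
    """Accumulate |a - b| into acc with OR; only valid when the bits are disjoint."""
    d = abd(a, b)
    if acc & d:
        raise ValueError("operands share set bits, so OR is not add-like here")
    # With no common set bits there are no carries, so OR and ADD agree.
    assert acc | d == acc + d
    return acc | d


print(aba_via_or(0x100, 7, 3))  # 260, i.e. 0x100 + 4
```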
Delta      File
+24 -33    llvm/test/CodeGen/AArch64/neon-saba.ll
+12 -24    llvm/test/CodeGen/AArch64/arm64-neon-aba-abd.ll
+13 -13    llvm/lib/Target/AArch64/AArch64InstrInfo.td
+49 -70    3 files

LLVM/project b09174b llvm/lib/Target/AMDGPU AMDGPUTargetTransformInfo.cpp, llvm/test/Transforms/LoopUnroll/AMDGPU runtime-unroll.ll

[AMDGPU] Enable runtime loop unrolling (#194924)

Enable auto runtime unrolling for AMDGPU by setting `UP.Runtime = true`
in `getUnrollingPreferences`, with `PartialThreshold = Threshold / 4` to
limit code-size growth.

Benchmarked on **MI350X (gfx950)** and **MI300X (gfx942)** using
Composable Kernel, xpu-perf, and llama.cpp. Results showed some
improvements and no real regressions.

AI Disclaimer: Cursor was used to evaluate the change and run
benchmarking experiments.
Delta      File
+63 -0     llvm/test/Transforms/LoopUnroll/AMDGPU/runtime-unroll.ll
+5 -2      llvm/test/Transforms/SimpleLoopUnswitch/AMDGPU/uniform-unswitch.ll
+4 -1      llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+72 -3     3 files

LLVM/project 8bb6419 clang/test/Analysis analyzeOneFunction.cpp, clang/test/Analysis/ctu on-demand-parsing.c on-demand-parsing.cpp

[TEST-ONLY] Make sure clang-extdef-mapping don't infer database (#195196)
Delta      File
+1 -1      clang/test/Tooling/clang-extdef-mapping.cpp
+1 -1      clang/test/Analysis/analyzeOneFunction.cpp
+1 -1      clang/test/Analysis/ctu/on-demand-parsing.c
+1 -1      clang/test/Analysis/ctu/on-demand-parsing.cpp
+4 -4      4 files

LLVM/project f86d707 llvm/docs LangRef.rst TransformMetadata.rst, llvm/lib/Transforms/Scalar WarnMissedTransforms.cpp

[LoopVectorize] Reland: Add metadata to distinguish vectorized loop body from scalar remainder (#194912)

Reland of #190258, reverted in #194901 due to a null-pointer dereference
when the scalar preheader is absent. Fixed by guarding the metadata
emission with a null check on `ScalarPH`. Also adds a dedicated test
(`vectorize-loop-kind-metadata.ll`) verifying the metadata is emitted
only when remarks are enabled.

---

Add two new loop metadata attributes — `llvm.loop.vectorize.body` and
`llvm.loop.vectorize.epilogue` — that the loop vectorizer sets on the
generated vector loop and epilogue loop respectively. The metadata is
only emitted when optimization remarks are enabled (`ORE->enabled()`),
so it has zero cost in normal compilation.

These enable downstream passes (LoopUnroll, WarnMissedTransforms) to
produce more precise optimization remarks. Instead of the generic "loop
not unrolled" warning on a source line that was vectorized, the unroller

    [25 lines not shown]
Delta      File
+120 -0    llvm/test/Transforms/LoopTransformWarning/vectorizer-loop-kind-unroll-warning.ll
+110 -0    llvm/test/Transforms/LoopUnroll/vectorizer-loop-kind-remarks.ll
+42 -0     llvm/docs/LangRef.rst
+34 -0     llvm/test/Transforms/LoopVectorize/vectorize-loop-kind-metadata.ll
+21 -0     llvm/docs/TransformMetadata.rst
+10 -3     llvm/lib/Transforms/Scalar/WarnMissedTransforms.cpp
+337 -3    6 files not shown
+380 -9    12 files

LLVM/project bcaaf61 libc/docs CMakeLists.txt, libc/docs/headers index.rst

[libc][docs] Add pwd.h POSIX header documentation (#186292)

Add YAML metadata for `pwd.h` listing all POSIX-mandated functions
(`endpwent`, `getpwent`, `getpwnam`, `getpwnam_r`, `getpwuid`,
`getpwuid_r`, `setpwent`). This header defines no macros per POSIX.

Add `pwd` to `index.rst` and `CMakeLists.txt` `docgen_list`.

Verified with `python3 docgen.py pwd.h` — generates valid RST with
correct POSIX links.

Partial fix for #122006

Co-authored-by: Jeff Bailey <jbailey at raspberryginger.com>
Delta      File
+15 -0     libc/utils/docgen/pwd.yaml
+1 -0      libc/docs/CMakeLists.txt
+1 -0      libc/docs/headers/index.rst
+17 -0     3 files

LLVM/project f3d0ac9 llvm/lib/Analysis Loads.cpp, llvm/unittests/Analysis LoadsTest.cpp

[Loads] Fix crash on mixed-address-space pointers in no-AA store check (#195256)

Fix crash on mixed-address-space pointers in no-AA store check.
`areNonOverlapSameBaseLoadAndStore` built `ConstantRanges` from `APInts`
sized by the load and store pointer index widths. When those widths
differ (AMDGPU's AS=0 vs AS=5), `ConstantRange::intersectWith` asserts.
Adds an early return mirroring the `BasicAA` path.

This can happen when `FindAvailableLoadedValue` is called without
`BatchAAResults`. The path with `BatchAAResults` already handles it.

This crash was observed in #190607, so it was reverted in #195135.
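The crash and its fix can be modelled with a toy range type. The Python sketch below makes these assumptions:

- All names are hypothetical.
- `OffsetRange` only imitates the width check of `ConstantRange`.
- Offsets are taken relative to a shared base pointer.

Intersecting ranges of different bit widths trips an assertion. The overlap query therefore returns early with the conservative answer when the load and store index widths differ, for example 64-bit flat (AS 0) versus 32-bit private (AS 5) pointers on AMDGPU.

```python
class OffsetRange:
    """Half-open [lo, hi) byte-offset range tagged with an index bit width."""

    def __init__(self, lo, hi, bits):
        self.lo, self.hi, self.bits = lo, hi, bits

    def intersects(self, other):
        # Like ConstantRange, mixing bit widths is treated as a programming error.
        assert self.bits == other.bits, "bit width mismatch"
        return self.lo < other.hi and other.lo < self.hi


def provably_disjoint(load_off, load_size, load_bits,
                      store_off, store_size, store_bits):
    """True only if the load and store byte ranges are known not to overlap."""
    if load_bits != store_bits:
        # Early return: ranges of different widths cannot be compared,
        # so give the conservative answer instead of asserting.
        return False
    load = OffsetRange(load_off, load_off + load_size, load_bits)
    store = OffsetRange(store_off, store_off + store_size, store_bits)
    return not load.intersects(store)


print(provably_disjoint(0, 4, 64, 8, 4, 64))  # True
print(provably_disjoint(0, 4, 64, 8, 4, 32))  # False (widths differ)
```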
Delta      File
+29 -0     llvm/unittests/Analysis/LoadsTest.cpp
+2 -0      llvm/lib/Analysis/Loads.cpp
+31 -0     2 files

LLVM/project 4fa459d llvm/lib/Target/AArch64/AsmParser AArch64AsmParser.cpp, llvm/test/MC/AArch64 armv9a-tlbip.s

fixup! Address Carol's PR comments
Delta      File
+5 -0      llvm/test/MC/AArch64/armv9a-tlbip.s
+3 -0      llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+8 -0      2 files

LLVM/project 6900ebe llvm/lib/Transforms/IPO Attributor.cpp

Attributor: Avoid double map lookup in updateAttrMap (#182666)

This will leave behind the map entry in the unchanged case,
but this seems to not matter. Could erase the newly inserted
entry if that happens, but that also doesn't seem to make a
difference.
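Replacing a double map lookup with a single one is a general map idiom. In LLVM C++ the single-lookup form is commonly written with `DenseMap::try_emplace`, though the actual Attributor code may differ.

The Python analogue below uses `dict.setdefault` to find or insert the entry in one step, then updates it through the returned reference. `update_attr_map` is a hypothetical helper. As the commit notes for the real change, a previously absent key leaves an empty entry behind even when nothing changes.

```python
def update_attr_map(attr_map, key, new_attrs):
    """Merge new_attrs into attr_map[key] using a single dictionary lookup.

    Returns True if the stored attribute set changed.
    """
    # One lookup: returns the existing set, or inserts and returns an empty one.
    attrs = attr_map.setdefault(key, set())
    before = len(attrs)
    attrs |= new_attrs  # in-place update through the returned reference
    return len(attrs) != before


m = {}
print(update_attr_map(m, "f", {"nounwind"}))   # True
print(update_attr_map(m, "f", {"nounwind"}))   # False
print(update_attr_map(m, "g", set()), m["g"])  # False set()
```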
Delta      File
+7 -8      llvm/lib/Transforms/IPO/Attributor.cpp
+7 -8      1 file

LLVM/project ed25e1f llvm/lib/Transforms/Vectorize VPlanUtils.cpp VPlan.h

[VPlan] Add missing const-qualifications (NFC) (#195248)
Delta      File
+4 -4      llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
+2 -2      llvm/lib/Transforms/Vectorize/VPlan.h
+1 -1      llvm/lib/Transforms/Vectorize/VPlanUtils.h
+7 -7      3 files

LLVM/project a6a53ea libc/docs CMakeLists.txt, libc/docs/headers index.rst

[libc][docs] Add sched.h POSIX header documentation (#186290)

Add YAML metadata for `sched.h` listing all POSIX-mandated macros
(`SCHED_FIFO`, `SCHED_OTHER`, `SCHED_RR`, `SCHED_SPORADIC`) and functions
(`sched_get_priority_max`, `sched_get_priority_min`, `sched_getparam`,
`sched_getscheduler`, `sched_rr_get_interval`, `sched_setparam`,
`sched_setscheduler`, `sched_yield`).

Add `sched` to `index.rst` and `CMakeLists.txt` `docgen_list`.

Verified with `python3 docgen.py sched.h` — generates valid RST with
correct POSIX links.

Partial fix for #122006

Co-authored-by: Jeff Bailey <jbailey at raspberryginger.com>
Delta      File
+27 -0     libc/utils/docgen/sched.yaml
+1 -0      libc/docs/CMakeLists.txt
+1 -0      libc/docs/headers/index.rst
+29 -0     3 files

LLVM/project c549aba flang/lib/Semantics check-omp-loop.cpp openmp-utils.cpp, flang/test/Parser/OpenMP linear-clause.f90

[flang][OpenMP] Check conflicts between predetermined/explicit DSA (#194961)

Improve checks for loop iteration variables with predetermined DSA
appearing in DSA clauses. Show the location of the variable both in the
offending clause and in the loop.

Make the checks a bit more accurate as well: only allow LINEAR clause on
SIMD construct with a single affected loop.
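The tightened rule can be stated as a small predicate. The Python sketch below is a hypothetical model, not the flang implementation:

- LINEAR on a loop iteration variable is accepted only for a SIMD construct with exactly one affected loop.
- Otherwise the function returns a diagnostic that carries both source locations.
- The function name, message text and location format are invented for illustration.

```python
def check_linear_on_iteration_var(is_simd, affected_loops, var, clause_loc, loop_loc):
    """Return None if LINEAR(var) is acceptable, else a two-location diagnostic.

    var is assumed to be the iteration variable of one of the affected loops.
    """
    if is_simd and affected_loops == 1:
        return None
    # Report the offending clause first, then point at the loop that owns var.
    return (f"{clause_loc}: error: '{var}' has a predetermined data-sharing "
            f"attribute and may not appear in LINEAR here\n"
            f"{loop_loc}: note: '{var}' is the iteration variable of this loop")


print(check_linear_on_iteration_var(True, 1, "i", "t.f90:3:20", "t.f90:4:3"))  # None
print(check_linear_on_iteration_var(True, 2, "i", "t.f90:3:20", "t.f90:4:3"))
```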
Delta      File
+86 -22    flang/lib/Semantics/check-omp-loop.cpp
+27 -0     llvm/include/llvm/Frontend/OpenMP/OMP.h
+11 -14    flang/test/Parser/OpenMP/linear-clause.f90
+18 -3     flang/test/Semantics/OpenMP/do01.f90
+14 -6     flang/lib/Semantics/openmp-utils.cpp
+0 -15     flang/lib/Semantics/check-omp-structure.cpp
+156 -60   8 files not shown
+173 -91   14 files

LLVM/project e7db558 libc/docs CMakeLists.txt, libc/docs/headers index.rst

[libc][docs] Add spawn.h POSIX header documentation (#186291)

Add YAML metadata for `spawn.h` listing all POSIX-mandated macros
(`POSIX_SPAWN_RESETIDS`, `POSIX_SPAWN_SETPGROUP`, `POSIX_SPAWN_SETSCHEDPARAM`,
`POSIX_SPAWN_SETSCHEDULER`, `POSIX_SPAWN_SETSID`, `POSIX_SPAWN_SETSIGDEF`,
`POSIX_SPAWN_SETSIGMASK`) and all 23 functions (`posix_spawn`, `posix_spawnp`,
`posix_spawn_file_actions_*`, `posix_spawnattr_*`).

Add `spawn` to `index.rst` and `CMakeLists.txt` `docgen_list`.

Verified with `python3 docgen.py spawn.h` — generates valid RST with
correct POSIX links.

Partial fix for #122006
Delta      File
+63 -0     libc/utils/docgen/spawn.yaml
+1 -0      libc/docs/headers/index.rst
+1 -0      libc/docs/CMakeLists.txt
+65 -0     3 files

LLVM/project d7c907a llvm/lib/Target/AArch64/MCTargetDesc AArch64InstPrinter.cpp

fixup! Address PR comment about shortened `sysp` with xzr/xzr
Delta      File
+17 -16    llvm/lib/Target/AArch64/MCTargetDesc/AArch64InstPrinter.cpp
+17 -16    1 file

LLVM/project f35bcd8 llvm/lib/Target/AArch64/AsmParser AArch64AsmParser.cpp, llvm/test/MC/AArch64 armv9-sysp-diagnostics.s

fixup! Improve error parsing
Delta      File
+46 -25    llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+12 -12    llvm/test/MC/AArch64/armv9-sysp-diagnostics.s
+58 -37    2 files

LLVM/project 33a7f77 llvm/lib/Target/AArch64 AArch64RegisterInfo.td, llvm/lib/Target/AArch64/AsmParser AArch64AsmParser.cpp

fixup! Implement Marian's suggestion to implement as XSeqPairsClass + [XZR, XZR]
Delta      File
+54 -82    llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+35 -73    llvm/lib/Target/AArch64/MCTargetDesc/AArch64InstPrinter.cpp
+12 -9     llvm/lib/Target/AArch64/Disassembler/AArch64Disassembler.cpp
+8 -1      llvm/lib/Target/AArch64/AArch64RegisterInfo.td
+0 -7      llvm/test/MC/AArch64/armv9a-sysp.s
+1 -3      llvm/lib/Target/AArch64/MCTargetDesc/AArch64InstPrinter.h
+110 -175  6 files

LLVM/project 1421144 llvm/lib/Target/AArch64 AArch64InstrFormats.td

fixup! Remove superfluous code
Delta      File
+0 -7      llvm/lib/Target/AArch64/AArch64InstrFormats.td
+0 -7      1 file

LLVM/project 5990d03 llvm/lib/Target/AArch64/MCTargetDesc AArch64InstPrinter.cpp, llvm/test/MC/AArch64 armv9a-sysp.s

fixup! Add no-alias tests
Delta      File
+7 -0      llvm/test/MC/AArch64/armv9a-sysp.s
+4 -3      llvm/lib/Target/AArch64/MCTargetDesc/AArch64InstPrinter.cpp
+11 -3     2 files

LLVM/project 331697d llvm/lib/Target/AArch64 AArch64InstrFormats.td, llvm/lib/Target/AArch64/AsmParser AArch64AsmParser.cpp

fixup! Templatise bounds checking and improve tests
Delta      File
+15 -4     llvm/test/MC/AArch64/armv9-sysp-diagnostics.s
+18 -0     llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+12 -5     llvm/lib/Target/AArch64/AArch64InstrFormats.td
+0 -8      llvm/lib/Target/AArch64/MCTargetDesc/AArch64InstPrinter.cpp
+45 -17    4 files