LLVM/project 1792fcdclang/include/clang/Basic BuiltinsAMDGPU.td, clang/lib/CodeGen/TargetBuiltins AMDGPU.cpp

[AMDGPU] Add builtins for wave reduction intrinsics

Assisted by - Claude-sonnet:4.6
DeltaFile
+189-0clang/test/CodeGenOpenCL/builtins-amdgcn.cl
+18-0clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+9-0clang/include/clang/Basic/BuiltinsAMDGPU.td
+216-03 files

LLVM/project 2157ee5clang/test/Sema wave-reduce-builtins-validate-amdgpu.cl

Missing SEMA tests
DeltaFile
+26-0clang/test/Sema/wave-reduce-builtins-validate-amdgpu.cl
+26-01 files

LLVM/project df98d04llvm/lib/Target/AMDGPU SIISelLowering.cpp AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.reduce.xor.ll llvm.amdgcn.reduce.or.ll

[AMDGPU] Support Wave Reduction for i16 types - 3

Supported Ops: `and`, `or`, `xor`.
DeltaFile
+673-160llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.xor.ll
+563-136llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.or.ll
+563-136llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.and.ll
+4-1llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+4-1llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+1,807-4345 files

LLVM/project 322f5d0llvm/lib/Target/AMDGPU SIISelLowering.cpp AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.reduce.sub.ll llvm.amdgcn.reduce.add.ll

[AMDGPU] Support Wave Reduction for i16 types - 2 (#194810)

Supported Ops: `add`, `sub`.
DeltaFile
+692-187llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.sub.ll
+668-184llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.add.ll
+6-2llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+6-2llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+1,372-3754 files

LLVM/project 651afa8lld/ELF Relocations.cpp, lld/test/ELF arm-thunk-overlay-reuse.s aarch64-thunk-bti-overlay-reuse.s

[LLD][ELF] Do not reuse thunks in OVERLAYs (#200415)

We cannot guarantee that a thunk in an OVERLAY will be in memory at the
same time as the caller if the caller is not in the same output section.
It is safe for a caller in an OVERLAY to reuse a thunk in a non-OVERLAY
section as we know that will be in memory. Thunks that are placed
before their target, are alternative entry points and can also be reused.
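
The reuse rule described above can be sketched as a small predicate. This is only an illustration under stated assumptions: `OutputSec`, `ThunkInfo` and `canReuseThunk` are hypothetical names and do not reflect lld's actual data structures or the real isThunkSectionCompatible signature.

```cpp
#include <cassert>

// Hypothetical stand-in for an output section; not lld's OutputSection.
struct OutputSec {
  int id;         // identity of the output section
  bool inOverlay; // true if the section is part of an OVERLAY
};

// Hypothetical description of an already-created thunk.
struct ThunkInfo {
  const OutputSec *section; // output section that holds the thunk
  bool placedBeforeTarget;  // thunk acts as an alternative entry point
};

// Returns true if `caller` may branch to the existing thunk.
bool canReuseThunk(const OutputSec &caller, const ThunkInfo &thunk) {
  assert(thunk.section && "thunk must live in some output section");
  // A thunk placed before its target is an alternative entry point.
  if (thunk.placedBeforeTarget)
    return true;
  // A thunk outside any OVERLAY is known to be in memory.
  if (!thunk.section->inOverlay)
    return true;
  // A thunk inside an OVERLAY is only safe for callers in the same
  // output section, since only then are both loaded together.
  return thunk.section->id == caller.id;
}
```

With this model, a caller in one overlay section is refused a thunk that lives in a different overlay section, but is allowed one that lives in ordinary (non-OVERLAY) text.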

Resurrect the isThunkSectionCompatible function that was recently
removed as it served a similar purpose for thunks in different
partitions.

Potentially fixes #199966 which mentions a similar problem for sections
assigned to TCM (Tightly Coupled Memory). It should be possible to model
a TCM as an OVERLAY. If not then there may need to be a command-line
option to inhibit thunk sharing across output sections.
DeltaFile
+89-0lld/test/ELF/arm-thunk-overlay-reuse.s
+70-0lld/test/ELF/aarch64-thunk-bti-overlay-reuse.s
+18-1lld/ELF/Relocations.cpp
+177-13 files

LLVM/project e6a70acmlir/include/mlir/Conversion Passes.td, mlir/lib/Conversion/ConvertToEmitC ConvertToEmitCPass.cpp

Revert "[mlir][emitc] Lower multiple results as a struct (#200659)" (#202911)

This reverts commit 1e0a4c7a9154e46ef52a7c5b0ddbca69fbdcfacd.

Failed buildbot:
https://lab.llvm.org/buildbot/#/builders/116/builds/29302
DeltaFile
+25-236mlir/lib/Conversion/FuncToEmitC/FuncToEmitC.cpp
+2-96mlir/test/Conversion/FuncToEmitC/func-to-emitc.mlir
+1-87mlir/test/Conversion/FuncToEmitC/func-to-emitc-failed.mlir
+0-63mlir/test/Target/Cpp/func.mlir
+5-13mlir/lib/Conversion/ConvertToEmitC/ConvertToEmitCPass.cpp
+0-6mlir/include/mlir/Conversion/Passes.td
+33-5017 files not shown
+39-51213 files

LLVM/project ac0f040orc-rt/include/orc-rt NativeDylibManager.h, orc-rt/lib/executor NativeDylibManager.cpp

[orc-rt] Treat empty path as "process symbols" in NativeDylibManager. (#202905)

NativeDylibManager::load now handles an empty path by returning the
process's global lookup handle (RTLD_DEFAULT on POSIX) directly,
bypassing dlopen and the shutdown-time dlclose registration. This
matches the behavior of OrcTargetProcess's SimpleExecutorDylibManager.
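
A minimal sketch of the empty-path behavior on a POSIX system follows. `LoadedHandle`, `loadDylib` and `lookup` are hypothetical helpers and not the orc-rt API; the `needsClose` flag is only a stand-in for the shutdown-time dlclose registration mentioned above.

```cpp
#include <cassert>
#include <dlfcn.h>
#include <string>

// Hypothetical result type: a dlsym-compatible handle plus a flag that
// says whether the handle must be passed to dlclose at shutdown.
struct LoadedHandle {
  void *handle;
  bool needsClose;
};

// An empty path yields the process-wide lookup handle and skips dlopen.
LoadedHandle loadDylib(const std::string &path) {
  if (path.empty())
    return {RTLD_DEFAULT, false};
  void *h = dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL);
  return {h, h != nullptr};
}

// Looks up a symbol through whichever handle loadDylib returned.
void *lookup(const LoadedHandle &h, const char *name) {
  assert(name && "symbol name must not be null");
  return dlsym(h.handle, name);
}
```

A caller would use `lookup(loadDylib(""), "some_symbol")` to search the symbols already visible in the process, without acquiring a handle that later needs closing.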
DeltaFile
+20-0orc-rt/unittests/NativeDylibManagerTest.cpp
+17-0orc-rt/unittests/NativeDylibManagerSPSCITest.cpp
+4-6orc-rt/lib/executor/Unix/NativeDylibAPIs.inc
+5-0orc-rt/lib/executor/NativeDylibManager.cpp
+3-0orc-rt/include/orc-rt/NativeDylibManager.h
+49-65 files

LLVM/project 0345e7dllvm/lib/Target/AMDGPU SIISelLowering.cpp AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.reduce.umax.ll llvm.amdgcn.reduce.umin.ll

[AMDGPU] Support Wave Reduction for i16 types - 1 (#194808)

Supported Ops: `min`, `umin`, `max`, `umax`.
16-bit wave reduce ops are promoted to 32-bit
operations before ISEL. From there they use the
existing implementations for 32-bit reductions.
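
The promotion idea can be illustrated on the host with a plain C++ model. This is a sketch, not the backend code: the function names are hypothetical, a wave is modeled as a vector of lane values, and the choice of sign extension for `min` and zero extension for `umax` is an assumption about what keeps the widened result correct.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Signed min: widen each 16-bit lane to 32 bits with sign extension,
// reduce in 32 bits, then truncate the result back to 16 bits.
int16_t waveReduceMinI16(const std::vector<int16_t> &lanes) {
  assert(!lanes.empty() && "at least one active lane is required");
  std::vector<int32_t> wide(lanes.begin(), lanes.end());
  int32_t r = *std::min_element(wide.begin(), wide.end());
  return static_cast<int16_t>(r);
}

// Unsigned max: same shape, but the widening is a zero extension.
uint16_t waveReduceUMaxI16(const std::vector<uint16_t> &lanes) {
  assert(!lanes.empty() && "at least one active lane is required");
  std::vector<uint32_t> wide(lanes.begin(), lanes.end());
  uint32_t r = *std::max_element(wide.begin(), wide.end());
  return static_cast<uint16_t>(r);
}
```

Because the extension preserves the ordering of the 16-bit inputs, truncating the 32-bit result gives the same value a native 16-bit reduction would.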

Assisted by - Claude-sonnet:4.6
DeltaFile
+589-137llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.umax.ll
+562-136llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.umin.ll
+528-136llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.max.ll
+528-136llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.min.ll
+52-21llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+21-0llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+2,280-5666 files

LLVM/project ded7a39llvm/lib/Target/AMDGPU AMDGPULegalizerInfo.cpp SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.reduce.xor.ll llvm.amdgcn.reduce.or.ll

[AMDGPU] Support Wave Reduction for i16 types - 3 (#194812)

Supported Ops: `and`, `or`, `xor`.
Supports only the iterative strategy; DPP is yet
to be supported.
Supports only Fake-16 versions of the lowering.
True-16 support is yet to be added.
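
The commit message does not spell out the iterative strategy. The sketch below assumes it means visiting the active lanes one at a time and folding each lane's value into an accumulator, as opposed to a DPP-based cross-lane scheme. The function name, the 32-lane wave and the `exec` bitmask model are all illustrative assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Host-side model of an iterative xor reduction over a 32-lane wave.
// `exec` has one bit per lane; only lanes with a set bit contribute.
uint32_t iterativeWaveReduceXor(const std::array<uint32_t, 32> &laneValues,
                                uint32_t exec) {
  uint32_t acc = 0; // identity value for xor
  while (exec != 0) {
    // Find the lowest active lane that has not been processed yet.
    unsigned lane = 0;
    while (((exec >> lane) & 1u) == 0)
      ++lane;
    assert(lane < 32 && "a non-zero mask always has a set bit");
    acc ^= laneValues[lane]; // fold that lane's value into the result
    exec &= exec - 1;        // clear the lowest set bit
  }
  return acc;
}
```

The loop runs once per active lane, so its cost grows with the number of set bits in the mask.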
DeltaFile
+673-160llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.xor.ll
+563-136llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.or.ll
+563-136llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.and.ll
+4-1llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+4-1llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+1,807-4345 files

LLVM/project 15fdf79llvm/lib/Target/AMDGPU SIISelLowering.cpp AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.reduce.sub.ll llvm.amdgcn.reduce.add.ll

[AMDGPU] Support Wave Reduction for i16 types - 2 (#194810)

Supported Ops: `add`, `sub`.
Supports only the iterative strategy; DPP is yet
to be supported.
Supports only Fake-16 versions of the lowering.
True-16 support is yet to be added.
DeltaFile
+692-187llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.sub.ll
+668-184llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.add.ll
+6-2llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+6-2llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+1,372-3754 files

LLVM/project 37931f6llvm/lib/Target/AMDGPU SIISelLowering.cpp AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.reduce.umax.ll llvm.amdgcn.reduce.umin.ll

[AMDGPU] Support Wave Reduction for i16 types - 1 (#194808)

Supported Ops: `min`, `umin`, `max`, `umax`.
16-bit wave reduce ops are promoted to 32-bit
operations before ISEL. From there they use the
existing implementations for 32-bit reductions.

Assisted by - Claude-sonnet:4.6
DeltaFile
+589-137llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.umax.ll
+562-136llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.umin.ll
+528-136llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.max.ll
+528-136llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.min.ll
+52-21llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+21-0llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+2,280-5666 files

LLVM/project 67fb966flang/include/flang/Semantics symbol.h, flang/lib/Lower/OpenMP OpenMP.cpp

[flang][OpenMP] Unify groupprivate device_type handling with declare_target
DeltaFile
+25-24flang/lib/Semantics/resolve-directives.cpp
+11-15flang/lib/Lower/OpenMP/OpenMP.cpp
+9-12flang/lib/Semantics/mod-file.cpp
+12-8flang/lib/Semantics/symbol.cpp
+9-9flang/include/flang/Semantics/symbol.h
+66-685 files

LLVM/project 95fd258llvm/lib/Target/AMDGPU/Utils AMDGPUHWEvents.cpp

Comment
DeltaFile
+11-14llvm/lib/Target/AMDGPU/Utils/AMDGPUHWEvents.cpp
+11-141 files

LLVM/project 05df7bbllvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp, llvm/lib/Target/AMDGPU/Utils AMDGPUHWEvents.cpp AMDGPUHWEvents.h

[AMDGPU][InsertWaitCnts] Move HWEvent analysis code

Building on the previous RFC, if it is accepted:
Move the code that maps a MachineInstr to HWEventSet to a separate file.

This should be NFC.
DeltaFile
+164-0llvm/lib/Target/AMDGPU/Utils/AMDGPUHWEvents.cpp
+3-116llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+6-0llvm/lib/Target/AMDGPU/Utils/AMDGPUHWEvents.h
+173-1163 files

LLVM/project de477b6llvm/lib/Target/AMDGPU/Utils AMDGPUHWEvents.h

fmt
DeltaFile
+2-1llvm/lib/Target/AMDGPU/Utils/AMDGPUHWEvents.h
+2-11 files

LLVM/project 51fb7eellvm/lib/Target/SPIRV SPIRVCommandLine.cpp

[SPIR-V] Remove duplicate SPV_INTEL_int4 extension map entry (#202871)
DeltaFile
+0-1llvm/lib/Target/SPIRV/SPIRVCommandLine.cpp
+0-11 files

LLVM/project 658bcd6llvm/lib/Target/AMDGPU/Utils AMDGPUHWEvents.cpp

Comment
DeltaFile
+11-14llvm/lib/Target/AMDGPU/Utils/AMDGPUHWEvents.cpp
+11-141 files

LLVM/project 441bcd5llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp, llvm/lib/Target/AMDGPU/Utils AMDGPUHWEvents.cpp AMDGPUHWEvents.h

[AMDGPU][InsertWaitCnts] Move HWEvent analysis code

Building on the previous RFC, if it is accepted:
Move the code that maps a MachineInstr to HWEventSet to a separate file.

This should be NFC.
DeltaFile
+164-0llvm/lib/Target/AMDGPU/Utils/AMDGPUHWEvents.cpp
+3-116llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+6-0llvm/lib/Target/AMDGPU/Utils/AMDGPUHWEvents.h
+173-1163 files

LLVM/project c13aa46llvm/lib/Target/AMDGPU/Utils AMDGPUHWEvents.cpp AMDGPUHWEvents.h

Comment
DeltaFile
+0-5llvm/lib/Target/AMDGPU/Utils/AMDGPUHWEvents.cpp
+3-1llvm/lib/Target/AMDGPU/Utils/AMDGPUHWEvents.h
+3-62 files

LLVM/project 40457f3clang/lib/StaticAnalyzer/Checkers BlockInCriticalSectionChecker.cpp, llvm/include/llvm/ADT ImmutableList.h

[llvm][ADT] Make ImmutableList conform to the fwd iterator concept (#202580)

The iterator was missing post-increment and a couple of typedefs. Adding
them enables LLVM algorithms such as filter_range.
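
For illustration, here is a minimal forward iterator over a singly linked list that provides the pieces generic algorithms expect: the member typedefs and both increment forms. `Node` and `ListIterator` are hypothetical and are not the ImmutableList implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical immutable singly linked list node.
struct Node {
  int value;
  const Node *next;
};

class ListIterator {
  const Node *cur = nullptr;

public:
  // Typedefs that std::iterator_traits and generic algorithms rely on.
  using iterator_category = std::forward_iterator_tag;
  using value_type = int;
  using difference_type = std::ptrdiff_t;
  using pointer = const int *;
  using reference = const int &;

  ListIterator() = default; // default-constructed iterator is the end
  explicit ListIterator(const Node *n) : cur(n) {}

  reference operator*() const {
    assert(cur && "dereferencing the end iterator");
    return cur->value;
  }
  pointer operator->() const { return &cur->value; }

  // Pre-increment advances and returns the updated iterator.
  ListIterator &operator++() {
    cur = cur->next;
    return *this;
  }
  // Post-increment advances but returns the previous position.
  ListIterator operator++(int) {
    ListIterator tmp = *this;
    ++*this;
    return tmp;
  }

  friend bool operator==(ListIterator a, ListIterator b) {
    return a.cur == b.cur;
  }
  friend bool operator!=(ListIterator a, ListIterator b) {
    return !(a == b);
  }
};
```

With these members in place, standard algorithms such as `std::find`, `std::distance` and `std::count_if` accept the iterator directly.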
DeltaFile
+66-0llvm/unittests/ADT/ImmutableListTest.cpp
+0-13clang/lib/StaticAnalyzer/Checkers/BlockInCriticalSectionChecker.cpp
+13-0llvm/include/llvm/ADT/ImmutableList.h
+79-133 files

LLVM/project 41ec3fccross-project-tests/debuginfo-tests/dexter/dex/evaluation StateMatch.py

Remove debug print
DeltaFile
+0-1cross-project-tests/debuginfo-tests/dexter/dex/evaluation/StateMatch.py
+0-11 files

LLVM/project 4773921cross-project-tests/debuginfo-tests/dexter/dex/evaluation ExpectWriter.py

Remove debug print
DeltaFile
+0-3cross-project-tests/debuginfo-tests/dexter/dex/evaluation/ExpectWriter.py
+0-31 files

LLVM/project 78d4900llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU scalar-float-sop1.ll fptosi-sat-scalar.ll

AMDGPU/GlobalISel: Implement RegBankLegalize rules for SALUFloat variants of G_INTRINSIC_TRUNC, G_FFLOOR and G_FCEIL. (#187679)

As requested on PR #179954.
DeltaFile
+134-0llvm/test/CodeGen/AMDGPU/GlobalISel/fceil.ll
+133-0llvm/test/CodeGen/AMDGPU/GlobalISel/ffloor.ll
+133-0llvm/test/CodeGen/AMDGPU/GlobalISel/intrinsic-trunc.ll
+36-96llvm/test/CodeGen/AMDGPU/scalar-float-sop1.ll
+11-3llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+5-5llvm/test/CodeGen/AMDGPU/fptosi-sat-scalar.ll
+452-1041 files not shown
+456-1087 files

LLVM/project a4c8e3dlldb/docs/resources overview.md lldbplatformpackets.md

[lldb][docs] Document what a Platform is (#202332)

Fixes #201875.

In #201875 a user was understandably confused about what a platform even is,
and I had never had to explain it from the conceptual point of view
either.

So I wrote a long explanation
(https://github.com/llvm/llvm-project/issues/201875#issuecomment-4634087717)
specific to what they were trying to do. I don't think we need all that
in the docs and we don't have a great place for it anyway.

My alternative is:
* A high level explanation in the overview, to say what a platform does.
* A link from there to https://lldb.llvm.org/use/remote.html which has a
practical example of using one.
* A note in the platform extensions doc that our platform mode is not
related to gdb's extended remote.

    [3 lines not shown]
DeltaFile
+26-0lldb/docs/resources/overview.md
+7-0lldb/docs/resources/lldbplatformpackets.md
+33-02 files

LLVM/project 5de640eclang/test/OpenMP nvptx_SPMD_codegen.cpp nvptx_target_teams_generic_loop_codegen.cpp

[clang][OpenMP] Improve loop structure for distributed loops (pt 2)

This patch complements https://github.com/llvm/llvm-project/pull/201670
for non-reduction loops and wires the existing
`kmp_sched_distr_static_chunk_sched_static_chunkone` to be used by
CodeGen for these loops as well.
DeltaFile
+2,345-3,605clang/test/OpenMP/nvptx_SPMD_codegen.cpp
+316-1,156clang/test/OpenMP/nvptx_target_teams_generic_loop_codegen.cpp
+301-1,141clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_codegen.cpp
+223-543clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_simd_codegen.cpp
+120-360clang/test/OpenMP/target_teams_generic_loop_codegen_as_parallel_for.cpp
+59-179clang/test/OpenMP/nvptx_distribute_parallel_generic_mode_codegen.cpp
+3,364-6,9847 files not shown
+3,465-7,31213 files

LLVM/project d8388a1llvm/lib/Target/AArch64 AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 predicate-as-counter-phi.ll

[AArch64] Use PNR rather than PPR register class for aarch64svcount (#202394)

While predicates and predicate-as-counter both use the same underlying
registers, within LLVM they use different register classes (PPR vs PNR).
Mapping aarch64svcount to the PPRRegClass results in unnecessary
cross-register-class copies around PHIs, which in turn produce
unnecessary moves.
DeltaFile
+35-0llvm/test/CodeGen/AArch64/predicate-as-counter-phi.ll
+1-1llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+36-12 files

LLVM/project ee28f5dlibcxx/include/__utility is_pointer_in_range.h

[libc++] Make __is_less_than_compatable a variable template (#202525)

This makes the code a bit more readable and improves compile times a
bit, since variable templates are faster to instantiate than class
templates.
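
The two forms can be compared side by side in a small sketch. The trait names below are hypothetical and the detection logic is an assumption; it is not the libc++ code.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Class-template form: every query instantiates a class specialization.
template <class T, class U, class = void>
struct IsLessThanComparable : std::false_type {};

template <class T, class U>
struct IsLessThanComparable<
    T, U, decltype(void(std::declval<T>() < std::declval<U>()))>
    : std::true_type {};

// Variable-template form: the same check, expressed as a constant.
template <class T, class U, class = void>
constexpr bool is_less_than_comparable_v = false;

template <class T, class U>
constexpr bool is_less_than_comparable_v<
    T, U, decltype(void(std::declval<T>() < std::declval<U>()))> = true;

// Example type without operator<, used to exercise the negative case.
struct Opaque {};

static_assert(IsLessThanComparable<int *, int *>::value, "pointers compare");
static_assert(is_less_than_comparable_v<int *, int *>, "pointers compare");
static_assert(!is_less_than_comparable_v<Opaque, Opaque>, "no operator<");
```

Call sites shrink from `IsLessThanComparable<T, U>::value` to `is_less_than_comparable_v<T, U>`, which is the readability gain the commit message refers to.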
DeltaFile
+5-5libcxx/include/__utility/is_pointer_in_range.h
+5-51 files

LLVM/project 7087ea3llvm/lib/Target/SPIRV SPIRVModuleAnalysis.cpp, llvm/test/CodeGen/SPIRV/extensions/SPV_EXT_relaxed_printf_string_address_space multi-function-printf.ll

[SPIR-V] Look up printf format string type in the correct function (#201523)

addPrintfRequirements() resolved the SPIR-V type of the format string
operand via getSPIRVTypeForVReg() without passing the instruction's
parent MachineFunction, so the lookup defaulted to the registry's CurMF:
whichever function happened to be processed last. Virtual register
numbers are only unique within a function, so in multi-function modules
the check could inspect an unrelated function's type, misreading its
second operand as the format string's storage class (an OpTypeInt's
width immediate, in the added test). For a format string in the constant
address space this spuriously triggered the fatal
"SPV_EXT_relaxed_printf_string_address_space is required" error, or
silently added the unnecessary extension when it was available;
conversely, the requirement could be silently omitted when the colliding
vreg had no recorded type.
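
The failure mode can be reproduced with a toy registry. This is a simplified model with hypothetical names (`TypeRegistry`, `typeFor`); it only shows why a lookup keyed by virtual register number needs the owning function, and does not mirror the SPIR-V backend's API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Toy registry mapping (function name, vreg number) to a type name.
class TypeRegistry {
  std::map<std::pair<std::string, unsigned>, std::string> types;
  std::string curFn; // the function that was processed most recently

public:
  void setCurrentFunction(const std::string &fn) { curFn = fn; }

  void record(const std::string &fn, unsigned vreg, const std::string &ty) {
    assert(!ty.empty() && "a recorded type needs a name");
    types[{fn, vreg}] = ty;
  }

  // When `fn` is null the lookup falls back to the current function,
  // which is the behavior that caused the bug. Returns an empty string
  // if no type was recorded for that key.
  std::string typeFor(unsigned vreg, const std::string *fn = nullptr) const {
    auto it = types.find({fn ? *fn : curFn, vreg});
    return it == types.end() ? std::string() : it->second;
  }
};
```

If two functions both use vreg 5, `typeFor(5)` answers for whichever function was processed last, while `typeFor(5, &owner)` answers for the function that actually contains the instruction.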

Co-authored-by: Claude Opus 4.8 (1M context) <noreply at anthropic.com>
DeltaFile
+44-0llvm/test/CodeGen/SPIRV/extensions/SPV_EXT_relaxed_printf_string_address_space/multi-function-printf.ll
+2-1llvm/lib/Target/SPIRV/SPIRVModuleAnalysis.cpp
+46-12 files

LLVM/project b000f90clang/lib/Headers riscv_packed_simd.h CMakeLists.txt, clang/test/CodeGen/RISCV rvp-intrinsics.c

[RISCV] Add riscv_packed_simd.h for P extension intrinsics (#181115)

Add `riscv_packed_simd.h` with initial RISC-V P extension intrinsics, covering:

- Packed Splat
- Packed Addition and Subtraction
- Packed Addition with Scalar
- Packed Saturating Addition and Subtraction
- Packed Shift-Add
- Packed Minimum and Maximum
- Packed Shifts
- Packed Logical Operations

The intrinsics are implemented as thin wrappers over standard C operators
and existing generic builtins (`__builtin_elementwise_add_sat` etc.), letting
the RISC-V backend lower the resulting `<N x iN>` IR to P-ext instructions.
No new clang builtins or `llvm.riscv.*` intrinsics are introduced.
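
A rough sketch of the thin-wrapper approach, using the GCC/Clang vector extension, is shown here. The type and function names are hypothetical and are not taken from `riscv_packed_simd.h`; the scalar fallback exists only so the sketch also builds with compilers that lack `__builtin_elementwise_add_sat`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical packed type: four unsigned 8-bit lanes in 32 bits.
typedef uint8_t u8x4 __attribute__((vector_size(4)));

// Packed addition: the ordinary + operator on a vector type, which
// wraps around independently in each lane.
static inline u8x4 packed_add_u8x4(u8x4 a, u8x4 b) { return a + b; }

// Detect the generic saturating-add builtin where the compiler has it.
#if defined(__has_builtin)
#if __has_builtin(__builtin_elementwise_add_sat)
#define SKETCH_HAVE_ADD_SAT 1
#endif
#endif

// Packed saturating addition: each lane clamps at 255 instead of wrapping.
static inline u8x4 packed_sadd_u8x4(u8x4 a, u8x4 b) {
#ifdef SKETCH_HAVE_ADD_SAT
  return __builtin_elementwise_add_sat(a, b);
#else
  u8x4 r;
  for (int i = 0; i < 4; ++i) {
    unsigned sum = unsigned(a[i]) + unsigned(b[i]);
    r[i] = static_cast<uint8_t>(sum > 255u ? 255u : sum);
  }
  return r;
#endif
}
```

Since the wrappers contain only generic vector operations, instruction selection is left entirely to the backend, which matches the design described in the commit message.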

Spec: https://github.com/riscv/riscv-p-spec/blob/master/P-ext-intrinsics.adoc
DeltaFile
+3,349-0clang/test/CodeGen/RISCV/rvp-intrinsics.c
+1,198-0cross-project-tests/intrinsic-header-tests/riscv_packed_simd.c
+306-0clang/lib/Headers/riscv_packed_simd.h
+1-0clang/lib/Headers/CMakeLists.txt
+4,854-04 files

LLVM/project dce55a9llvm/include/llvm/Transforms/Utils AssumeBundleBuilder.h, llvm/lib/Transforms/InstCombine InstCombineCalls.cpp

[InstCombine] Remove knowledge retention folding (#202890)

The knowledge retention API for simplifying assumes isn't that useful
anymore, since most simplifications done by it are now done
unconditionally directly in InstCombine. It is also known to miscompile
multiple patterns.
DeltaFile
+21-43llvm/test/Transforms/InstCombine/assume.ll
+0-35llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
+0-15llvm/lib/Transforms/Utils/AssumeBundleBuilder.cpp
+0-9llvm/include/llvm/Transforms/Utils/AssumeBundleBuilder.h
+21-1024 files