LLVM/project 23cf668llvm/test/Analysis/CostModel/AMDGPU exp2.ll exp10.ll, llvm/test/CodeGen/AArch64 clmul-fixed.ll

Merge branch 'main' into users/arsenm/revert-185420
DeltaFile
+853-1,663llvm/test/CodeGen/AArch64/clmul-fixed.ll
+192-192llvm/test/Analysis/CostModel/AMDGPU/exp2.ll
+192-192llvm/test/Analysis/CostModel/AMDGPU/exp10.ll
+192-192llvm/test/Analysis/CostModel/AMDGPU/exp.ll
+243-48llvm/test/CodeGen/WebAssembly/simd-reductions.ll
+68-87llvm/test/CodeGen/X86/clmul-vector.ll
+1,740-2,37455 files not shown
+2,716-2,66361 files

LLVM/project c6bde83clang/lib/AST TypePrinter.cpp, clang/test/AST/HLSL ByteAddressBuffers-AST.hlsl Texture2D-vector-AST.hlsl

[HLSL] Fix interleaved vector and matrix return types in AST dump (#184888)

HLSL vector and matrix types were previously printed with their closing
syntax (', N>') in 'printAfter', causing them to interleave with function
parameters when used as return types (e.g., 'vector<float (args), 4>').

This change moves the HLSL vector and matrix closing syntax into
'printBefore' when 'UseHLSLTypes' is enabled, ensuring the type is
printed completely before the parameter list.
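
The interleaving can be illustrated with a toy model of a two-phase type printer. This is a minimal sketch, not Clang's `TypePrinter`: the `print_function` helper and the way the type string is split are hypothetical, and only the two printed forms are taken from the description above.

```python
def print_function(before: str, after: str, params: str) -> str:
    """Toy model: emit the return type's 'before' part, then the
    parameter list, then the return type's 'after' part."""
    return f"{before} {params}{after}"


# Old behaviour: the closing ', 4>' is emitted in the 'after' phase,
# so the parameter list lands inside the type.
old = print_function("vector<float", ", 4>", "(args)")

# New behaviour: the whole type is emitted in the 'before' phase.
new = print_function("vector<float, 4>", "", "(args)")

print(old)  # vector<float (args), 4>
print(new)  # vector<float, 4> (args)
```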

Note that address space qualifiers are now printed after the type
(e.g., 'vector<float, 4>hlsl_device'). This is because
'canPrefixQualifiers' in 'TypePrinter.cpp' returns false for these types.
We cannot easily change this to check 'UseHLSLTypes' because
'canPrefixQualifiers' is a static method and does not have access to the
PrintingPolicy at that point.


    [4 lines not shown]
DeltaFile
+56-40clang/lib/AST/TypePrinter.cpp
+30-30clang/test/AST/HLSL/ByteAddressBuffers-AST.hlsl
+27-27clang/test/AST/HLSL/Texture2D-vector-AST.hlsl
+16-16clang/test/AST/HLSL/Texture2D-scalar-AST.hlsl
+5-5clang/test/AST/HLSL/pch_with_matrix_element_accessor.hlsl
+3-3clang/test/SemaHLSL/Types/BuiltinMatrix/MatrixFloatPrecisionWarnings.hlsl
+137-1215 files not shown
+143-12711 files

LLVM/project 51aad69llvm/lib/CodeGen/AsmPrinter DwarfCompileUnit.cpp DwarfDebug.h, llvm/lib/Target/NVPTX NVPTXDwarfDebug.cpp NVPTXDwarfDebug.h

[NFC] Migrate NVPTX specific debug info code to separate class

This refactors the DWARF emission code to pull out the rest of the NVPTX-specific code into its own subclass for debug info handling and architecture-specific differences.

Tested with ninja check-all on OSX.
DeltaFile
+97-2llvm/lib/Target/NVPTX/NVPTXDwarfDebug.cpp
+14-68llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
+39-1llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h
+7-22llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
+19-8llvm/lib/Target/NVPTX/NVPTXDwarfDebug.h
+1-1llvm/lib/CodeGen/AsmPrinter/DwarfExpression.cpp
+177-1026 files

LLVM/project 4095ac9clang/include/clang/CIR/Dialect/IR CIROps.td, clang/lib/CIR/CodeGen CIRGenBuiltin.cpp CIRGenCall.cpp

[CIR][CIRGen] Upstream support for `__builtin_bcopy` (#185038)

This adds CIR support for the bcopy builtin.
DeltaFile
+116-0clang/test/CIR/CodeGenBuiltins/builtin-bcopy.cpp
+29-1clang/include/clang/CIR/Dialect/IR/CIROps.td
+12-2clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp
+9-0clang/lib/CIR/CodeGen/CIRGenCall.cpp
+9-0clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+7-0clang/lib/CIR/CodeGen/CIRGenFunction.h
+182-31 files not shown
+187-37 files

LLVM/project 27e7502llvm/test/CodeGen/AArch64 fp-maximumnum-minimumnum.ll, llvm/test/CodeGen/X86 wide-scalar-shift-by-byte-multiple-legalization.ll andnot-sink-not.ll

Merge branch 'main' into users/s-perron/fix-vector-type-printer
DeltaFile
+1,561-2,812llvm/test/CodeGen/X86/wide-scalar-shift-by-byte-multiple-legalization.ll
+2,071-1,930llvm/test/CodeGen/AArch64/fp-maximumnum-minimumnum.ll
+3,114-0llvm/test/CodeGen/X86/andnot-sink-not.ll
+969-2,001llvm/test/CodeGen/X86/bit-manip-i512.ll
+538-1,357llvm/test/CodeGen/X86/shift-i512.ll
+1,273-36llvm/test/MC/AMDGPU/gfx1170_asm_vop3_dpp16.s
+9,526-8,1361,961 files not shown
+75,334-34,0311,967 files

LLVM/project 1b396eeutils/bazel/llvm-project-overlay/libc BUILD.bazel

[Bazel] Fix 6a6564cd1476270e9342b6be0792109516ddd99c

That commit introduced a new header, along with dependencies on it.
DeltaFile
+6-0utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+6-01 files

LLVM/project 982beb1clang/lib/Basic/Targets SPIR.h, clang/test/Sema scoped-atomic-ops.c

[SPIR] Do not warn on 64-bit atomics (#185502)

Summary:
SPIR-V's Int64Atomics capability is not dependent on its addressing mode
as far as I am aware. These 32-bit SPIR targets already claim to support
the cl_khr_int64 atomics and we already emit 64-bit atomics in the
backend. Additionally, this is already accepted as a hack due to the
fact that the host will increase it in offloading usage. I do not see a
reason to keep these at 32, which causes numerous warnings inside of the
`libclc` build.
DeltaFile
+5-0clang/test/Sema/scoped-atomic-ops.c
+2-2clang/lib/Basic/Targets/SPIR.h
+7-22 files

LLVM/project 0b5c16blibclc/opencl/lib/amdgcn/printf __printf_alloc.cl, libclc/opencl/lib/generic/atomic atomic_fetch_add.cl atomic_fetch_sub.cl

[libclc] Replace last of `opencl` atomics with `__scoped_` versions (#185515)

Summary:
These were the only uses of the old atomics. The old definition guards
stay as those prevent us from compiling the unsupported uintptr_t atomic
type on nvptx which does not define it. Could probably be improved
later.
DeltaFile
+6-6libclc/opencl/lib/generic/atomic/atomic_fetch_add.cl
+6-6libclc/opencl/lib/generic/atomic/atomic_fetch_sub.cl
+5-5libclc/opencl/lib/amdgcn/printf/__printf_alloc.cl
+17-173 files

LLVM/project 367569ellvm/lib/CodeGen/SelectionDAG TargetLowering.cpp, llvm/test/CodeGen/AArch64 clmul-fixed.ll

[SelectionDAG] Use ExpandIntRes_CLMUL to expand vector CLMUL via narrower legal types (#184468)

Reuse the ExpandIntRes_CLMUL identity to expand vector
CLMUL/CLMULR/CLMULH on wider element types (vXi16, vXi32, vXi64) by
decomposing into half-element-width operations that eventually reach a
legal CLMUL type.

Three generic strategies in expandCLMUL:
1. Halve: halve element width (e.g. v8i16 -> v8i8 on AArch64)
2. Promote to double: zext to wider type if CLMUL is legal there (e.g. x86)
3. Count widen: pad with undef to double element count (e.g. v4i16 -> v8i16)

A helper canNarrowCLMULToLegal() guides strategy selection and prevents
circular expansion in the CLMULH bitreverse path.

Also add Custom BITREVERSE lowering for v4i16/v8i16 on AArch64 using
REV16+RBIT, which the CLMULH expansion relies on.

Fixes #183768
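
The arithmetic behind the halving strategy can be checked on plain integers. The following is a minimal sketch, assuming the standard definition of carry-less multiplication; all function names are hypothetical, it models scalars rather than SelectionDAG nodes, and it is not claimed to match the exact node sequence that ExpandIntRes_CLMUL emits. It also shows one well-known way to obtain the high half through bit reversal, which may or may not be the exact form used in the CLMULH path.

```python
def clmul_full(a: int, b: int) -> int:
    """Carry-less (XOR) product of two non-negative integers, untruncated."""
    result = 0
    shift = 0
    while b >> shift:
        if (b >> shift) & 1:
            result ^= a << shift
        shift += 1
    return result


def clmul(a: int, b: int, bits: int) -> int:
    """Low `bits` bits of the carry-less product."""
    return clmul_full(a, b) & ((1 << bits) - 1)


def clmulh(a: int, b: int, bits: int) -> int:
    """High `bits` bits of the 2*bits-wide carry-less product."""
    return clmul_full(a, b) >> bits


def clmul_via_halves(a: int, b: int, bits: int) -> int:
    """Low `bits` bits of the product, built only from half-width operations."""
    half = bits // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    lo = clmul(a_lo, b_lo, half)
    # The a_hi * b_hi term only affects bits above `bits`, so it is dropped.
    hi = clmulh(a_lo, b_lo, half) ^ clmul(a_lo, b_hi, half) ^ clmul(a_hi, b_lo, half)
    return (hi << half) | lo


def bitreverse(x: int, bits: int) -> int:
    """Reverse the low `bits` bits of x."""
    return int(format(x, f"0{bits}b")[::-1], 2)


def clmulh_via_bitreverse(a: int, b: int, bits: int) -> int:
    """High half via a low-half multiply on bit-reversed operands."""
    rev = clmul(bitreverse(a, bits), bitreverse(b, bits), bits)
    return bitreverse(rev, bits) >> 1
```

For 8-bit inputs, `clmul_via_halves(a, b, 8)` agrees with `clmul(a, b, 8)` while using only 4-bit multiplies, which is the property the halving strategy depends on.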
DeltaFile
+853-1,663llvm/test/CodeGen/AArch64/clmul-fixed.ll
+68-87llvm/test/CodeGen/X86/clmul-vector.ll
+143-10llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+1,064-1,7603 files

LLVM/project 1324ea1llvm/lib/Target/WebAssembly WebAssemblyInstrSIMD.td, llvm/test/CodeGen/WebAssembly simd-reductions.ll simd-memcmp.ll

[WebAssembly] Fold any/alltrue SIMD boolean reductions with eqz (#184704)

Existing ISel patterns match setne/seteq following SIMD boolean reductions
any_true and all_true, and drop the ones that are redundant (because the
reductions always return 1 or 0). This adds patterns to also produce eqz
instructions instead of a comparison with a const.
DeltaFile
+243-48llvm/test/CodeGen/WebAssembly/simd-reductions.ll
+1-2llvm/test/CodeGen/WebAssembly/simd-memcmp.ll
+2-0llvm/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td
+246-503 files

LLVM/project 331a91cllvm/lib/Target/AMDGPU AMDGPULowerVGPREncoding.cpp

[NFC][AMDGPU] Add debug print to `AMDGPULowerVGPREncoding.cpp` (#185331)
DeltaFile
+91-3llvm/lib/Target/AMDGPU/AMDGPULowerVGPREncoding.cpp
+91-31 files

LLVM/project a8b726aflang-rt/lib/runtime execute.cpp, flang-rt/unittests/Runtime CommandTest.cpp

[flang-rt] Need to pad the output of execute_command_line(..., CMDMSG) (#185509)

Previously the error message was copied, but not padded for cases where
the message was shorter than the passed CMDMSG string. Add the padding
and also change the test case to test padding on all platforms.
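
The padding requirement follows from fixed-length Fortran character variables: a shorter value must be followed by blanks up to the variable's length. A minimal sketch of that behaviour, with a hypothetical helper name and sample message (this is not the flang-rt code):

```python
def copy_padded(message: str, length: int) -> str:
    """Copy `message` into a fixed-length field: truncate if it is too long,
    otherwise pad on the right with blanks."""
    return message[:length].ljust(length)


cmdmsg = copy_padded("Invalid command line", 30)
print(repr(cmdmsg))
```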
DeltaFile
+12-5flang-rt/unittests/Runtime/CommandTest.cpp
+4-8flang-rt/lib/runtime/execute.cpp
+16-132 files

LLVM/project f2dc489llvm/test/Analysis/CostModel/AMDGPU exp.ll exp10.ll

[AMDGPU] Replace undef with poison in exp/exp2/exp10 cost tests NFC (#185527)
DeltaFile
+192-192llvm/test/Analysis/CostModel/AMDGPU/exp.ll
+192-192llvm/test/Analysis/CostModel/AMDGPU/exp10.ll
+192-192llvm/test/Analysis/CostModel/AMDGPU/exp2.ll
+576-5763 files

LLVM/project 9d35ee2llvm/lib/Target/AMDGPU AMDGPULowerVGPREncoding.cpp

change all
DeltaFile
+9-9llvm/lib/Target/AMDGPU/AMDGPULowerVGPREncoding.cpp
+9-91 files

LLVM/project 2ec0ff7llvm/lib/Target/AMDGPU AMDGPUCoExecSchedStrategy.h AMDGPUCoExecSchedStrategy.cpp, llvm/test/CodeGen/AMDGPU coexec-scheduler.ll

[AMDGPU] Add stalls for DS FIFO buffer

Change-Id: I73e56da97a931349e0655e4e20b24aeb97920647
DeltaFile
+56-51llvm/test/CodeGen/AMDGPU/coexec-scheduler.ll
+41-6llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.h
+40-2llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.cpp
+137-593 files

LLVM/project a4ab49allvm/test/CodeGen/AMDGPU coexec-sched-effective-stall.mir

Fix mir test

Change-Id: I1b3dba10ea74c98454c433ecd52b165836929075
DeltaFile
+2-1llvm/test/CodeGen/AMDGPU/coexec-sched-effective-stall.mir
+2-11 files

LLVM/project 1cfd5a4llvm/lib/Target/AMDGPU AMDGPUCoExecSchedStrategy.cpp AMDGPUCoExecSchedStrategy.h, llvm/test/CodeGen/AMDGPU coexec-scheduler.ll coexec-sched-effective-stall.mir

[AMDGPU] Add HWUI pressure heuristics to coexec strategy

Change-Id: I322cc670c8d923a6df23588d8a14cdaec1f49da9
DeltaFile
+601-0llvm/test/CodeGen/AMDGPU/coexec-scheduler.ll
+413-22llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.cpp
+284-2llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.h
+4-4llvm/test/CodeGen/AMDGPU/coexec-sched-effective-stall.mir
+1,302-284 files

LLVM/project a8caa42llvm/lib/Target/AMDGPU AMDGPUCoExecSchedStrategy.cpp AMDGPUCoExecSchedStrategy.h

Use AMDGPU namespace + const ref

Change-Id: Ie4ca27528c92dbd0f3cf6293d9bc25d13b7d31fc
DeltaFile
+17-16llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.cpp
+12-8llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.h
+29-242 files

LLVM/project e18117ellvm/lib/Target/AMDGPU AMDGPUCoExecSchedStrategy.cpp

Change old code

Change-Id: I26cff6c0c5743684778f022b264c9930eeff24ce
DeltaFile
+4-2llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.cpp
+4-21 files

LLVM/project bc2d276llvm/lib/Target/AMDGPU AMDGPUCoExecSchedStrategy.cpp

Formatting.
DeltaFile
+3-3llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.cpp
+3-31 files

LLVM/project 19977f9llvm/lib/Target/AMDGPU AMDGPUTargetMachine.cpp GCNSubtarget.cpp, llvm/test/CodeGen/AMDGPU amdgpu-workload-type-scheduler-debug.mir

Remove module "workload-type" metadata.
DeltaFile
+0-114llvm/test/CodeGen/AMDGPU/amdgpu-workload-type-scheduler-debug.mir
+10-45llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+1-16llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
+11-3llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.cpp
+4-1llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.h
+4-0llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h
+30-1796 files

LLVM/project 28e38e4llvm/lib/Target/AMDGPU GCNSchedStrategy.cpp AMDGPUCoExecSchedStrategy.cpp, llvm/test/CodeGen/AMDGPU coexec-sched-effective-stall.mir

[AMDGPU] Add structural stall heuristic to scheduling strategies

Implements a structural stall heuristic that considers both resource
hazards and latency constraints when selecting instructions. In coexec,
this changes the pending queue from a binary “not ready to issue”
distinction into part of a unified candidate comparison. Pending
instructions still identify structural stalls in the current cycle, but
they are now evaluated directly against available instructions by stall
cost, making the heuristics both more intuitive and more expressive.

- Add getStructuralStallCycles() to GCNSchedStrategy that computes the
number of cycles an instruction must wait due to:
  - Resource conflicts on unbuffered resources (from the SchedModel)
  - Sequence-dependent hazards (from GCNHazardRecognizer)

- Add getHazardWaitStates() to GCNHazardRecognizer that returns the number
of wait states until all hazards for an instruction are resolved,
providing cycle-accurate hazard information for scheduling heuristics.
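
A unified comparison of available and pending candidates by stall cost might look roughly like the sketch below. Everything here is an assumption for illustration: the `Candidate` fields, combining the two stall sources with `max`, treating wait states and cycles as the same unit, and breaking ties in favour of available instructions are not taken from the patch.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    resource_wait: int  # cycles until a needed unbuffered resource is free
    hazard_wait: int    # wait states until sequence-dependent hazards clear


def structural_stall_cycles(c: Candidate) -> int:
    """Assumed combination: the instruction waits for the slower of the two."""
    return max(c.resource_wait, c.hazard_wait)


def pick(available: list, pending: list) -> Candidate:
    """Compare both queues in one pass and take the lowest stall cost.
    min() keeps the first minimum, so available candidates win ties."""
    return min(available + pending, key=structural_stall_cycles)


available = [Candidate("v_add", resource_wait=0, hazard_wait=2)]
pending = [Candidate("ds_read", resource_wait=1, hazard_wait=0)]
chosen = pick(available, pending)
print(chosen.name)  # ds_read
```

In this toy input the pending instruction stalls for one cycle while the available one stalls for two, so the pending instruction is preferred; a binary ready/not-ready split would not have considered it.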
DeltaFile
+35-0llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+26-3llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.cpp
+7-2llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
+6-0llvm/lib/Target/AMDGPU/GCNHazardRecognizer.h
+4-0llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+2-2llvm/test/CodeGen/AMDGPU/coexec-sched-effective-stall.mir
+80-71 files not shown
+82-77 files

LLVM/project 04c6e4ellvm/lib/Target/AMDGPU AMDGPUCoExecSchedStrategy.cpp AMDGPUTargetMachine.cpp, llvm/test/CodeGen/AMDGPU coexec-sched-effective-stall.mir amdgpu-workload-type-scheduler-debug.mir

[AMDGPU] Add ML-oriented coexec scheduler selection and queue handling

This patch adds the initial coexec scheduler scaffold for machine
learning workloads on gfx1250.

It introduces function and module-level controls for selecting the
AMDGPU preRA and postRA schedulers, including an `amdgpu-workload-type`
module flag that maps ML workloads to coexec preRA scheduling and a nop
postRA scheduler by default.

It also updates the coexec scheduler to use a simplified top-down
candidate selection path that considers both available and pending
queues through a single flow, setting up follow-on heuristic work.
DeltaFile
+275-0llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.cpp
+124-0llvm/test/CodeGen/AMDGPU/coexec-sched-effective-stall.mir
+114-0llvm/test/CodeGen/AMDGPU/amdgpu-workload-type-scheduler-debug.mir
+64-5llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+43-0llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.h
+22-0llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
+642-53 files not shown
+663-149 files

LLVM/project 8f07552llvm/test/CodeGen/AMDGPU coexec-sched-effective-stall.mir

Update coexec-sched-effective-stall.mir
DeltaFile
+0-2llvm/test/CodeGen/AMDGPU/coexec-sched-effective-stall.mir
+0-21 files

LLVM/project a9e457aoffload/plugins-nextgen/amdgpu/src rtl.cpp, offload/plugins-nextgen/common/include PluginInterface.h

[Offload][AMDGPU] Fix RPC server on mixed w32 w64 workloads (#185496)

Summary:
This was a regression from the original LLVM-gpu-loader. We used to
handle `-mwavefrontsize64` correctly in the loader by over-allocating
memory and just leaving the upper 32 bits masked off. In order to handle
this in offload we need to scan loaded kernels to see how much memory we
need to allocate. This should be safe: the protocol is designed to
handle an arbitrary size, and in the worst case this just wastes space.
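
The sizing idea can be sketched as taking the widest wavefront seen across the loaded kernels. This is a hypothetical illustration only: the function names, the default of 32 lanes, and the size formula are assumptions and do not reflect the actual plugin or RPC interfaces.

```python
def rpc_lane_count(kernel_wavefront_sizes, default=32):
    """Widest wavefront among the loaded kernels, or `default` if none are loaded."""
    return max(kernel_wavefront_sizes, default=default)


def rpc_buffer_bytes(num_ports: int, lane_count: int, bytes_per_lane: int) -> int:
    """Assumed sizing: one slot per lane per port."""
    return num_ports * lane_count * bytes_per_lane


# A mixed w32/w64 image is sized for 64 lanes; w32 kernels leave half unused.
lanes = rpc_lane_count([32, 64, 32])
print(lanes, rpc_buffer_bytes(num_ports=4, lane_count=lanes, bytes_per_lane=8))
```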
DeltaFile
+21-0offload/plugins-nextgen/amdgpu/src/rtl.cpp
+3-3offload/plugins-nextgen/common/src/RPC.cpp
+3-0offload/plugins-nextgen/common/include/PluginInterface.h
+27-33 files

LLVM/project 6a6564clibc/hdr/types Elf64_auxv_t.h Elf32_auxv_t.h, libc/include elf.yaml

[libc] Add more macro/type declarations to Elf headers. (#185348)

* Add several `AT_` macro values from `<sys/auxv.h>`. In particular,
this makes the internal Linux auxv header parsing more hermetic by
removing one of the Linux header includes.
* Add constants between `DT_ADDRNGLO` and `DT_ADDRNGHI`, in particular
`DT_GNU_HASH`, which is the de facto standard on many platforms.
* Add `Elf32_auxv_t` and `Elf64_auxv_t` types which define the auxv
entries and can be used by VDSO parsing code. Note that this PR doesn't
yet update libc's own Linux auxv header support (in
`src/__support/OSUtil/linux/auxv.h`).

This fixes some of the missing definitions when building code working
with ELF files, such as Abseil's debugging support in
https://github.com/abseil/abseil-cpp/tree/master/absl/debugging/internal.
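
As an illustration of what an auxv entry type describes, the sketch below walks a buffer of 64-bit entries. It assumes the conventional `Elf64_auxv_t` layout (a 64-bit `a_type` followed by a 64-bit value) in little-endian byte order; the `parse_auxv64` helper and the sample buffer are hypothetical and unrelated to libc's own auxv support.

```python
import struct

AT_NULL = 0    # terminates the auxiliary vector
AT_PAGESZ = 6  # system page size


def parse_auxv64(data: bytes) -> dict:
    """Read (a_type, a_val) pairs of 16 bytes each until AT_NULL."""
    entries = {}
    for offset in range(0, len(data) - 15, 16):
        a_type, a_val = struct.unpack_from("<QQ", data, offset)
        if a_type == AT_NULL:
            break
        entries[a_type] = a_val
    return entries


# Synthetic buffer: one AT_PAGESZ entry followed by the AT_NULL terminator.
sample = struct.pack("<QQQQ", AT_PAGESZ, 4096, AT_NULL, 0)
print(parse_auxv64(sample))  # {6: 4096}
```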
DeltaFile
+38-0libc/include/elf.yaml
+22-0libc/hdr/types/Elf64_auxv_t.h
+22-0libc/hdr/types/Elf32_auxv_t.h
+21-0libc/include/llvm-libc-types/Elf32_auxv_t.h
+21-0libc/include/llvm-libc-types/Elf64_auxv_t.h
+16-0libc/hdr/types/CMakeLists.txt
+140-07 files not shown
+154-113 files

LLVM/project 0559fe6clang-tools-extra/clang-doc CMakeLists.txt, clang-tools-extra/clang-doc/benchmarks CMakeLists.txt

[clang-doc] Cleanup CMake files and ensure benchmarks build (#185469)

There's some poor formatting, and ClangDocBenchmark references several
targets that are required, but only because they're required for clang-doc
itself. We can just get those requirements from the clangDoc target.

Additionally, we can make sure the benchmark builds as part of testing
when LLVM_INCLUDE_BENCHMARKS is set.
DeltaFile
+0-5clang-tools-extra/clang-doc/benchmarks/CMakeLists.txt
+4-0clang-tools-extra/test/clang-doc/CMakeLists.txt
+1-1clang-tools-extra/clang-doc/CMakeLists.txt
+5-63 files

LLVM/project d5685acllvm/lib/Target/AMDGPU AMDGPULowerKernelAttributes.cpp, llvm/lib/Target/AMDGPU/Utils AMDGPUBaseInfo.h AMDGPUBaseInfo.cpp

Revert "AMDGPU: Annotate group size ABI loads with range metadata (#185420)"

This reverts commit 76daf31b4000623d5c9548348a859ea3ed8712e1.

Bot failure.
DeltaFile
+15-122llvm/test/CodeGen/AMDGPU/implicit-arg-v5-opt.ll
+19-48llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp
+7-8llvm/test/CodeGen/AMDGPU/amdgpu-max-num-workgroups-load-annotate.ll
+7-8llvm/test/CodeGen/AMDGPU/implicit-arg-block-count.ll
+2-5llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+5-0llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+55-1912 files not shown
+57-1938 files

LLVM/project 90978e4llvm/lib/Target/AArch64 AArch64Arm64ECCallLowering.cpp, llvm/test/CodeGen/AArch64 arm64ec-entry-thunks.ll

[arm64ec] Fix missing sret return in Arm64EC entry thunks for large struct returns (#185452)

When an Arm64EC function returns a struct by value that is too large for
x64's `RAX` (>8 bytes), the entry thunk synthesizes a hidden sret
pointer parameter for the x64 side. However, this
parameter was never marked with the sret attribute, so ISel did not copy
its value into `x8` (the Arm64EC mapping of `RAX`) on return. This
caused the x64 caller to see a garbage pointer in `RAX` instead of the
return buffer address.

The change adds the sret attribute to the thunk's synthesized pointer
parameter, so that `LowerFormalArguments` saves it and `LowerReturn`
restores it to `x8` before the tail call to `__os_arm64x_dispatch_ret`.

Fixes #185390
DeltaFile
+5-0llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp
+2-0llvm/test/CodeGen/AArch64/arm64ec-entry-thunks.ll
+7-02 files

LLVM/project 13f5238llvm/test/CodeGen/AArch64 movi64_sve.ll

[AArch64][GlobalISel] Add test coverage to movi64_sve.ll. NFC
DeltaFile
+404-183llvm/test/CodeGen/AArch64/movi64_sve.ll
+404-1831 files