[HLSL] Fix interleaved vector and matrix return types in AST dump (#184888)
HLSL vector and matrix types were previously printed with their closing
syntax (', N>') in 'printAfter', causing them to interleave with function
parameters when used as return types (e.g., 'vector<float (args), 4>').
This change moves the HLSL vector and matrix closing syntax into
'printBefore' when 'UseHLSLTypes' is enabled, ensuring the type is
printed completely before the parameter list.
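
The interleaving can be reproduced with a toy model of the before/after printing split. This is only an illustrative sketch: `ToyVectorPrinter`, `closeInBefore`, and `printFunctionType` are hypothetical names, not the actual 'TypePrinter.cpp' code, and the flag merely stands in for the 'UseHLSLTypes' behavior described above.

```cpp
#include <string>

// Toy model (hypothetical): a function type is printed as
//   printBefore(returnType) + " (params)" + printAfter(returnType).
struct ToyVectorPrinter {
  bool closeInBefore; // true models the behavior after this change

  std::string printBefore(const std::string &elem, unsigned n) const {
    std::string s = "vector<" + elem;
    if (closeInBefore)
      s += ", " + std::to_string(n) + ">"; // close the type up front
    return s;
  }

  std::string printAfter(unsigned n) const {
    if (closeInBefore)
      return std::string(); // nothing left to print
    return ", " + std::to_string(n) + ">"; // old: closing syntax comes late
  }
};

// Prints a function type whose return type is vector<float, 4>.
std::string printFunctionType(const ToyVectorPrinter &p,
                              const std::string &params) {
  return p.printBefore("float", 4) + " (" + params + ")" + p.printAfter(4);
}
```

With `closeInBefore` set to false the parameter list lands inside the angle brackets; with it set to true the return type is complete before the parameters are printed.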
Note that address space qualifiers are now printed after the type
(e.g., 'vector<float, 4>hlsl_device'). This is because
'canPrefixQualifiers' in 'TypePrinter.cpp' returns false for these
types.
We cannot easily change this to check 'UseHLSLTypes' because
'canPrefixQualifiers' is a static method and does not have access to the
PrintingPolicy at that point.
[4 lines not shown]
[NFC] Migrate NVPTX specific debug info code to separate class
This refactors the DWARF emission code, moving the remaining NVPTX-specific code into its own subclass for debug info handling and architecture-specific differences.
Tested with ninja check-all on OSX.
[SPIR] Do not warn on 64-bit atomics (#185502)
Summary:
SPIR-V's Int64Atomics capability is not dependent on its addressing mode
as far as I am aware. These 32-bit SPIR targets already claim to support
the cl_khr_int64 atomics and we already emit 64-bit atomics in the
backend. Additionally, this is already accepted as a hack due to the
fact that the host will increase it in offloading usage. I do not see a
reason to keep these at 32, which causes numerous warnings inside of the
`libclc` build.
[libclc] Replace last of `opencl` atomics with `__scoped_` versions (#185515)
Summary:
These were the only uses of the old atomics. The old definition guards
stay as those prevent us from compiling the unsupported uintptr_t atomic
type on nvptx which does not define it. Could probably be improved
later.
[SelectionDAG] Use ExpandIntRes_CLMUL to expand vector CLMUL via narrower legal types (#184468)
Reuse the ExpandIntRes_CLMUL identity to expand vector
CLMUL/CLMULR/CLMULH on wider element types (vXi16, vXi32, vXi64) by
decomposing into half-element-width operations that eventually reach a
legal CLMUL type.
Three generic strategies in expandCLMUL:
1. Halve: halve element width (e.g. v8i16 -> v8i8 on AArch64)
2. Promote to double: zext to a wider type if CLMUL is legal there (e.g.
x86)
3. Count widen: pad with undef to double element count (e.g. v4i16 ->
v8i16)
A helper canNarrowCLMULToLegal() guides strategy selection and prevents
circular expansion in the CLMULH bitreverse path.
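
The half-width decomposition can be sketched on scalars. This is a minimal illustration, assuming CLMUL yields the low N bits and CLMULH the high N bits of the 2N-bit carry-less product; the function names are hypothetical and the code is not taken from the SelectionDAG implementation, which may differ in detail.

```cpp
#include <cstdint>

// Reference: low 16 bits of the carry-less product of two 16-bit values.
uint16_t clmul16_ref(uint16_t a, uint16_t b) {
  uint16_t r = 0;
  for (int i = 0; i < 16; ++i)
    if ((b >> i) & 1)
      r ^= static_cast<uint16_t>(a << i);
  return r;
}

// Full 16-bit carry-less product of two 8-bit values.
uint16_t clmul8_full(uint8_t a, uint8_t b) {
  uint16_t r = 0;
  for (int i = 0; i < 8; ++i)
    if ((b >> i) & 1)
      r ^= static_cast<uint16_t>(static_cast<uint16_t>(a) << i);
  return r;
}

// Assumed semantics: CLMUL = low half, CLMULH = high half of the product.
uint8_t clmul8(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>(clmul8_full(a, b) & 0xFF);
}
uint8_t clmulh8(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>(clmul8_full(a, b) >> 8);
}

// 16-bit CLMUL built only from 8-bit operations:
//   lo = clmul(xl, yl)
//   hi = clmulh(xl, yl) ^ clmul(xl, yh) ^ clmul(xh, yl)
uint16_t clmul16_via8(uint16_t x, uint16_t y) {
  uint8_t xl = x & 0xFF, xh = x >> 8;
  uint8_t yl = y & 0xFF, yh = y >> 8;
  uint8_t lo = clmul8(xl, yl);
  uint8_t hi = clmulh8(xl, yl) ^ clmul8(xl, yh) ^ clmul8(xh, yl);
  return static_cast<uint16_t>((static_cast<uint16_t>(hi) << 8) | lo);
}
```

The `xh * yh` term does not appear because it only contributes to bits 16 and above, which a 16-bit CLMUL discards.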
Also add Custom BITREVERSE lowering for v4i16/v8i16 on AArch64 using
REV16+RBIT, which the CLMULH expansion relies on.
Fixes #183768
[WebAssembly] Fold any/alltrue SIMD boolean reductions with eqz (#184704)
Existing ISel patterns match setne/seteq following SIMD boolean reductions
any_true and all_true, and drop the ones that are redundant (because the
reductions always return 1 or 0). This adds patterns to also produce eqz
instructions instead of a comparison with a const.
[flang-rt] Need to pad the output of execute_command_line(..., CMDMSG) (#185509)
Previously the error message was copied, but not padded for cases where
the message was shorter than the passed CMDMSG string. Add the padding
and also change the test case to test padding on all platforms.
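
For illustration, blank padding of a fixed-length character buffer can be sketched as follows. `copyAndPad` is a hypothetical helper written for this note, not the flang-rt function; it assumes the destination is a fixed-length, non-null-terminated buffer as used for Fortran CHARACTER variables.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy msg into a fixed-length buffer and fill the
// remainder with blanks. No null terminator is written.
void copyAndPad(char *dest, std::size_t destLen, const char *msg) {
  std::size_t n = std::min(destLen, std::strlen(msg));
  std::memcpy(dest, msg, n);                // copy (truncating if too long)
  std::fill(dest + n, dest + destLen, ' '); // pad the tail with blanks
}
```

Without the final fill, whatever the caller's buffer held past the end of the message would remain visible.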
[AMDGPU] Add structural stall heuristic to scheduling strategies
Implements a structural stall heuristic that considers both resource
hazards and latency constraints when selecting instructions. In coexec,
this changes the pending queue from a binary “not ready to issue”
distinction into part of a unified candidate comparison. Pending
instructions still identify structural stalls in the current cycle, but
they are now evaluated directly against available instructions by stall
cost, making the heuristics both more intuitive and more expressive.
- Add getStructuralStallCycles() to GCNSchedStrategy that computes the
number of cycles an instruction must wait due to:
  - Resource conflicts on unbuffered resources (from the SchedModel)
  - Sequence-dependent hazards (from GCNHazardRecognizer)
- Add getHazardWaitStates() to GCNHazardRecognizer that returns the number
of wait states until all hazards for an instruction are resolved,
providing cycle-accurate hazard information for scheduling heuristics.
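
As a rough sketch of comparing candidates by stall cost, consider the toy model below. All names are hypothetical, and combining the two stall sources with a maximum is an assumption made for illustration; the commit message does not say how they are combined.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical candidate record; not the GCNSchedStrategy data structures.
struct ToyCandidate {
  unsigned resourceStallCycles; // wait caused by an unbuffered resource
  unsigned hazardWaitStates;    // wait caused by sequence-dependent hazards
};

// Assumption: the instruction can issue once both constraints are
// satisfied, so the stall is the larger of the two.
unsigned toyStructuralStallCycles(const ToyCandidate &c) {
  return std::max(c.resourceStallCycles, c.hazardWaitStates);
}

// Pick the candidate with the lowest stall cost, treating available and
// pending instructions uniformly. Expects a non-empty vector.
std::size_t pickLowestStall(const std::vector<ToyCandidate> &cands) {
  std::size_t best = 0;
  for (std::size_t i = 1; i < cands.size(); ++i)
    if (toyStructuralStallCycles(cands[i]) <
        toyStructuralStallCycles(cands[best]))
      best = i;
  return best;
}
```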
[AMDGPU] Add ML-oriented coexec scheduler selection and queue handling
This patch adds the initial coexec scheduler scaffold for machine
learning workloads on gfx1250.
It introduces function and module-level controls for selecting the
AMDGPU preRA and postRA schedulers, including an `amdgpu-workload-type`
module flag that maps ML workloads to coexec preRA scheduling and a nop
postRA scheduler by default.
It also updates the coexec scheduler to use a simplified top-down
candidate selection path that considers both available and pending
queues through a single flow, setting up follow-on heuristic work.
[Offload][AMDGPU] Fix RPC server on mixed w32 w64 workloads (#185496)
Summary:
This was a regression from the original LLVM-gpu-loader. We used to
handle `-mwavefrontsize64` correctly in the loader by over-allocating
memory and just leaving the upper 32 bits masked off. In order to handle
this in offload we need to scan loaded kernels to see how much memory we
need to allocate. This should be safe: the protocol is designed to
handle an arbitrary size, and in the worst case this just wastes space.
[libc] Add more macro/type declarations to Elf headers. (#185348)
* Add several `AT_` macro values from `<sys/auxv.h>`. In particular,
this makes internal Linux auxv header parsing more hermetic by
removing one of the Linux header includes.
* Add constants between `DT_ADDRNGLO` and `DT_ADDRNGHI`, in particular
`DT_GNU_HASH`, which is the de facto standard on many platforms.
* Add `Elf32_auxv_t` and `Elf64_auxv_t` types which define the auxv
entries and can be used by VDSO parsing code. Note that this PR doesn't
yet update libc's own Linux auxv header support (in
`src/__support/OSUtil/linux/auxv.h`).
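
As an illustration of how such auxv entry types are typically consumed, here is a minimal sketch that scans an auxiliary vector for a given key. `ToyElf64Auxv`, `kAtNull`, `kAtPagesz`, and `findAuxv` are stand-ins defined locally so the example is self-contained; the layout mirrors the conventional `Elf64_auxv_t` definition and the constants use the usual Linux values of `AT_NULL` and `AT_PAGESZ`, but none of this is copied from the libc headers in this change.

```cpp
#include <cstdint>

// Local stand-in mirroring the conventional Elf64_auxv_t layout.
struct ToyElf64Auxv {
  uint64_t a_type; // entry key (an AT_* value)
  union {
    uint64_t a_val; // entry value
  } a_un;
};

constexpr uint64_t kAtNull = 0;   // usual value of AT_NULL (terminator)
constexpr uint64_t kAtPagesz = 6; // usual value of AT_PAGESZ

// Walk the vector until the AT_NULL terminator, looking for `type`.
// Returns true and stores the value if the key is found.
bool findAuxv(const ToyElf64Auxv *auxv, uint64_t type, uint64_t &value) {
  for (; auxv->a_type != kAtNull; ++auxv) {
    if (auxv->a_type == type) {
      value = auxv->a_un.a_val;
      return true;
    }
  }
  return false;
}
```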
This fixes some of the missing definitions when building code that works
with ELF files, such as Abseil's debugging support in
https://github.com/abseil/abseil-cpp/tree/master/absl/debugging/internal.
[clang-doc] Cleanup CMake files and ensure benchmarks build (#185469)
There's some poor formatting, and ClangDocBenchmark references several
targets that are required, but only because they're required for clang-doc
itself. We can just get those requirements from the clangDoc target.
Additionally, we can make sure the benchmark builds as part of testing
when LLVM_INCLUDE_BENCHMARKS is set.
[arm64ec] Fix missing sret return in Arm64EC entry thunks for large struct returns (#185452)
When an Arm64EC function returns a struct by value that is too large for
x64's `RAX` (>8 bytes), the entry thunk synthesizes a hidden sret
pointer parameter for the x64 side. However, this
parameter was never marked with the sret attribute, so ISel did not copy
its value into `x8` (the Arm64EC mapping of `RAX`) on return. This
caused the x64 caller to see a garbage pointer in `RAX` instead of the
return buffer address.
The change adds the sret attribute to the thunk's synthesized pointer
parameter, so that `LowerFormalArguments` saves it and `LowerReturn`
restores it to `x8` before the tail call to `__os_arm64x_dispatch_ret`.
Fixes #185390