[SPIR] Do not warn on 64-bit atomics (#185502)
Summary:
SPIR-V's Int64Atomics capability is not dependent on its addressing mode
as far as I am aware. These 32-bit SPIR targets already claim to support
the cl_khr_int64 atomics and we already emit 64-bit atomics in the
backend. Additionally, this is already accepted as a hack due to the
fact that the host will increase it in offloading usage. I do not see a
reason to keep these at 32, which causes numerous warnings inside of the
`libclc` build.
[libclc] Replace last of `opencl` atomics with `__scoped_` versions (#185515)
Summary:
These were the only uses of the old atomics. The old definition guards
stay as those prevent us from compiling the unsupported uintptr_t atomic
type on nvptx which does not define it. Could probably be improved
later.
[SelectionDAG] Use ExpandIntRes_CLMUL to expand vector CLMUL via narrower legal types (#184468)
Reuse the ExpandIntRes_CLMUL identity to expand vector
CLMUL/CLMULR/CLMULH on wider element types (vXi16, vXi32, vXi64) by
decomposing into half-element-width operations that eventually reach a
legal CLMUL type.
Three generic strategies in expandCLMUL:
1. Halve: halve element width (e.g. v8i16 -> v8i8 on AArch64)
2. promote to double : zext to wider type if CLMUL is legal there (e.g.
x86)
3. Count widen: pad with undef to double element count (e.g. v4i16 ->
v8i16)
A helper canNarrowCLMULToLegal() guides strategy selection and prevents
circular expansion in the CLMULH bitreverse path.
Also add Custom BITREVERSE lowering for v4i16/v8i16 on AArch64 using
REV16+RBIT, which the CLMULH expansion relies on.
Fixes #183768
[WebAssembly] Fold any/alltrue SIMD boolean reductions with eqz (#184704)
Existing ISel patterns match setne/seteq following SIMD boolean reductions
any_true and all_true, and drop the ones that are redundant (because the
reductions always return 1 or 0). This adds patterns to also produce eqz
instructions instead of a comparison with a const.
[flang-rt] Need to pad the output of execute_command_line(..., CMDMSG) (#185509)
Previously the error message was copied, but not padded for cases where
the message was shorter than the passed CMDMSG string. Add the padding
and also change the test case to test padding on all platforms.
set MODCLANG_COMPILER_LINKS = No, we only want to install ports clang
(for clang-scan-deps which is needed for C++20 modules but not included
in base), we don't want to use it to compile.
this still adds as an unwanted BDEP on ports-gcc archs, but at least it
doesn't then get in the way of build. (we do want to use the module rather
than add a BDEP on a hardcoded lang/llvm/19).
found by jca@
[AMDGPU] Add structural stall heuristic to scheduling strategies
Implements a structural stall heuristic that considers both resource
hazards and latency constraints when selecting instructions. In coexec,
this changes the pending queue from a binary “not ready to issue”
distinction into part of a unified candidate comparison. Pending
instructions still identify structural stalls in the current cycle, but
they are now evaluated directly against available instructions by stall
cost, making the heuristics both more intuitive and more expressive.
- Add getStructuralStallCycles() to GCNSchedStrategy that computes the
number of cycles an instruction must wait due to:
- Resource conflicts on unbuffered resources (from the SchedModel)
- Sequence-dependent hazards (from GCNHazardRecognizer)
- Add getHazardWaitStates() to GCNHazardRecognizer that returns the number
of wait states until all hazards for an instruction are resolved,
providing cycle-accurate hazard information for scheduling heuristics.