[AMDGPU][SIInsertWaitCnts] Remove VMemTypes (#206440)
This can be considered an RFC. I'd personally like to get rid of
VMemTypes, but I don't know if anyone feels strongly that they should be kept.
My motivation for removing VMemTypes is simple: they are just a repeat
of VMEM events under a different name, and messier (defined as a
basic enum but actually stored as a bitmask later). It's just confusing.
This patch eliminates the need for them by:
- Adding a new entrypoint in AMDGPUHWEvents to get the basic set of
VMEM events issued by a VMEM instruction.
- Setting BVH/SAMPLER events irrespective of whether the HW can track them.
These events exist anyway; it should be up to InsertWaitCnt to deal with
them properly (which is easy, only `counterOutOfOrder` needed work).
- Tracking an additional set of per-VGPR "PendingEvents", which is
set using the "basic set of VMEM events" and cleared as needed.
[3 lines not shown]
[JITLink][x86-64] Fix GOTPCRELX call/jmp relaxation to use PC-relative fixup (#190179)
The GOTPCRELX optimization in `optimizeGOTAndStubAccesses()` relaxes
`call *foo@GOTPCREL(%rip)` → `addr32 call foo` and
`jmp *foo@GOTPCREL(%rip)` → `jmp foo; nop`,
but sets the edge kind to `Pointer32` (absolute). Since `e8`/`e9` are
PC-relative instructions, `applyFixup` writes the absolute address
instead of the displacement, producing a garbage target and a SIGSEGV
when JIT code is far from the callee (e.g., a non-PIE executable with an
arena allocator).
**Fix:**
- Guard: `TargetInRangeForImmU32` → `DisplacementInRangeForImmS32`
(displacement must fit in signed 32-bit, not absolute address in
unsigned 32-bit)
- Edge kind: `Pointer32` → `BranchPCRel32` (so `applyFixup` writes
`Target - (Fixup + 4) + Addend`)
[14 lines not shown]
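The difference between the two guards and the two fixup expressions can be illustrated with a small standalone sketch. The helper names below are hypothetical and only mirror the wording of the commit message; they are not the JITLink API.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helpers for illustration only; not part of JITLink.

// Value written for a PC-relative 32-bit branch:
//   Target - (Fixup + 4) + Addend
// The "+ 4" reflects that the CPU measures the displacement from the end
// of the 4-byte immediate field.
inline int64_t branchPCRel32Value(uint64_t Target, uint64_t Fixup,
                                  int64_t Addend) {
  return static_cast<int64_t>(Target - (Fixup + 4)) + Addend;
}

// True if a displacement fits in a signed 32-bit immediate.
inline bool fitsInImmS32(int64_t Value) {
  return Value >= std::numeric_limits<int32_t>::min() &&
         Value <= std::numeric_limits<int32_t>::max();
}

// True if an absolute address fits in an unsigned 32-bit immediate.
inline bool fitsInImmU32(uint64_t Address) {
  return Address <= std::numeric_limits<uint32_t>::max();
}
```

With a low callee address and a far-away fixup, `fitsInImmU32(Target)` holds while the displacement does not fit in 32 bits, which is the situation in which the old guard allowed an invalid relaxation.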
[AArch64][llvm] Add support for FEAT_HINTE for Armv9.6 onwards
Add support for `FEAT_HINTE`, as defined in the Arm ARM M.c edition[1].
This defines the Extended Hint instruction space. `FEAT_HINTE` is
optional from Armv9.0, and mandatory from Armv9.6.
Add MC coverage for assembly, disassembly, diagnostics, generic sysreg
fallback behavior, Clang driver handling, and target parser extension
mapping.
[1] https://developer.arm.com/documentation/ddi0487/latest
[Dexter] Add at_frame_idx to check values in frames above current (#203505)
This patch adds a new attribute for !and nodes, `at_frame_idx`, which
matches against frames above its parent node; for example, in the
script:
```
!where {function: foo}:
  !where {function: bar}:
    !and {at_frame_idx: 1}:
      !value x: 0
```
The `!value x` node checks the value of 'x' in 'foo' while the debugger
is inside 'bar'. Use of this attribute comes with some restrictions: a
!where node can never be nested under a !and{at_frame_idx} node, and
neither can another !and{at_frame_idx} node.
[clang][SYCL] Diagnose reference kernel parameters (#192957)
Per SYCL 2020 spec: Reference types are not trivially copyable, so they
may not be passed as kernel parameters.
This PR adds infrastructure for kernel object visiting and implements
diagnostics for reference kernel parameters.
The infrastructure will also be used for other kernel parameter
restrictions and functional code transformations that will be done in
separate PRs.
Assisted by: claude in unit test preparation
---------
Co-authored-by: Tom Honermann <tom at honermann.net>
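The rule quoted from the SYCL 2020 spec can be checked in plain C++ with standard type traits. This is only a sketch of the language-level property; `isReferenceKernelParam` is a hypothetical name and does not correspond to the Clang diagnostic code added by the PR.

```cpp
#include <cassert>
#include <type_traits>

// Reference types are never trivially copyable, while the referred-to
// type may well be.
static_assert(!std::is_trivially_copyable_v<int &>);
static_assert(std::is_trivially_copyable_v<int>);

// Hypothetical predicate: true for parameter types that the new
// diagnostic would be expected to reject (lvalue or rvalue references).
template <typename T>
constexpr bool isReferenceKernelParam = std::is_reference_v<T>;
```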
[ARM] Specify inlining behavior in TableGen (#205763)
Move the ARM inlining feature whitelist into the SubtargetFeature
definitions. For this purpose, add a new InlineMustMatch inlining
behavior, for features where no differences between caller/callee are
allowed.
Additionally, mark all the tuning features as InlineIgnore and fix some
cases that were incorrectly omitted from the feature whitelist.
Fixes https://github.com/llvm/llvm-project/issues/65152.
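A rough model of how the two behaviors named above could combine into an inlining compatibility check is sketched below, using feature bitmasks. The struct, the function name, and in particular the subset rule applied to features that are neither must-match nor ignored are assumptions made for illustration; the actual TableGen-generated logic may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical policy: each bit is one subtarget feature.
struct InlinePolicy {
  uint32_t MustMatch; // features that must be identical in caller and callee
  uint32_t Ignore;    // features (e.g. tuning) that never block inlining
};

inline bool areInlineCompatible(uint32_t Caller, uint32_t Callee,
                                const InlinePolicy &P) {
  // Any difference in a must-match feature blocks inlining.
  if ((Caller ^ Callee) & P.MustMatch)
    return false;
  // ASSUMPTION: all remaining, non-ignored features follow a subset rule,
  // i.e. the callee may not require a feature the caller lacks.
  uint32_t Checked = ~(P.MustMatch | P.Ignore);
  return ((Callee & Checked) & ~Caller) == 0;
}
```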
[X86][Windows] Return `fp128` on the stack (#204887)
Subsumes https://github.com/llvm/llvm-project/pull/194214
For x86-64 Windows targets, LLVM currently returns `fp128` in xmm0. This
does match `i128` (both Clang and GCC return `__int128` in xmm0) but
disagrees with GCC's behavior of returning `__float128` on the stack.
https://gcc.godbolt.org/z/xnWeGqcbW
Microsoft does not specify a `__float128` ABI so any decision is purely
an extension. The Windows x64 calling convention [1] does say that user-
defined types that do not fit in a register should be returned
indirectly, so the GCC behavior seems like a reasonable interpretation
of this rule.
Thus, change `fp128` to return on the stack for Windows targets. This is
done for both MinGW and MSVC targets; if official guidelines are ever
published, this can be revisited.
[9 lines not shown]
[BOLT] Make ICF bucket iteration order deterministic for single-threaded mode (#200706)
`CongruentBuckets` is an unordered_map, so iterating it directly
produces non-deterministic folding order across different build
environments. Sort buckets by the binary address of the representative
function before iterating to guarantee a stable debug output order.
The order does not matter for the output binary, but it causes randomly
different debug output across build environments, which makes
it hard to write stable tests for ICF.
The debug output of multi-threaded ICF is already non-deterministic
even if we sort by address, so we only try to stabilize the
single-threaded ICF output.
Update the icf-safe tests to reflect the new deterministic ordering.
Assisted-by: Sonnet 4.6
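The general technique of imposing a stable order on an `unordered_map` before iterating can be sketched as follows. The `Bucket` type, the string key, and the convention that the first address is the representative function are assumptions for this example, not BOLT's actual data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a congruence bucket. ASSUMPTION: the first
// address belongs to the representative function.
struct Bucket {
  std::vector<uint64_t> FunctionAddresses;
};

// Collect pointers to the buckets and sort them by representative address,
// so iteration no longer depends on the hash table's internal layout.
inline std::vector<const Bucket *>
bucketsInAddressOrder(const std::unordered_map<std::string, Bucket> &Buckets) {
  std::vector<const Bucket *> Order;
  Order.reserve(Buckets.size());
  for (const auto &KV : Buckets)
    if (!KV.second.FunctionAddresses.empty())
      Order.push_back(&KV.second);
  std::sort(Order.begin(), Order.end(), [](const Bucket *A, const Bucket *B) {
    return A->FunctionAddresses.front() < B->FunctionAddresses.front();
  });
  return Order;
}
```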
[NVPTX] Add asynchronous store intrinsics (#200768)
Adds the following intrinsics for asynchronous store operations:
- `st.async`
- `st.async.sys`
- `st.async.gpu`
- `st.async.mmio.sys`
Tests verified through `ptxas-13.3`.
[aarch64] Mix the frame pointer with the stack cookie when protecting the stack (#197346)
For MSVC-compatible targets on AArch64, mix the stack cookie with the
frame pointer (FP) to create a position-dependent guard value. This
strengthens protection against attacks where the attacker knows or can
predict the cookie value, as they would also need to know the exact
frame pointer location.
Implementation details:
- Uses SUB (FP - Cookie) instead of XOR like X86 because:
* SUB maintains the existing AArch64 instruction selection patterns
* SUB provides equivalent security properties (bijective mixing)
* The result is still unpredictable without knowing both inputs
- The same SUB operation is performed in both prologue (to store the
mixed value) and epilogue (to unmix and verify the cookie)
- Forces frame pointer usage for functions with stack guards on MSVCRT
to ensure consistent addressing with dynamic stack allocation
This matches the MSVC behavior and strengthens stack protection on
[2 lines not shown]
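Why one SUB serves for both mixing and unmixing can be seen from a small arithmetic model: `x -> FP - x` is its own inverse under wrapping 64-bit arithmetic. The function names are hypothetical and this models only the arithmetic, not the generated AArch64 code.

```cpp
#include <cassert>
#include <cstdint>

// Prologue (model): value stored in the stack guard slot.
inline uint64_t mixCookie(uint64_t FramePointer, uint64_t Cookie) {
  return FramePointer - Cookie; // wraps modulo 2^64
}

// Epilogue (model): the same subtraction recovers the cookie,
// because FP - (FP - Cookie) == Cookie.
inline uint64_t unmixCookie(uint64_t FramePointer, uint64_t Stored) {
  return FramePointer - Stored;
}

// The guard is intact if the recovered value equals the reference cookie.
inline bool guardIntact(uint64_t FramePointer, uint64_t Stored,
                        uint64_t Cookie) {
  return unmixCookie(FramePointer, Stored) == Cookie;
}
```

In this model, either a corrupted slot or a different frame pointer makes the check fail.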
[SCEV] Prove implied conditions via matching SCEV differences (#201839)
Add isImpliedCondOperandsViaMatchingDiff to fold equality comparisons
when getMinusSCEV(LHS, RHS) == getMinusSCEV(FoundLHS, FoundRHS).
This handles correlated IV comparisons in loops with multiple pointer
IVs sharing the same stride.
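The underlying reasoning is that if LHS - RHS and FoundLHS - FoundRHS are the same expression, then LHS == RHS holds exactly when FoundLHS == FoundRHS holds. A toy model with affine recurrences illustrates this; the `Affine` type and helper names are hypothetical and far simpler than real SCEV expressions.

```cpp
#include <cassert>
#include <cstdint>

// Toy add recurrence {Start,+,Step}: value at iteration I is
// Start + Step * I, with wrapping 64-bit arithmetic.
struct Affine {
  uint64_t Start;
  uint64_t Step;
};

constexpr Affine diff(Affine A, Affine B) {
  return {A.Start - B.Start, A.Step - B.Step};
}

constexpr bool sameExpr(Affine A, Affine B) {
  return A.Start == B.Start && A.Step == B.Step;
}

constexpr uint64_t valueAt(Affine A, uint64_t I) {
  return A.Start + A.Step * I;
}

// If the two differences match, a known equality (or inequality) between
// FoundLHS and FoundRHS carries over to LHS and RHS.
constexpr bool impliedViaMatchingDiff(Affine LHS, Affine RHS, Affine FoundLHS,
                                      Affine FoundRHS) {
  return sameExpr(diff(LHS, RHS), diff(FoundLHS, FoundRHS));
}
```

For two pointer IVs with stride 8 compared against loop-invariant end values, the differences match whenever the two ranges have the same length.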
[libc] Migrate header .def files to public_includes (#206727)
Now that public_includes is supported in header yaml files, we don't
need custom .h.def templates just to include another header.
This patch removes link.h.def, string.h.def, and ucontext.h.def and
moves their inclusions directly into their yaml definitions:
- link.yaml: add elf.h
- string.yaml: add strings.h
- sys/ucontext.yaml: add ucontext.h
Assisted by Gemini.
[libc++] Resolve LWG4366: Heterogeneous comparison of `expected` may be ill-formed (#185342)
Resolves #171362
- Implement proposed resolution
- Refactor the `operator==` code to be more in line with the standard, as the
current implementation made an explicit `bool()` conversion in the
`x.meow() == y.meow()` cases
- Add test cases
- Update issues paper
---------
Co-authored-by: A. Jiang <de34 at live.cn>
[orc-rt] Fix unused function warning in testcase. (#206894)
Fixes -Wunused-function warning on peekAtErrorMessage by only defining
that function when ORC_RT_ENABLE_EXCEPTIONS has been turned on.
[TargetParser][AArch64][NFC] Reference ArchInfo via index (#206699)
This removes all relocations from CpuInfos so that the 3-4 kiB structure
can be stored in .rodata. Additionally, the ArchInfos pointer array is
replaced by an ArchInfos value array and the architecture names are
replaced by constexpr references.
[orc-rt] Silence an unneeded-internal-decl warning in testcase. (#206892)
Add an ODR-use of freeVoidVoidNoexcept to silence clang's
-Wunneeded-internal-declaration warning.
[MC][TableGen] Make MCRegisterClasses relocation-free (#206753)
MCRegisterClasses currently store pointers to the register list and the
bit set. Store these three types together in one data structure and use
relative offsets to avoid these relocations and move the large
MCRegisterClasses array from .data.rel.ro into .data. This reduces the
amount of data that needs to be relocated by 86 KB.
This has two side effects: first, MCRegisterClass is not copyable and
the few uses that did copy were changed. Second, the MCRegisterClasses
array is no longer easily accessible as a global (well, it *technically*
is, but that requires the type of the entire storage struct, which I
don't want to expose). Therefore, these accesses need to go through a
function, which shouldn't be too costly and should be inlined in an LTO build.
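The general idea of replacing pointers with offsets into one combined storage object, and reaching it through accessor functions, can be sketched as below. All names and the layout are hypothetical; this sketch uses indices into a shared table and does not reproduce the encoding or the non-copyable property of the real MCRegisterClass.

```cpp
#include <cassert>
#include <cstdint>

// Descriptor without pointers: nothing here needs a dynamic relocation.
struct RegClassDesc {
  uint16_t FirstReg; // index of the first register in Storage.Regs
  uint16_t NumRegs;
};

// Descriptors and register lists live together in one object.
struct RegClassStorage {
  RegClassDesc Classes[2];
  uint16_t Regs[5];
};

inline constexpr RegClassStorage Storage = {{{0, 2}, {2, 3}},
                                            {10, 11, 20, 21, 22}};

// Accesses go through functions instead of exposing the storage type.
inline const RegClassDesc &getRegClass(unsigned Idx) {
  return Storage.Classes[Idx];
}

inline const uint16_t *regsBegin(const RegClassDesc &RC) {
  return Storage.Regs + RC.FirstReg;
}

inline bool contains(const RegClassDesc &RC, uint16_t Reg) {
  for (unsigned I = 0; I != RC.NumRegs; ++I)
    if (regsBegin(RC)[I] == Reg)
      return true;
  return false;
}
```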
[orc-rt] CallableTraitsHelper - record operator()'s noexcept-specifier (#206891)
Adds a `bool IsNoexcept` template parameter to CallableTraitsHelper's
impl-class template argument (after the existing IsConst from
4bab60f2c63). It records the noexcept-specification on the callable's
function type.
Specializations are added for noexcept-qualified forms. Existing
specializations propagate `IsNoexcept = false`. CallableArgInfoImpl
exposes the captured bool as `static constexpr bool is_noexcept`.
Existing pass-through adapters (ErrorHandlerTraitsImplAdapter,
ErrorWrapImplAdapter, WFHandlerTraitsImplAdapter) are updated to accept
and discard the additional argument.
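A miniature of the technique, recording const and noexcept qualifiers by partially specializing on function types, might look like this (C++17 or later, since noexcept must be part of the function type). `FnTypeInfo`, `MemberFnType`, and `CallOpInfo` are hypothetical names, not the orc-rt CallableTraitsHelper.

```cpp
#include <cassert>

template <typename FnT> struct FnTypeInfo; // primary template, undefined

template <typename RetT, typename... ArgTs> struct FnTypeInfo<RetT(ArgTs...)> {
  static constexpr bool is_const = false;
  static constexpr bool is_noexcept = false;
};
template <typename RetT, typename... ArgTs>
struct FnTypeInfo<RetT(ArgTs...) const> {
  static constexpr bool is_const = true;
  static constexpr bool is_noexcept = false;
};
template <typename RetT, typename... ArgTs>
struct FnTypeInfo<RetT(ArgTs...) noexcept> {
  static constexpr bool is_const = false;
  static constexpr bool is_noexcept = true;
};
template <typename RetT, typename... ArgTs>
struct FnTypeInfo<RetT(ArgTs...) const noexcept> {
  static constexpr bool is_const = true;
  static constexpr bool is_noexcept = true;
};

// Strip the class from a pointer-to-member-function type, leaving the
// qualified function type of the member.
template <typename MemFnPtrT> struct MemberFnType;
template <typename ClassT, typename FnT> struct MemberFnType<FnT ClassT::*> {
  using type = FnT;
};

// Qualifier info for a callable's (non-overloaded) operator().
template <typename CallableT>
using CallOpInfo = FnTypeInfo<
    typename MemberFnType<decltype(&CallableT::operator())>::type>;

struct ConstNoexceptCallable {
  void operator()(int) const noexcept {}
};
struct PlainCallable {
  int operator()() { return 0; }
};

static_assert(CallOpInfo<ConstNoexceptCallable>::is_const);
static_assert(CallOpInfo<ConstNoexceptCallable>::is_noexcept);
static_assert(!CallOpInfo<PlainCallable>::is_const);
static_assert(!CallOpInfo<PlainCallable>::is_noexcept);
```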
[X86] Insert WAIT before fnstenv/fnsave and skip meta-instructions (#204108)
fnstenv/fnsave (FSTENVm/FSAVEm) are non-waiting, so they don't
synchronize a pending FP exception; the WAIT pass shouldn't skip the
WAIT before them.
Also skip meta-instructions when finding the next op so WAIT placement
doesn't depend on -g.
Added a new X87ControlKind enum class to classify x87 control
instructions in the pass, replacing the existing ad-hoc switches.
Found via @jlebar's X86 LLVM bug-hunt / FuzzX effort:
https://github.com/SemiAnalysisAI/FuzzX/blob/master/x86/bugs/047-x87-insertwait-too-eager-skip
cc @jlebar
[flang][OpenMP] Lower target in_reduction for host fallback
Enable host-fallback lowering for target in_reduction in Flang and MLIR OpenMP translation.
Model target in_reduction through the matching map entry, force address-preserving implicit mapping for Flang in_reduction list items, and emit the host-side task-reduction lookup with __kmpc_task_reduction_get_th_data. The runtime entry point takes and returns a generic, default-address-space pointer, so normalize a non-default-address-space captured pointer to the generic address space before the call and cast the returned private pointer back to the map block argument's address space, mirroring the in_reduction handling on omp.taskloop. Unsupported device/offload-entry and richer reduction forms remain diagnosed.
Add Flang lowering, MLIR verifier/translation, and LLVM IR tests for the supported host-fallback path, including a non-default-address-space case, and the remaining unsupported cases.
[TargetParser][AArch64][NFC] Use StringTable (#206698)
Store strings in a StringTable instead of referencing them via pointers.
This permits some data structures to be stored in .rodata instead of
.data.rel.ro, as they no longer require relocations. In particular this
affects the 16 kiB AArch64::Extensions.
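The effect can be illustrated with a standalone sketch in which entries hold offsets into one character blob instead of `const char *` members. The blob contents, `ExtInfo`, and `nameOf` are made up for this example, and the sketch does not use LLVM's own StringTable class.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// One blob of NUL-terminated strings; offset 0 is the empty string.
inline constexpr char Blob[] = "\0fp16\0sve\0sme";

// An entry stores an offset, not a pointer, so the array needs no
// relocations and can be placed in read-only data.
struct ExtInfo {
  uint16_t NameOffset;
  uint32_t Id; // illustrative payload
};

inline constexpr ExtInfo Exts[] = {{1, 100}, {6, 101}, {10, 102}};

// Resolve the name lazily from the blob.
inline std::string_view nameOf(const ExtInfo &E) {
  return std::string_view(Blob + E.NameOffset);
}
```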
[orc-rt] CallableTraitsHelper - record call operator's const-qualifier (#206889)
Adds a leading `bool IsConst` template parameter to
CallableTraitsHelper's impl-class template argument to record the
const-qualifier on the callable's function type.
Existing specializations are updated to report their const qualifiers,
and a new specialization handles `RetT(ArgTs...) const`.
CallableArgInfo is updated to expose the captured bool as `static
constexpr bool is_const`.
Existing impls that do not consume the new parameter are adapted via
pass-through wrappers (ErrorHandlerTraitsImplAdapter,
ErrorWrapImplAdapter, WFHandlerTraitsImplAdapter) that discard the
leading bool.
Revert "[flang][openacc] Skip implicit global declare constructor in managed mode" (#206884)
Reverts llvm/llvm-project#206610, as this might not be the right approach.