[AMDGPU] Shrink VOPCX (nosdst) instructions in SIShrinkInstructions (#202711)
Fix `SIShrinkInstructions` to shrink `V_CMPX_*_e64` instructions.
The `VOPC` destination handling block treated `MI.getOperand(0)` as the
destination unconditionally, but `VOPCX` variants have no `sdst`; their
operand 0 is `src0`. The pass set a bogus VCC allocation hint on the
source SGPR and skipped the shrink. The existing comment said these
should be excluded, but the code never actually did so.
Reapply "[Dexter] Add label nodes for line references" (#203938)
This reverts commit
https://github.com/llvm/llvm-project/commit/2a789821b0d723bc92f61563e74e67d69d660927.
The original commit previously caused pre-merge check failures for Linux
AArch64 cross-project-tests, due to the stepping behaviour being
slightly different to x86_64. The tests have been adjusted to be less
brittle to exact stepping behaviour, but this reapply also disables the
tests for ARM targets as there is not currently the infrastructure to
reliably test them, meaning we may end up with latent failures.
[SPIR-V] Order alias-decl instructions so definitions precede uses (#203699)
The SPV_INTEL_memory_access_aliasing decl instructions are built at the
insertion point of the memory operation being selected. Because
selection is bottom-up, an alias domain shared between scopes that are
built at different memory operations could be emitted after a scope that
references it.
Sort the collected aliasing instructions by dependency tier.
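A minimal sketch of sorting by dependency tier, assuming three tiers (domain, then scope, then scope list) so that each definition precedes its uses. The `AliasKind` enum, `AliasInst` struct, and function names are hypothetical stand-ins for illustration, not the backend's actual data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: lower tiers must be emitted before higher ones.
// Assumption: scopes reference domains, scope lists reference scopes.
enum class AliasKind { Domain = 0, Scope = 1, ScopeList = 2 };

struct AliasInst {
  AliasKind kind;
  std::string name;
};

// Stable sort keeps the original relative order inside each tier and
// only moves instructions across tiers.
void sortByDependencyTier(std::vector<AliasInst>& insts) {
  std::stable_sort(insts.begin(), insts.end(),
                   [](const AliasInst& a, const AliasInst& b) {
                     return static_cast<int>(a.kind) < static_cast<int>(b.kind);
                   });
}

// Usage: a shared domain that was collected after a scope referencing it.
bool demoTierSort() {
  std::vector<AliasInst> insts = {{AliasKind::Scope, "scopeA"},
                                  {AliasKind::ScopeList, "listA"},
                                  {AliasKind::Domain, "domain"},
                                  {AliasKind::Scope, "scopeB"}};
  sortByDependencyTier(insts);
  assert(insts[0].name == "domain");
  assert(insts[1].name == "scopeA");
  assert(insts[2].name == "scopeB");
  assert(insts[3].name == "listA");
  return true;
}
```

A stable sort is used here so that instructions within one tier keep their collection order; whether the actual patch relies on that is not stated in this message.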
[SPIR-V] Don't lower metadata-argument intrinsics to functions (#203654)
llvm.experimental.noalias.scope.decl takes a metadata argument. When the
SPIR-V backend lowered unknown intrinsics to functions (AMD vendor or
--spv-allow-unknown-intrinsics), it emitted
@spirv.llvm_experimental_noalias_scope_decl(metadata ...), which fails
the verifier: "Function has metadata parameter but isn't an intrinsic".
Therefore drop llvm.experimental.noalias.scope.decl, as it has no SPIR-V
representation.
[SPIR-V] Fix verifier crash on aggregate extract into a mutated callsite (#203729)
SPIRVPrepareFunctions rewrites indirect/inline-asm callsite signatures
so aggregate params become i32 value-ids, but leaves the operands for
SPIRVEmitIntrinsics to tokenize. An aggregate-returning spv_extractv
result passed to such a call was never tokenized, so it no longer
matched the mutated callee signature, tripping the IR verifier ("Call
parameter type does not match function signature!").
Mutate the spv_extractv result to i32 when it feeds a callsite param
that was rewritten to a value-id. The real SPIR-V type is recovered from
the value attributes later during selection.
Assisted by: Claude Code
[AMDGPU] Add synthetic apertures and use them for barriers
Define what a synthetic aperture is, and adjust the barrier AS
to use this new system. This makes the barrier AS even safer to
use as now we can use all 32 bits of it without ever risking
hitting a valid address of any kind (LDS or outside LDS).
[NFC][LLVM][Tests] Replace instances of @llvm.aarch64.sve.ptrue.nxv16i1(i32 31) with splat (i1 true). (#204113)
I have kept instances where the ptrue seems more relevant. For example,
when a test varies the predicate pattern, I opted to maintain ptrue(31)
for test consistency.
Use a CAS loop for pointer types in __cxx_atomic_fetch_{max,min} on c11.h
Clang's __c11_atomic_fetch_max/min builtins reject pointer arguments
("address argument to atomic operation must be a pointer to atomic integer
or supported floating point type"). The four __cxx_atomic_fetch_{max,min}
overloads in support/c11.h were unconditionally calling those builtins, so
std::atomic<T*>::fetch_max/min failed to compile on every Clang-based
config. support/gcc.h already had this dispatch via #if __has_builtin(...)
falling back to a CAS loop, which is why generic-gcc was passing.
Dispatch on is_pointer<_Tp>::value: pointer types use a CAS loop matching
gcc.h's body, others keep calling the builtin.
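As a rough illustration of the CAS-loop fallback for pointer types, here is a standalone sketch written against `std::atomic<T*>`. It is not the libc++ code: `fetchMaxPtr` is a hypothetical helper, and the memory-order handling is simplified.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

// Hypothetical helper: fetch_max for pointers via compare-exchange.
// Simplification: when no store is needed, only the relaxed load happens.
template <class T>
T* fetchMaxPtr(std::atomic<T*>& a, T* val,
               std::memory_order order = std::memory_order_seq_cst) {
  T* old = a.load(std::memory_order_relaxed);
  // Stop when the stored pointer is already >= val, or when the swap succeeds.
  while (std::less<T*>()(old, val) &&
         !a.compare_exchange_weak(old, val, order,
                                  std::memory_order_relaxed)) {
    // On failure `old` holds the freshly observed value; re-check and retry.
  }
  return old;  // value observed before the (possible) update
}

// Usage: pointers into one array, so ordering them is well defined.
bool demoFetchMax() {
  int arr[4] = {0, 1, 2, 3};
  std::atomic<int*> p(&arr[1]);
  assert(fetchMaxPtr(p, &arr[3]) == &arr[1]);  // returns the old value
  assert(p.load() == &arr[3]);                 // larger pointer was stored
  assert(fetchMaxPtr(p, &arr[0]) == &arr[3]);  // smaller pointer: no change
  assert(p.load() == &arr[3]);
  return true;
}
```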
Assisted by Claude
[ISel] Introduce `llvm.pext` and `llvm.pdep` intrinsics (#200570)
Closes #172857
These are portable forms of the x86_64 pext/pdep or AArch64 bext/bdep instructions.
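For readers unfamiliar with the operations, the following is a software reference for the x86 `pext`/`pdep` bit semantics. It is only a sketch: `pextRef` and `pdepRef` are hypothetical names, and the operand order of the new intrinsics is not taken from this commit.

```cpp
#include <cassert>
#include <cstdint>

// pext: gather the bits of `src` selected by `mask` into the low bits.
std::uint64_t pextRef(std::uint64_t src, std::uint64_t mask) {
  std::uint64_t result = 0;
  // `out` walks the result bits; each iteration clears the lowest mask bit.
  for (std::uint64_t out = 1; mask != 0; mask &= mask - 1, out <<= 1) {
    std::uint64_t lowest = mask & (~mask + 1);  // isolate lowest set bit
    if (src & lowest) result |= out;
  }
  return result;
}

// pdep: scatter the low bits of `src` to the set-bit positions of `mask`.
std::uint64_t pdepRef(std::uint64_t src, std::uint64_t mask) {
  std::uint64_t result = 0;
  for (std::uint64_t in = 1; mask != 0; mask &= mask - 1, in <<= 1) {
    std::uint64_t lowest = mask & (~mask + 1);
    if (src & in) result |= lowest;
  }
  return result;
}

// Usage: extract two bytes, then deposit them back at the same positions.
bool demoPextPdep() {
  assert(pextRef(0x12345678u, 0xFF00FF00u) == 0x1256u);
  assert(pdepRef(0x1256u, 0xFF00FF00u) == 0x12005600u);
  assert(pextRef(0xB6u, 0xF0u) == 0xBu);
  return true;
}
```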
[flang][OpenMP] Lower target in_reduction for host fallback
Enable host-fallback lowering for target in_reduction in Flang and MLIR OpenMP translation.
Model target in_reduction through the matching map entry, force address-preserving implicit mapping for Flang in_reduction list items, and emit the host-side task-reduction lookup with __kmpc_task_reduction_get_th_data. The runtime entry point takes and returns a generic, default-address-space pointer, so normalize a non-default-address-space captured pointer to the generic address space before the call and cast the returned private pointer back to the map block argument's address space, mirroring the in_reduction handling on omp.taskloop. Unsupported device/offload-entry and richer reduction forms remain diagnosed.
Add Flang lowering, MLIR verifier/translation, and LLVM IR tests for the supported host-fallback path, including a non-default-address-space case, and the remaining unsupported cases.
[AArch64] Define GCS operations as SYS and SYSL aliases
Move the remaining `GCS` instructions from dedicated opcodes to `SYSxt/SYSLxt`
aliases, keeping a tied `SYSL` pseudo for codegen where `GCS` preserves the
input register when disabled at runtime.
Update `GCS` intrinsic selection, scheduling, disassembly aliases, and MC
coverage for the generic `SYS/SYSL` encodings.
[AArch64][llvm] Define APAS, BRB and TRCIT as SYS aliases (#203563)
`APAS`, `BRB IALL/INJ` and `TRCIT` use `SYS` encodings, so define them
as aliases of `SYSxt` instead of separate instructions.
Check that the preferred architectural aliases are printed when their
features are enabled and that disassembly falls back to the generic `SYS`
spelling when not enabled.
[lldb][test] Introduce build_and_run test utility (#194386)
We currently have several hundred tests that require a running process in a
given state, and therefore perform the same three tasks:
* compile a test executable
* set a breakpoint by finding a source regex
* then launch the test process to hit that breakpoint.
A large chunk of these tests do this exact same setup with various
versions of copied boilerplate code. The different versions we have all
have different conventions of naming the breakpoint comment, the main
file (and whether it should be resolved), and different generated error
messages if things go wrong.
We already have a standardized and much shorter way of doing this in
LLDB (see below), but this still encourages test writers to specify
non-standard file names and non-standard breakpoint comment names.
[15 lines not shown]
[lldb][test] Faster shut down for pexpect tests (#201171)
Our pexpect tests spend most of their time in the shutdown logic
waiting for the test child to shut down. For example, our editline
tests spend about 95% of their 40s runtime just waiting for the
pexpect child to terminate.
One of the reasons is that the ptyprocess terminate approach
uses a timeout to give the child time to shut down and be cleaned
up by the kernel. While this timeout makes sense, our timeout is
extremely long (6s) since 56fb7456950d2564d16500e40c5719c954a6987a.
Because the default ptyprocess implementation is designed for very
short timeouts (0.1s), it just sleeps and then checks the process
status. For our long timeout, the child most likely already terminated
way before the timeout on a fast system. However, because we have
some very slow builders, we cannot reduce this timeout without
making tests flaky again.
[7 lines not shown]
AMDGPU: Reland: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3
For V_DOT2_F32_F16 and V_DOT2_F32_BF16 add their VOPDName and mark
them with usesCustomInserter which will be used to add pre-RA register
allocation hints to preferably assign dst and src2 to the same physical
register. When the hint is satisfied, canMapVOP3PToVOPD recognises the
instruction as eligible for VOPD pairing by checking if it is VOP2 like:
dst==src2, no source modifiers, no clamp, and src1 is a register.
Mark both instructions as commutable to allow a literal in src1 to be
moved to src0, since VOPD only permits a literal in src0.
The original patch had a bug: it did not check whether physical src
registers match the register class of the appropriate operand in fullVOPD
instructions. That check is now done via isValidVOPDSrc.
[lldb] Avoid calling dyld's versions of libc functions (#201829)
dyld ships with its own version of various libc functions that we are
not supposed to call. This patch prevents the expression evaluator from
calling them by respecting the existing list of forbidden modules.
[flang][mem2reg] promote memory slots through declares (#196975)
Leverage the new mem2reg APIs for views to remove the
"same block" limitation over fir.declare mem2reg, and to allow mem2reg
over fir.convert so that mixed dialect mem2reg with fir + memref is
possible.
Note that fir.declare_value for memory used with different value types
will be dropped (e.g. EQUIVALENCE). A later patch will deal with
improving fir.declare_value to carry the variable type independently of
the value (like in LLVM), but there is anyway a bit more work to enable
mem2reg with equivalence given their storage is an array of bytes.
Assisted by: Claude
[MIPS] soft-promote `f16` also when using `+msa` (#203065)
Fixes https://github.com/llvm/llvm-project/issues/202808
Make use of the default soft-promote mechanism for f16, rather than an
ad-hoc approach making f16 storage-only.
In theory you could leave it at that, but I added custom implementations
to make use of the instructions for `FP16_TO_FP` and `FP_TO_FP16`, and
manually apply the "fptoui to fptosi trick" which generates shorter
code.
I don't really have a good way of testing this. The assembly changes
look reasonable but it's easy to miss something subtle of course. I've
tried to break the change up into smaller commits but it's still kind of
a lot.
[SelectionDAG] Fold subvector inserts into concat operands (#200937)
Push insert_subvector into the containing CONCAT_VECTORS operand when
the insertion is wholly contained there.
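The fold can be illustrated on plain vectors. This is a toy model under stated assumptions (all concat operands have the same non-zero length; `Vec`, `insertSubvector`, `concat`, and `foldInsertIntoConcat` are hypothetical names), not the DAG combine itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Model of insert_subvector: overwrite base[idx .. idx + sub.size()).
Vec insertSubvector(Vec base, const Vec& sub, std::size_t idx) {
  for (std::size_t i = 0; i < sub.size(); ++i) base[idx + i] = sub[i];
  return base;
}

// Model of CONCAT_VECTORS.
Vec concat(const std::vector<Vec>& ops) {
  Vec out;
  for (const Vec& op : ops) out.insert(out.end(), op.begin(), op.end());
  return out;
}

// Push the insert into a single concat operand. Returns false and leaves
// `ops` untouched when the inserted range straddles an operand boundary.
bool foldInsertIntoConcat(std::vector<Vec>& ops, const Vec& sub,
                          std::size_t idx) {
  const std::size_t opLen = ops.front().size();
  const std::size_t opNo = idx / opLen;
  const std::size_t offset = idx % opLen;
  if (opNo >= ops.size() || offset + sub.size() > opLen) return false;
  ops[opNo] = insertSubvector(ops[opNo], sub, offset);
  return true;
}

// Usage: the folded form must equal inserting into the full concat.
bool demoFold() {
  const std::vector<Vec> ops = {{0, 1, 2, 3}, {4, 5, 6, 7}};
  const Vec sub = {9, 9};
  std::vector<Vec> folded = ops;
  assert(foldInsertIntoConcat(folded, sub, 4));  // wholly inside operand 1
  assert(concat(folded) == insertSubvector(concat(ops), sub, 4));
  std::vector<Vec> straddling = ops;
  assert(!foldInsertIntoConcat(straddling, sub, 3));  // crosses operands 0/1
  return true;
}
```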
AI note: an LLM generated the code and the test; I've read them.