[RFC][AMDGPU][lld] Add object linking support
Add AMDGPU ELF object-linking support in lld, including resource propagation,
LDS layout, indirect-call handling, named-barrier updates, target compatibility
checks, and kernel descriptor/metadata patching.
This is a large PR because the linker needs to understand and validate several
AMDGPU object-linking concepts end to end. I tried to keep the changes scoped to
the necessary linker support and related metadata plumbing, but I'm open to
suggestions on how to split or structure the review to make it easier.
[RISCV] Constant fold bitcast of constant 0 in combineVectorSizedSetCCEquality. (#207112)
There seem to be some combinations of vector type and scalar type where
a bitcast of constant 0 doesn't get folded or type legalized to a
build_vector of 0 with the vector type we want. I think it's when the
integer type is 2*xlen and <2 x iXLen> is a legal type, but I'm not
sure.
I don't have any other tests to know if adding/improving a DAG combine
is worthwhile so I just did this quick fix at the source.
[RISCV] Rework vmsge(u).vx pseudos to work better with near-miss assembler support (#207097)
Previously we had 3 pseudos:
- vr destination with no mask
- vrnov0 destination with mask
- vr destination with mask and temporary dest
This was intended to prevent a v0 destination with mask and no
temporary. The vrnov0 case confused the near-miss code and caused
multiple errors.
This patch reduces this to 2 pseudos:
- vr destination with optional mask
- vr destination with mask and pseudo
The v0 destination with mask error is moved to validateInstruction, which
allows us to give a better error.
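As a rough illustration of the check that moves into validateInstruction, the sketch below rejects a masked form that writes v0 without a temporary. The `VMSGEOperands` struct, the `validateVMSGE` helper, and the diagnostic text are all hypothetical and are not taken from the patch.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical operand summary for a vmsge(u).vx pseudo.
struct VMSGEOperands {
  unsigned DestReg; // vector register number of the destination (0 == v0)
  bool Masked;      // true if a mask operand is present
  bool HasTemp;     // true if a temporary register operand is present
};

// Returns a diagnostic string for the one invalid combination, or
// std::nullopt if the operands are acceptable. The message is illustrative.
std::optional<std::string> validateVMSGE(const VMSGEOperands &Ops) {
  if (Ops.Masked && Ops.DestReg == 0 && !Ops.HasTemp)
    return std::string(
        "masked form with v0 destination requires a temporary register");
  return std::nullopt;
}
```

Doing the check once, after operand matching, means a single targeted message can be reported instead of one error per rejected pseudo.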
[PHIElimination] Preserve SlotIndexes even without LiveIntervals
There are some pipelines where we request SlotIndexes but not
LiveIntervals before running PHIElimination. This happens in the MSP430
pipeline, where StackColoring requests SlotIndexes but nothing requests
LiveIntervals. Previously, PHIElimination only updated SlotIndexes
through LiveIntervals, so it failed to preserve SlotIndexes when they
were available but LiveIntervals was not. This caused crashes when later
passes tried to use SlotIndexes under the NewPM.
Reviewers: RKSimon, aeubanks, arsenm
Pull Request: https://github.com/llvm/llvm-project/pull/207074
[X86] Verify inline-asm register operands against the subtarget
Inline asm can name physical registers that require a subtarget feature
the selected subtarget lacks: zmm and mask (k) registers need AVX-512,
ymm registers need AVX. The subtarget is derived from the function's
target-cpu/target-features, so no MachineFunction is required.
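The register-to-feature rule described above can be sketched as a small lookup. This is only an illustration: the `Feature` enum, `requiredFeature`, and `isRegisterAllowed` are hypothetical names, and the real check would query the subtarget built from the function attributes rather than take booleans.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical feature set; a real verifier would ask the subtarget.
enum class Feature { AVX, AVX512 };

// Returns the feature a physical register name requires, or std::nullopt
// if the register needs none of the features modelled here.
std::optional<Feature> requiredFeature(std::string_view Reg) {
  auto StartsWith = [&](std::string_view Prefix) {
    return Reg.substr(0, Prefix.size()) == Prefix;
  };
  if (StartsWith("zmm"))
    return Feature::AVX512;
  if (StartsWith("ymm"))
    return Feature::AVX;
  // Mask registers k0..k7.
  if (Reg.size() == 2 && Reg[0] == 'k' && Reg[1] >= '0' && Reg[1] <= '7')
    return Feature::AVX512;
  return std::nullopt;
}

// True if the register may be named given the available features.
bool isRegisterAllowed(std::string_view Reg, bool HasAVX, bool HasAVX512) {
  std::optional<Feature> F = requiredFeature(Reg);
  if (!F)
    return true;
  return *F == Feature::AVX ? HasAVX : HasAVX512;
}
```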
[X86] Add subtarget-dependent checks
Checks that depend on a function's target-cpu/target-features (built into
an MCSubtargetInfo), which the triple-only IR verifier cannot express:
- x86 intrinsics (llvm.x86.avx/avx2/avx512.*) require AVX/AVX2/AVX-512.
- 128/256-bit AVX-512 intrinsics additionally require AVX512VL.
- The x86_amx type requires AMX-TILE.
[X86] Add target verifier
Add an X86 TargetVerify and register it by triple so the
TargetVerifierPass dispatches to it for X86 modules. It performs no
checks yet; the subtarget-dependent checks are added in a follow-up.
[Target] Add target-independent TargetVerifier dispatcher
Introduce a target-dependent IR verification framework that can be run
from target-independent locations.
TargetVerify is a base class each backend subclasses to check a function
for constructs that are invalid for a particular target. Backends
register a factory keyed by Triple::ArchType via registerTargetVerify(),
typically from their LLVMInitialize<Target>Target().
TargetVerifierPass (registered as "target-verifier") is the dispatcher:
it reads the module triple and, if a verifier is registered for that
architecture, runs the generic IR verifier followed by the target's
TargetVerify. It is a no-op for targets that have not registered a
verifier, so it is safe to schedule from generic, target-independent
pipelines (e.g. `opt -passes=target-verifier`).
createTargetVerifierPass() is a legacy-PM wrapper that TargetPassConfig
adds to the codegen pipeline under -verify-target, so the target verifier
can also run from llc (e.g. `llc -verify-target`).
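The registration and dispatch scheme can be pictured with a self-contained sketch. It does not use LLVM headers: `Arch` stands in for Triple::ArchType, a function is reduced to its name, and the signatures of `registerTargetVerify` and `verifyForArch` here are assumptions made for illustration, not the patch's actual API.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Stand-in for Triple::ArchType (hypothetical subset).
enum class Arch { X86_64, RISCV64 };

// Stand-in for the TargetVerify base class each backend subclasses.
struct TargetVerify {
  virtual ~TargetVerify() = default;
  // Returns true if the function (modelled here by its name) is valid.
  virtual bool run(const std::string &Fn) = 0;
};

using Factory = std::function<std::unique_ptr<TargetVerify>()>;

// Registry of verifier factories keyed by architecture.
std::map<Arch, Factory> &registry() {
  static std::map<Arch, Factory> R;
  return R;
}

void registerTargetVerify(Arch A, Factory F) { registry()[A] = std::move(F); }

// Dispatcher: succeeds without doing anything when no verifier is
// registered for the architecture.
bool verifyForArch(Arch A, const std::string &Fn) {
  auto It = registry().find(A);
  if (It == registry().end())
    return true;
  return It->second()->run(Fn);
}

// Toy backend verifier that rejects a single function name.
struct ToyX86Verify : TargetVerify {
  bool run(const std::string &Fn) override { return Fn != "bad"; }
};
```

Keying the registry by architecture is what lets the dispatcher stay a no-op for targets that never register a verifier.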
[SimplifyCFG] Don't hoist a musttail call separately from the terminator (#207094)
SimplifyCFG can accidentally hoist a `musttail` call away from its `ret` if
`hoistCommonCodeFromSuccessor` skipped a differing instruction, producing
IR that fails verification. Guard the `musttail` call so that it is hoisted
only together with the `ret`. This is possible only when no instruction
has been skipped, so that both successors are folded into the predecessor,
leaving a valid `musttail` call.
Reproducer: https://godbolt.org/z/3vsnz4hc7
[orc-rt] Add Session::tryAttach, construct ControllerAccess in attach. (#207114)
Reworks Session's controller-attachment API so that clients no longer
construct or hold a ControllerAccess directly:
- attach<ControllerAccessT>(BI, Args...) now constructs the
ControllerAccess internally, passing *this as the first constructor
argument. Suitable for ControllerAccess implementations whose
construction cannot fail.
- tryAttach<ControllerAccessT>(BI, Args...) is the new fallible
counterpart to attach. It forwards *this and the given args to
ControllerAccessT::Create, which must return an
Expected<std::shared_ptr<ControllerAccessT>>. On success, it calls
connect on the instance; otherwise it returns the Error. This lets
implementations surface setup failures (e.g. failing to bind a socket)
synchronously as an Error, without ever handing back a
usable-but-unconnected object.
[4 lines not shown]
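The difference between the two entry points can be sketched without the orc-rt headers. Everything below is a simplified stand-in: `Expected` is modelled with std::variant, the error is a plain string, the BI argument is omitted, and `SocketAccess` is a hypothetical ControllerAccess whose setup can fail.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <variant>

struct Session;

// Hypothetical stand-in for Expected<T>: a value or an error message.
template <typename T> using Expected = std::variant<T, std::string>;

// Hypothetical ControllerAccess with a fallible Create factory.
struct SocketAccess {
  Session &S;
  bool Connected = false;
  explicit SocketAccess(Session &S) : S(S) {}
  static Expected<std::shared_ptr<SocketAccess>> Create(Session &S, int Port) {
    if (Port <= 0)
      return std::string("cannot bind socket");
    return std::make_shared<SocketAccess>(S);
  }
  void connect() { Connected = true; }
};

struct Session {
  std::shared_ptr<void> Access; // type-erased owner of the attached access

  // Infallible path: construct internally, passing *this first.
  template <typename T, typename... ArgTs> void attach(ArgTs &&...Args) {
    auto A = std::make_shared<T>(*this, std::forward<ArgTs>(Args)...);
    A->connect();
    Access = std::move(A);
  }

  // Fallible path: defer to T::Create and connect only on success.
  // Returns an error message on failure, std::nullopt on success.
  template <typename T, typename... ArgTs>
  std::optional<std::string> tryAttach(ArgTs &&...Args) {
    auto A = T::Create(*this, std::forward<ArgTs>(Args)...);
    if (auto *Err = std::get_if<std::string>(&A))
      return std::move(*Err);
    auto &P = std::get<std::shared_ptr<T>>(A);
    P->connect();
    Access = std::move(P);
    return std::nullopt;
  }
};
```

In this sketch a failed Create leaves the Session with nothing attached, so the caller never sees a half-initialized object.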
[InlineAsm] Diagnose oversized non-scalar tied asm outputs (#206230)
The 'r' asm constraint binds an operand to a general-purpose register.
For tied inline asm operands, Clang may promote a smaller integer input
to match a larger non-scalar register output. Only allow that path when
the output size can be represented by an integer type that fits in a
general-purpose register.
Otherwise, diagnose with err_store_value_to_reg before CodeGen attempts
to lower the asm and crashes.
This keeps GPR-sized aggregate/class outputs accepted while rejecting
larger array, struct, union, complex, vector, and class outputs. Add
Sema coverage for the affected C and C++ cases.
Fixes #204775
[orc-rt] Apply noexcept to more Error.h APIs. (#207109)
These APIs are all unconditionally nothrow: their bodies either call
already-noexcept APIs, or move std::string / std::exception_ptr /
std::unique_ptr members whose move constructors are already noexcept.
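The reasoning can be checked at compile time with type traits. The sketch below assumes C++17 and uses a hypothetical `StringError` class in place of the real Error.h types.

```cpp
#include <cassert>
#include <exception>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical error type whose members only move a std::string.
class StringError {
public:
  explicit StringError(std::string Msg) noexcept : Msg(std::move(Msg)) {}
  StringError(StringError &&Other) noexcept : Msg(std::move(Other.Msg)) {}
  const std::string &message() const noexcept { return Msg; }

private:
  std::string Msg;
};

// The member types named in the commit message move without throwing...
static_assert(std::is_nothrow_move_constructible_v<std::string>);
static_assert(std::is_nothrow_move_constructible_v<std::exception_ptr>);
static_assert(std::is_nothrow_move_constructible_v<std::unique_ptr<int>>);

// ...so a wrapper that only moves them can be marked noexcept as well.
static_assert(std::is_nothrow_move_constructible_v<StringError>);
static_assert(noexcept(std::declval<const StringError &>().message()));
```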