AMDGPU: Migrate unittests to subarch triples
Specify the target through the triple subarch instead of a processor
name.
The register-limit helpers in AMDGPUUnitTests.cpp that enumerate every
valid CPU via fillValidArchListAMDGCN still pass the CPU explicitly, as
does the MC Disassembler smoke test (its C disassembler API derives the
subtarget from the CPU, not the triple subarch).
Co-authored-by: Claude (Opus 4.8) <noreply at anthropic.com>
clang/AMDGPU: Stop passing redundant -target-cpu to cc1
Now that the exact target is encoded in the triple's subarch field,
-target-cpu is redundant. This avoids polluting the resultant IR with
unwanted "target-cpu" attributes. The net result is the desired codegen
when compiling libraries for a major subarch and linking them into a
program compiled for a specific arch. For example, compiling for
"gfx9-generic" would pollute the IR with "target-cpu"="gfx9-generic",
so codegen would ultimately be performed for the generic target even
after linking into the concrete gfx9 cpu. The specialization will now
be achieved by merging the triples, without the linker or optimization
passes needing to fix up function attributes.
clang: Start using new amdgpu subarch triples
Fix up invocations using --target=amdgcn + -mcpu to introduce
the subarch in the triple.
For offload toolchains, a single toolchain is constructed for the
top level amdgpu architecture, and the effective triple is used for
target specific tool invocations.
The specifics of the resource directory layout are TBD. This does
try to find resources in the directory named after the subarch.
However, the paths are searched at toolchain creation time, so this
does not work when there are multiple subarches.
Fixes #154925
clang/AMDGPU: Validate -target-cpu in cc1 is valid for the subarch
Restrict the reported list of valid target-cpus based on the triple's
subarch. This is more consistent with how other targets validate the
target CPU name. Currently, validation of the target name for the
triple is split between the driver and here. The driver-based diagnostic
seems to be an amdgpu-ism in 2 different places (though there is one arm
validation emitting the same diagnostic). In the future we could probably
drop those.
AMDGPU: Introduce amdgpu triple arch
Move towards using the triple for representing incompatible
ISA changes. Use the subarch field to represent the various
incompatible cases. Previously we pretended a single triple arch
was universally compatible, and only distinguished by function
level subtargets. Move towards using distinct triples to enable
more sophisticated toolchain handling in the future, like proper
runtime library linking.
Introduce a new subarch per unique ISA, but also introduce
"major subarches", each of which is compatible with a covered set
of minor ISA versions. These map to the existing generic targets.
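As a rough illustration of the "major subarch" idea, the sketch below
models a major subarch as a set of covered minor ISA versions and checks
whether code built for one target could be combined with a program built
for another. The table contents, the `CoveredBy` name, and the
`isLinkCompatible` helper are assumptions made for illustration; they are
not the data structures or triple spellings this patch introduces.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical table: major subarch -> minor ISA versions it covers.
// The entries are illustrative only and deliberately incomplete.
static const std::map<std::string, std::set<std::string>> CoveredBy = {
    {"gfx9-generic", {"gfx900", "gfx906"}},
};

// Hypothetical helper: code built for `Built` can be combined with a
// program targeting `Final` if the two are identical, or if `Built` is a
// major subarch whose covered set contains `Final`.
bool isLinkCompatible(const std::string &Built, const std::string &Final) {
  assert(!Final.empty() && "the final target must be named");
  if (Built == Final)
    return true;
  auto It = CoveredBy.find(Built);
  return It != CoveredBy.end() && It->second.count(Final) != 0;
}
```

Under these assumptions, a library built for the major subarch is
accepted by a program built for a covered minor version, while two
distinct minor versions are rejected.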
There are a few placeholder subarch entries, which currently
lack backing generic arches for codegen.
This should be the preferred triple arch name going forward,
but is treated as an alias of amdgcn. This does not yet change
clang to emit the new triples.
[2 lines not shown]
[X86] Move more vector.reduce.add subvector pattern tests to PhaseOrdering/X86/horizontal-reduce-add.ll (#206467)
CodeGen test coverage is already in vector-reduce-add-subvector.ll
[SystemZ] Limit latency scheduling to SUs with latency of at least 5. (#206459)
The latency reduction heuristic is highly effective, but it seems
preferable not to "move everything around" and instead to focus on
instructions that have somewhat longer latencies. The basic idea is
that the input order is fairly good to begin with and not just "any
random order", so it should not be disturbed unnecessarily.
[libc++] Consistently use version guards outside namespace macros (#181136)
This has the benefit of consistent placement for these macros (outside
the namespace, not inside it). This, in turn, makes it more obvious that
the condition applies to the entire namespace scope. Compile times are
also reduced a bit, since the compiler doesn't have to open a namespace
just to close it again.
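A minimal sketch of the placement described above. The `DEMO_*` macros
are hypothetical stand-ins defined here so that the example is
self-contained; they are not libc++'s own macros.

```cpp
#include <cassert>

// Hypothetical stand-ins for a library's version and namespace macros.
#define DEMO_STD_VER 20
#define DEMO_BEGIN_NAMESPACE namespace demo {
#define DEMO_END_NAMESPACE }

// Guard placed outside the namespace macros: when the condition is
// false, the namespace is never opened at all.
#if DEMO_STD_VER >= 20
DEMO_BEGIN_NAMESPACE
inline int featureLevel() { return 20; }
DEMO_END_NAMESPACE
#endif // DEMO_STD_VER >= 20

// The placement being moved away from would put the #if between the
// BEGIN and END macros, so the namespace could be opened only to be
// closed again with nothing inside.
```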
This is enforced with a clang-tidy check.
As a drive-by this also fixes a few macro comments.
[MemCpyOpt] Fix incorrect size check in memmove of memset opt (#206451)
We were only checking that the memset is at least as large as the
memmove size, but not accounting for the fact that the memmove occurs at
an offset.
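The difference between the two checks can be sketched as follows. This
is a simplified model with hypothetical function names, not the code in
MemCpyOpt: it treats the memset as covering bytes [0, MemsetSize) and the
memmove as touching MoveSize bytes starting at Offset within that region.

```cpp
#include <cassert>
#include <cstdint>

// Size-only check, as described for the old behaviour: it ignores where
// in the memset region the memmove starts.
bool coversIgnoringOffset(uint64_t MemsetSize, uint64_t MoveSize) {
  return MemsetSize >= MoveSize;
}

// Offset-aware check: [Offset, Offset + MoveSize) must lie entirely
// within [0, MemsetSize).
bool coversWithOffset(uint64_t MemsetSize, uint64_t Offset,
                      uint64_t MoveSize) {
  if (Offset > MemsetSize)
    return false;
  // Compare against the remaining bytes so Offset + MoveSize cannot
  // overflow.
  return MoveSize <= MemsetSize - Offset;
}
```

For a 16-byte memset and a 16-byte memmove starting at offset 8, the
size-only check passes while the offset-aware check fails, because the
last 8 bytes read lie outside the memset region.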
[Dexter] Add !address node (#202801)
Adds a node type for Dexter that allows checking abstract labels instead
of concrete addresses. Each address node has a label and optional
offset, and the first time during evaluation that a given address label
is matched against a valid pointer value, the address label will be
assigned a value that matches the seen address (adjusting for any
offset). From that point, the resolved address value will be used for
the remainder of the test evaluation.
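The resolution rule described above can be sketched in a few lines. This
is an illustrative model only: Dexter is not implemented this way, the
`AddressLabels` class and its `matches` method are hypothetical, and
treating zero as "not a valid pointer value" is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

class AddressLabels {
  // Label -> resolved base address (the seen value minus the offset).
  std::map<std::string, uint64_t> Resolved;

public:
  // Returns true if `Seen` matches `Label` plus `Offset`. The first
  // valid match binds the label; later matches compare against it.
  bool matches(const std::string &Label, int64_t Offset, uint64_t Seen) {
    if (Seen == 0) // assumption: null is not a valid pointer value
      return false;
    auto It = Resolved.find(Label);
    if (It == Resolved.end()) {
      // Bind the label, adjusting for the offset.
      Resolved[Label] = Seen - static_cast<uint64_t>(Offset);
      return true;
    }
    return It->second + static_cast<uint64_t>(Offset) == Seen;
  }
};
```

Once a label has been bound by its first match, every later check is
against the same base value, so a test can assert that two pointers are
related without knowing either concrete address.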
Diagnose noreturn calls from a const or pure function (#206134)
Const and pure functions get the WillReturn LLVM IR attribute, which
requires the function to return. Calling a noreturn function from such
a function is therefore UB, so it is now diagnosed unless the call is
known to be unevaluated.
This diagnostic is enabled by default.
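A hypothetical example of the pattern and one way to restructure it.
The function names are made up, the `__attribute__((pure))` spelling
assumes a GCC-compatible compiler, and the diagnosed form is shown only
as a comment because the exact diagnostic text is not given here.

```cpp
#include <cassert>
#include <cstdlib>

// Pattern the commit describes diagnosing (comment only):
//   __attribute__((pure)) int twice(int x) {
//     if (x < 0) std::abort();   // noreturn call inside a pure function
//     return 2 * x;
//   }

// Pure part: always returns, so the attribute's promise holds.
__attribute__((pure)) int twiceUnchecked(int x) { return 2 * x; }

// Non-pure wrapper: free to call a noreturn function on bad input.
int twiceChecked(int x) {
  if (x < 0)
    std::abort();
  return twiceUnchecked(x);
}
```

Splitting the validation into a non-pure wrapper keeps the noreturn
call out of the function that carries the attribute.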
Fixes #129022
Line and digit directives, OriginalFileName, ModuleName should be unevaluated strings (#201413)
Based on
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2361r6.pdf,
the string operands of line and digit directives should be unevaluated
strings. This patch changes HandleLineDirective and HandleDigitDirective
to parse strings as unevaluated string literals and updates the test
case so that it no longer contains escape sequences.
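For readers unfamiliar with line directives, here is a small
self-contained example of one whose file name is a plain string with no
escape sequences. The file name and function names are arbitrary; the
example shows standard `#line` behaviour rather than anything specific
to this patch.

```cpp
#include <cassert>
#include <string>

// After this directive, the next source line is treated as line 100 of
// a file called "renamed_demo.cpp".
#line 100 "renamed_demo.cpp"
int lineAtDirective() { return __LINE__; }
std::string fileAtDirective() { return __FILE__; }
```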
[clang][bytecode] Fix an assertion failure in dynamic_cast handling (#206447)
If `Ptr` is already a root pointer, the `getBase()` call ran into an
assertion. Fix this by moving the check to the start of the loop.
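The shape of the fix can be sketched with a toy pointer type. `DemoPtr`
and `stepsToRoot` are hypothetical and only mimic the root check and the
asserting `getBase()` described above; they are not the bytecode
interpreter's classes.

```cpp
#include <cassert>

struct DemoPtr {
  const DemoPtr *Base; // nullptr marks a root pointer
  bool isRoot() const { return Base == nullptr; }
  const DemoPtr &getBase() const {
    assert(!isRoot() && "root pointers have no base");
    return *Base;
  }
};

// Check-first loop: the root test runs before every getBase() call, so
// a pointer that is already a root never reaches the assertion.
unsigned stepsToRoot(const DemoPtr &Start) {
  const DemoPtr *P = &Start;
  unsigned Steps = 0;
  while (!P->isRoot()) {
    P = &P->getBase();
    ++Steps;
  }
  return Steps;
}
```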
[AMDGPU] Fix bit-packing condition in LiveRegOptimizer (#201520)
This commit changes the condition for determining the eligibility for
bit-packing in LiveRegOptimizer from requiring that the scalar target
type is larger than the source type to instead require that the target
is a multiple of it.
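The two conditions can be compared on plain bit widths. This sketch uses
hypothetical helper names and models only the "larger than" and
"multiple of" conditions stated above; any other eligibility checks in
LiveRegOptimizer are left out.

```cpp
#include <cassert>

// Condition as described before the change: the target is merely
// larger than the source.
bool eligibleIfLarger(unsigned SrcBits, unsigned DstBits) {
  return DstBits > SrcBits;
}

// Condition as described after the change: the target width is a
// multiple of the source width.
bool eligibleIfMultiple(unsigned SrcBits, unsigned DstBits) {
  assert(SrcBits != 0 && "source width must be non-zero");
  return DstBits % SrcBits == 0;
}
```

A 24-bit source with a 32-bit target passes the first check but not the
second, since 32 is not a multiple of 24; an 8-bit source with a 32-bit
target passes both.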
Fixes https://github.com/llvm/llvm-project/issues/196582.
---------
Signed-off-by: Steffen Holst Larsen <sholstla at amd.com>
[SafeStack] Allocate unsafe sigaltstack (#206463)
PR https://github.com/llvm/llvm-project/pull/196969 was approved and
merged, but `spr` had set the wrong base branch.
This merges the approved changes into `main`.
---------
Co-authored-by: Paul Walker <paul.walker at arm.com>
Co-authored-by: Arseniy Obolenskiy <arseniy.obolenskiy at amd.com>
[RFC][SIInsertWaitCnts] Remove VMemTypes
This can be considered an RFC. I'd personally like to get rid of VMemTypes,
but I don't know if anyone feels strongly that they should be kept.
My motivation for removing VMemTypes is simple: They are just a repeat
of VMEM events, just under a different name, and messier (defined as a
basic enum but actually stored as a bitmask later). It's just confusing.
This patch eliminates the need for them by:
- Adding a new entrypoint in AMDGPUHWEvents to get the basic set of
VMEM events issued by a VMEM Instruction.
- Setting BVH/SAMPLER events irrespective of whether the HW can track them.
These events exist anyway; it should be up to InsertWaitCnt to deal with
them properly (which is easy, only `counterOutOfOrder` needed work).
- Tracking an additional set of per-VGPR "PendingEvents" which is
set using the "basic set of VMEM events" and cleared as needed.
[3 lines not shown]
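To make the "enum stored as a bitmask" point concrete, here is a rough
model of per-VGPR pending events kept as a bitmask. All names in it
(`DemoVMemEvent`, `PendingVMem`, and its methods) are hypothetical and
do not correspond to the identifiers in SIInsertWaitCnts or
AMDGPUHWEvents.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical event kinds, used as bit positions in a mask.
enum DemoVMemEvent : unsigned { NoSampler = 0, Sampler = 1, BVH = 2 };

class PendingVMem {
  std::map<unsigned, uint32_t> PerVGPR; // VGPR index -> event bitmask

public:
  void record(unsigned VGPR, DemoVMemEvent E) {
    assert(E < 32 && "event must fit in the mask");
    PerVGPR[VGPR] |= 1u << E;
  }

  void clear(unsigned VGPR) { PerVGPR.erase(VGPR); }

  // True if more than one kind of event is pending on this VGPR.
  bool hasMixedEvents(unsigned VGPR) const {
    auto It = PerVGPR.find(VGPR);
    if (It == PerVGPR.end())
      return false;
    uint32_t Mask = It->second;
    return (Mask & (Mask - 1)) != 0; // more than one bit set
  }
};
```

Keeping a single event set per register, recorded and cleared in one
place, avoids a second enum that repeats the same information under a
different name.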
[AMDGPU][InsertWaitCnts] Improve testing coverage for VMemTypes (#206439)
The next patch will get rid of VMemTypes, but some of its functionality
was not tested well. This patch adds more tests to fix gaps in testing
so that the next patch in the stack can be shown to have no impact
on codegen.
The 2 issues I had noticed were that we didn't have a test for mixed
vmem types with BVH, only for sampler, and that we did not have a test
affected by `clearVmemTypes`, so removing that call
had no effect.
This patch adds tests for both cases.