[CodeGen] Join live range segments after dead def move (#204648)
Moving a dead def upward can retag a following live-range segment to the
same value as the previous segment. That leaves adjacent same-value
segments, which live range verification rejects.
Add a shared LiveRange helper for merging adjacent same-value segments.
Use it in the existing value-number merge code and after retagging later
segments for a moved dead def. Add an AMDGPU scheduler regression test.
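The merge step described above can be sketched on a simplified model. This is only an illustration: `Segment` and `joinAdjacentSameValueSegments` are hypothetical names, not the LiveRange API, and value numbers are modeled as plain integers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a live-range segment: a half-open interval
// [Start, End) tagged with a value-number id.
struct Segment {
  unsigned Start;
  unsigned End;
  unsigned ValNo;
};

// Merge neighbouring segments that touch (Prev.End == Next.Start) and carry
// the same value number. Assumes Segs is sorted and non-overlapping.
inline void joinAdjacentSameValueSegments(std::vector<Segment> &Segs) {
  if (Segs.empty())
    return;
  std::size_t Out = 0; // index of the last segment kept so far
  for (std::size_t I = 1; I < Segs.size(); ++I) {
    if (Segs[Out].End == Segs[I].Start && Segs[Out].ValNo == Segs[I].ValNo) {
      // Same value and no gap: extend the kept segment instead of keeping both.
      Segs[Out].End = Segs[I].End;
    } else {
      Segs[++Out] = Segs[I];
    }
  }
  Segs.resize(Out + 1);
}
```

Segments separated by a gap stay separate even when they share a value number, since only touching segments are merged in this sketch.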
clang/AMDGPU: Remove driver restriction on --gpu-max-threads-per-block
Previously this flag was only handled for HIP, and otherwise produced an
unused argument warning. cc1 has a custom warning that the argument isn't
supported, but in practice it was unreachable because the argument was
never forwarded. Add a test for that previously untested warning, and use
a simpler method for forwarding the flag to cc1.
clang/AMDGPU: Fix double linking opencl libs with --libclc-lib
Noticed by inspection. When an explicit --libclc-lib flag is used,
do not also attempt to link the ROCm device libs, which contain
different implementations of the same OpenCL symbols.
Co-Authored-By: Claude <noreply at anthropic.com>
clang/AMDGPU: Merge toolchain subclasses
Simplify the toolchain implementations by collapsing
them into one. Previously we had a confusing split. The
AMDGPUToolChain base class implemented much of the base
support. It was subclassed by ROCMToolChain, which would
have been more accurately described as the offloading subclass.
That was further subclassed into HIP and OpenMP specific subclasses.
Deleting those two is the important part of this change. There was
code duplication, and features arbitrarily handled in one but not
the other. The offload kind is passed in almost everywhere if you
really need to know the original language. However, I consider
this an antifeature, and it is really poor QoI to have the HIP
and OpenMP toolchains behave differently in any way. The platform
should be consistent and the driver behaviors should not depend
on the language.
There is additional mess in the handling of spirv, which this
[9 lines not shown]
clang/Driver: Use struct type for BoundArch instead of StringRef
Change BoundArch arguments in the clang driver from StringRef (or
sometimes const char*) to a dedicated struct type that contains both
the architecture string and a parsed OffloadArch enum field. In the
future it may be useful to contain other feature bits here.
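As a rough illustration of the shape such a type could take, here is a minimal sketch. All names (`BoundArchSketch`, `OffloadArchKind`, `parseOffloadArchSketch`) are hypothetical, the enum lists only a few example architectures, and `std::string_view` stands in for `llvm::StringRef`; the struct in the patch may look quite different.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical enum standing in for the parsed architecture kind.
enum class OffloadArchKind { Unknown, GFX90A, GFX942, SM_90 };

// Hypothetical parser: maps a few known spellings, everything else is Unknown.
inline OffloadArchKind parseOffloadArchSketch(std::string_view Name) {
  if (Name == "gfx90a")
    return OffloadArchKind::GFX90A;
  if (Name == "gfx942")
    return OffloadArchKind::GFX942;
  if (Name == "sm_90")
    return OffloadArchKind::SM_90;
  return OffloadArchKind::Unknown;
}

// Carries the original string and the parsed enum together, so callers do
// not need to re-parse the architecture name at every use.
struct BoundArchSketch {
  std::string_view Name;
  OffloadArchKind Kind = OffloadArchKind::Unknown;

  BoundArchSketch() = default;
  explicit BoundArchSketch(std::string_view N)
      : Name(N), Kind(parseOffloadArchSketch(N)) {}

  // An empty name models "no bound architecture".
  bool empty() const { return Name.empty(); }
};
```

Keeping the string alongside the enum means unrecognized names still round-trip unchanged, and further fields could be added later without touching call signatures.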
Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
[flang][OpenMP] Move unique clauses to allowedOnceClauses in OMP.td (#204995)
Many unique clauses were listed in "allowedClauses", which turned off
the single-occurrence check in flang. Move these clauses to the right
category to enable this check.
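For illustration, a single-occurrence check of this kind can be sketched as follows. The function name and the string-based clause representation are assumptions made for the example; flang's actual check operates on its own clause and directive types.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical check: given the clauses written on one directive and the set
// of clauses that may appear at most once, return each offending clause once.
inline std::vector<std::string>
findRepeatedOnceClauses(const std::vector<std::string> &Clauses,
                        const std::set<std::string> &AllowedOnce) {
  std::set<std::string> Seen;     // once-clauses encountered so far
  std::set<std::string> Reported; // once-clauses already flagged
  std::vector<std::string> Repeated;
  for (const std::string &C : Clauses) {
    if (AllowedOnce.count(C) == 0)
      continue; // clauses outside the "once" category may repeat freely
    if (!Seen.insert(C).second && Reported.insert(C).second)
      Repeated.push_back(C);
  }
  return Repeated;
}
```

A clause that is absent from the "once" set is never flagged, which mirrors how listing a unique clause under the wrong category disables the check for it.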
One exception to this is the IF clause: the IF clause is unique for all
non-compound directives, but is repeatable on compound ones with the
restriction that at most one IF clause can apply to any of the
constituents. This restriction is currently not enforced correctly in
flang, and so the IF clause was left unchanged.
Although this change is applied to a file shared between flang and
clang, clang does not use these categories for its checks, and hence is
not affected by this patch.
[SPARC] Use hardware byteswapper when we have V9 (#191720)
On V9 processors we have endianness-adjusted memory operations that can
be used to implement BSWAPs.
Use those instructions whenever possible to reduce code size.
runtimes: Pass CMAKE_SYSTEM_NAME based on target triple (#203504)
Compute the cmake system name from the target triple, rather
than passing through the host's. This is primarily to stop
forwarding OSX specific cmake variables.
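The idea of deriving the system name from the triple can be sketched as below. The patch itself does this in CMake; this C++ version is only an illustration, `cmakeSystemNameForTriple` is a hypothetical name, the mapping covers just a few operating systems, and the `Generic` fallback for unrecognized ones is an assumption rather than what the patch necessarily does. It also assumes the canonical `arch-vendor-os[-env]` form.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical mapping from the OS component of a target triple to a
// CMAKE_SYSTEM_NAME value.
inline std::string cmakeSystemNameForTriple(std::string_view Triple) {
  const std::size_t First = Triple.find('-');
  if (First == std::string_view::npos)
    return "Generic";
  const std::size_t Second = Triple.find('-', First + 1);
  if (Second == std::string_view::npos)
    return "Generic";
  const std::size_t Third = Triple.find('-', Second + 1);
  // The OS is the third component; the environment (if any) follows it.
  const std::string_view OS =
      Third == std::string_view::npos
          ? Triple.substr(Second + 1)
          : Triple.substr(Second + 1, Third - Second - 1);

  auto StartsWith = [OS](std::string_view Prefix) {
    return OS.substr(0, Prefix.size()) == Prefix;
  };
  if (StartsWith("linux"))
    return "Linux";
  if (StartsWith("darwin") || StartsWith("macos"))
    return "Darwin";
  if (StartsWith("windows"))
    return "Windows";
  if (StartsWith("freebsd"))
    return "FreeBSD";
  return "Generic"; // assumed fallback for OSes this sketch does not know
}
```

The point of the sketch is that the result depends only on the target triple, so a macOS host no longer leaks `Darwin` into a GPU target build.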
This fixes build failures when trying to build gpu libc on mac
hosts. Previously it would fail on several issues, starting with
an unused argument -mmacos-version-min error, followed by other
errors caused by passing -isysroot.
Secondarily, restrict the cmake imported targets when cross compiling.
Without this, the amdgpu build prints many cmake warnings about the
target not supporting shared libraries.
Claude did most of the actual work, though it required quite a few
rounds of prodding to get it into the right place. In particular it
took care of handling all of the cmake platform recognized names from
the triple.
Co-authored-by: Claude Opus 4.6 <noreply at anthropic.com>
[flang][OpenMP] Centralize pushing/popping directive context (#204924)
Move the calls to PushContextAndClauseSets into the Enter functions for
OpenMPConstruct and OpenMPDeclarativeConstruct, and pop the context in
the corresponding Leave functions. This moves most of the context
handling to the top-level AST entries, which will allow more centralized
verification of common clause properties in the future.
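The Enter/Leave pairing can be sketched with a toy tree walker. Everything here is hypothetical (`ConstructNode`, `DirectiveContextSketch`, `walk`); it only shows the pattern of pushing in one top-level Enter and popping in the matching Leave, not flang's actual semantic checker.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical AST node: a directive name plus nested constructs.
struct ConstructNode {
  std::string Directive;
  std::vector<ConstructNode> Children;
};

// Hypothetical checker: the context is pushed in exactly one place (Enter)
// and popped in exactly one place (Leave), so the stack stays balanced.
class DirectiveContextSketch {
public:
  void Enter(const ConstructNode &N) {
    Stack.push_back(N.Directive);
    MaxDepth = std::max(MaxDepth, Stack.size());
  }
  void Leave(const ConstructNode &) { Stack.pop_back(); }

  std::size_t depth() const { return Stack.size(); }
  std::size_t maxDepth() const { return MaxDepth; }

private:
  std::vector<std::string> Stack;
  std::size_t MaxDepth = 0;
};

// Pre-order walk that calls Enter before the children and Leave after them.
inline void walk(const ConstructNode &N, DirectiveContextSketch &Ctx) {
  Ctx.Enter(N);
  for (const ConstructNode &Child : N.Children)
    walk(Child, Ctx);
  Ctx.Leave(N);
}
```

With the push and pop confined to the top-level entry points, any check that needs the current directive context can rely on it being present for every nested node.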
Be more permissive on spaces in command line argument parsing (#205111)
Fixes c888371ff0a3e10f8472676dc992f4347fca58d9.
This change properly accommodates both presence and absence of extra
trailing arguments like -resource-dir.
[HIP] Fix `--no-offload-new-driver` behavior after #201457 (#205094)
Summary:
https://github.com/llvm/llvm-project/pull/201457 changed the default for
all targets. Even though the old offload driver is getting removed soon,
we shouldn't break it for the LLVM23 release. This simply reverts to the
original behavior. The old driver builds its jobs manually, so we can
just turn off this one specific case unless the user forced it.
Fix test on read-only file systems (#205108)
Fixes 25e4057d49055a645dc6a51ae1f40ac647aaed5b.
Use the -fsyntax-only flag instead of -c. This performs the necessary
parsing and diagnostics verification (the actual intent of this test)
without attempting to emit an object file.
[llubi] Reset retval when return type is void (#205107)
In `returnFromCallee`, the return value is moved out from
`CurrentFrame->RetVal`. So `visitReturnInst` is always responsible for
setting a valid value.
Closes https://github.com/llvm/llvm-project/issues/204992
AMDGPU: Replace tgsplit subtarget feature with attribute
This is a per-entrypoint property and has a corresponding
assembler directive, so it should not be baked into the
subtarget. I couldn't find much documentation on what this
actually does, so the description isn't great.
Fixes #204149
Co-authored-by: Claude Opus 4.6 <noreply at anthropic.com>
[AMDGPU] Waterfall loop codegen improvement in SIInstrInfo (#192415)
When generating waterfall loops, use the instructions `v_cmpx_eq_*` and
`s_andn2_wrexec_*` as recommended for recent architectures, instead of
`v_cmp_eq_*` and `s_and saveexec`.
This PR only updates waterfall loop code generation in
`SIInstrInfo.cpp`. Other places that generated waterfall loops can be
handled separately.
- Add new lane mask constant for `s_andn2_wrexec`
- Set `isTerminator` for `v_cmpx_eq_{u32,u64}_e32`
- Fix test `mubuf-legalize-operands.mir` to track liveness needed for
verifying phi nodes
- Update .ll and .mir tests to accept the new instruction sequences
Assisted-by: Claude