AMDGPU: Fix verifier error when waterfall call target is in AV register
This isn't an ideal fix; technically this should be an optimization path
we shouldn't need to go down. The base path where a copy will be inserted
is still broken.
The lit test changes are mostly regressions to be fixed later.
AMDGPU: Constrain readfirstlane operand when writing to m0
Fixes another verifier error after introducing AV registers.
Also clears the subregister index, if there was one; previously it
was left in place.
[TableGen] Split *GenRegisterInfo.inc. (#167700)
Reduces memory usage compiling backend sources, most notably for
AMDGPU by ~98 MB per source on average.
AMDGPUGenRegisterInfo.inc is tens of megabytes in size now, and
is even larger downstream. At the same time, it is included in
nearly all backend sources, typically just for a small portion of
its content, resulting in compilation being unnecessarily
memory-hungry, which in turn stresses buildbots and wastes their
resources.
Splitting .inc files also helps avoid extra ccache misses when
changes in .td files don't cause changes in all parts of what
previously was a single .inc file.
Rather than building on top of the current single-output-file
design of TableGen, e.g., using `split-file`, it seems preferable
to recognise the need for multi-file outputs and give them proper
first-class support directly in TableGen.
AArch64: rewrite the CSR computation (#167967)
Rather than having a separate path for Darwin, and then a partial
handling for Windows, and then the remainder using its own path, unify
the three paths. Use a switch over the calling convention to avoid
having to check and handle the calling convention in a variety of
places. This simplifies the logic and avoids accidentally missing a
calling convention (as had happened with PreserveMost and PreserveAll
on Windows).
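As a rough illustration of the "single switch over the calling convention"
shape described above, here is a minimal standalone sketch. The enums, the
function name, and the returned list names are hypothetical stand-ins, not
the actual AArch64 backend types or save lists.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the real calling-convention and OS queries.
enum class CallConv { C, PreserveMost, PreserveAll };
enum class TargetOS { Darwin, Windows, Other };

// One unified path: pick an OS prefix once, then switch over the calling
// convention so that every convention is handled in a single place.
std::string calleeSavedListFor(CallConv CC, TargetOS OS) {
  std::string Prefix;
  switch (OS) {
  case TargetOS::Darwin:
    Prefix = "Darwin";
    break;
  case TargetOS::Windows:
    Prefix = "Windows";
    break;
  case TargetOS::Other:
    Prefix = "Generic";
    break;
  }

  switch (CC) {
  case CallConv::C:
    return Prefix + "_Default";
  case CallConv::PreserveMost:
    return Prefix + "_PreserveMost";
  case CallConv::PreserveAll:
    return Prefix + "_PreserveAll";
  }
  // Unreachable for valid enum values; kept so all paths return.
  return Prefix + "_Default";
}
```

With no `default:` label in the inner switch, compilers that warn on
unhandled enumerators can flag a newly added calling convention, which is
the kind of omission the unified switch is meant to prevent.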
[lldb][nfc] Initialize m_initial_sp in ctor for UnwindAssemblyInstEmulation (#167914)
Also rename the "sp" suffix (originally intended to mean "Stack
Pointer") to "cfa", as "sp" generally means Shared Pointer.
[flang][runtime] Advance output record in specific case (#167786)
When a formatted WRITE takes place in a defined output subroutine called
from a context in which record advancement is allowed, such as NAMELIST,
the char-string-edit-descs in the format can trigger record advancement.
Also clean up confusing messiness lingering from the separation of
iostat.h into two headers in flang/.../Runtime. iostat.h didn't need to be
put into flang/.../Runtime since it's included only by flang-rt, and
iostat-consts.h doesn't need one of its includes.
Fixes https://github.com/llvm/llvm-project/issues/167757.
[flang] Disable some warnings with ineluctable false positives (#167714)
There are a few well-meaning warnings for some cases of the FPTR=
argument to C_F_POINTER() that can be false positives, since the
restrictions in the standard are dependent on the source of the CPTR=
argument. Further, there is no way to alter a program to avoid these
warnings, so one cannot compile a correct and conforming program with
-pedantic -Werror. Disable these warnings.
Fixes https://github.com/llvm/llvm-project/issues/167470.
[AMDGPU] Ensure SCC is not live before shrinking to s_bitset* (#167907)
Ensure SCC is not live before shrinking s_and*/s_or* instructions to
s_bitset*.
---------
Signed-off-by: John Lu <John.Lu at amd.com>
[Sparc] Optimize compare instruction (#167140)
If we need to compare the result of a computation with 0, we can
sometimes replace the last instruction in the computation with one that
sets the integer condition codes. We can then branch immediately based
on the zero-flag instead of having to use an extra compare instruction
(a SUBcc instruction).
This is only possible if the result of the compare is not used anywhere
else and no other instruction modifies the integer condition codes
between the time the result of the computation is defined and the time
it is used.
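The legality conditions above can be sketched as a small standalone check.
This is a simplified model, not the Sparc backend code: `Inst`, its fields,
and `canFoldCompareIntoDef` are hypothetical names, and a basic block is
modelled as a plain vector of instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction model.
struct Inst {
  bool HasCCSettingForm; // a condition-code-setting variant exists
  bool ModifiesICC;      // writes the integer condition codes
};

// DefIdx: instruction producing the value that is compared with 0.
// CmpIdx: the compare-with-zero instruction.
// OtherCompareUses: uses of the compare result besides the branch.
bool canFoldCompareIntoDef(const std::vector<Inst> &Block, std::size_t DefIdx,
                           std::size_t CmpIdx, unsigned OtherCompareUses) {
  if (DefIdx >= CmpIdx || CmpIdx >= Block.size())
    return false;
  // The defining instruction needs a variant that sets the condition codes.
  if (!Block[DefIdx].HasCCSettingForm)
    return false;
  // The compare result must not be used anywhere else.
  if (OtherCompareUses != 0)
    return false;
  // Nothing between the definition and the compare may modify the
  // integer condition codes.
  for (std::size_t I = DefIdx + 1; I < CmpIdx; ++I)
    if (Block[I].ModifiesICC)
      return false;
  return true;
}
```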
---------
Co-authored-by: Daniel Cederman <cederman at gaisler.com>
[Clang][OpenMP] Bug fix Default clause variable category (#165276)
Addresses new review comments on the previous
"Support for Default clause variable category" change
[157063](https://github.com/llvm/llvm-project/pull/157063) and adds a
new test case.
---------
Co-authored-by: Sunil Kuravinakop <kuravina at pe31.hpc.amslabs.hpecorp.net>
[OpenMP][Flang] Emit default declare mappers implicitly for derived types (#140562)
This patch adds support for emitting default declare mappers for implicit
mapping of derived types when none is supplied by the user. This especially
helps with mapping allocatables of derived types.
[AMDGPU] Rematerialize VGPR candidates when SGPR spills to VGPR over the VGPR limit
Before, when selecting candidates to rematerialize, we would only
consider SGPR candidates when there was an excess of SGPR registers.
Failing to eliminate the excess would result in spills to VGPRs.
This is normally not an issue, unless spilling to VGPRs results in
excess VGPRs.
This patch does 2 things:
* It relaxes the GCNRPTarget success criteria: now we accept regions
where we spill SGPRs to VGPRs, as long as this does not end up in
excess VGPRs.
* It changes isSaveBeneficial to consider the excess VGPRs (which
includes the SGPRs that would be spilled to VGPR).
With these changes, the compiler rematerializes VGPRs when the excess
SGPRs would result in VGPR excess.
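A minimal sketch of the relaxed success criterion, under stated assumptions:
the structs and function names below are hypothetical, not the GCNRPTarget
API, and the number of spilled SGPRs that fit in one VGPR is passed in as a
parameter rather than taken from the real target description.

```cpp
#include <cassert>

// Hypothetical, simplified pressure model.
struct Pressure {
  unsigned SGPRs;
  unsigned VGPRs;
};
struct Limits {
  unsigned MaxSGPRs;
  unsigned MaxVGPRs;
};

// Excess VGPRs once SGPRs over the limit are assumed to spill to VGPRs.
// SGPRsPerSpillVGPR is an assumed packing factor and must be nonzero.
unsigned excessVGPRs(Pressure P, Limits L, unsigned SGPRsPerSpillVGPR) {
  unsigned ExcessSGPRs = P.SGPRs > L.MaxSGPRs ? P.SGPRs - L.MaxSGPRs : 0;
  // Round up: a partially filled spill VGPR still occupies a whole VGPR.
  unsigned SpillVGPRs =
      (ExcessSGPRs + SGPRsPerSpillVGPR - 1) / SGPRsPerSpillVGPR;
  unsigned TotalVGPRs = P.VGPRs + SpillVGPRs;
  return TotalVGPRs > L.MaxVGPRs ? TotalVGPRs - L.MaxVGPRs : 0;
}

// Relaxed criterion: SGPR spills are acceptable as long as they do not
// push the region over the VGPR limit.
bool targetSatisfied(Pressure P, Limits L, unsigned SGPRsPerSpillVGPR) {
  return excessVGPRs(P, L, SGPRsPerSpillVGPR) == 0;
}
```

In this model a region with excess SGPRs still counts as a success when the
resulting spill VGPRs fit under the VGPR limit, and a nonzero `excessVGPRs`
is what would make rematerializing a VGPR candidate beneficial.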
[4 lines not shown]
[flang][OpenMP] Store Block in OpenMPLoopConstruct, add access functions
Instead of storing a variant with specific types, store parser::Block
as the body. Add two access functions to make the traversal of the nest
simpler.
This will allow storing loop-nest sequences in the future.