[AMDGPU] Rewrite `-march` to `-mcpu` in the AMDGPU Toolchain (#198877)
Summary:
Pretty much every target uses either `-mcpu` or `-march` consistently.
AMDGPU has been accidentally using both for a while, mostly from some
fallout with the OpenMP Toolchain. This is too deep to pull out without
potentially disrupting users, but I want to at least contain it by
canonicalizing `-march` to `-mcpu` in the driver. This means we no longer
need to check both spellings, in line with every other target.
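A minimal sketch of the canonicalization idea, not the actual driver code:
`canonicalizeMArch` and `getTargetCPU` are hypothetical helpers that work on
plain strings rather than clang's argument list types. Once every `-march=`
has been rewritten, later lookups only need to know about `-mcpu=`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not a clang API): rewrite every `-march=<value>`
// argument to `-mcpu=<value>`, leaving all other arguments untouched.
std::vector<std::string>
canonicalizeMArch(const std::vector<std::string> &Args) {
  const std::string MArch = "-march=";
  std::vector<std::string> Out;
  Out.reserve(Args.size());
  for (const std::string &Arg : Args) {
    if (Arg.rfind(MArch, 0) == 0) // Arg starts with "-march="
      Out.push_back("-mcpu=" + Arg.substr(MArch.size()));
    else
      Out.push_back(Arg);
  }
  // Rewriting never adds or drops arguments.
  assert(Out.size() == Args.size());
  return Out;
}

// Hypothetical lookup that runs after canonicalization. It only checks
// `-mcpu=`; the last occurrence wins. Returns an empty string if absent.
std::string getTargetCPU(const std::vector<std::string> &CanonicalArgs) {
  const std::string MCpu = "-mcpu=";
  std::string CPU;
  for (const std::string &Arg : CanonicalArgs)
    if (Arg.rfind(MCpu, 0) == 0)
      CPU = Arg.substr(MCpu.size());
  return CPU;
}
```

The "last occurrence wins" rule here is an assumption made for the sketch;
the summary above does not say how conflicting flags are resolved.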
[flang-rt] Fix ISO test not respecting real kind flags (#198922)
Summary:
The test previously did not account for CMake overrides, so we just grab
the file that's actually generated. `sort -u` should handle the case
where there's both a .so and .a.
[lldb] Make TypeSystem::m_sym_file atomic to fix data race (#198923)
SymbolFileCommon::GetTypeSystemForLanguage unconditionally writes this
pointer with `ts->SetSymbolFile(this)` on every lookup, which races with
concurrent reads from other threads.
The race is benign in practice: there is exactly one SymbolFile per
Module, so every writer stores the same pointer, but it is still
undefined behavior under the C++ memory model.
Make the field std::atomic<SymbolFile *> and turn SetSymbolFile into a
compare-exchange that asserts a TypeSystem is never rebound to a
different SymbolFile, documenting the invariant that lets us get away
with this.
The alternative is to have the SymbolFile pointer passed in through the
constructor, but that would require updating a bunch of call sites,
including various plugin interfaces.
Found by ThreadSanitizer as part of #197792.
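A rough sketch of the pattern described above, using stand-in types rather
than the real lldb classes. The memory orderings and the handling of a
failed exchange are assumptions for illustration, not taken from the patch.

```cpp
#include <atomic>
#include <cassert>

// Stand-in for lldb's SymbolFile; only its address matters here.
struct SymbolFile {};

// Stand-in for lldb's TypeSystem, reduced to the one field in question.
class TypeSystem {
public:
  SymbolFile *GetSymbolFile() const {
    return m_sym_file.load(std::memory_order_acquire);
  }

  // Bind the SymbolFile once. Later calls must pass the same pointer.
  void SetSymbolFile(SymbolFile *sym_file) {
    SymbolFile *expected = nullptr;
    // Succeeds only if no SymbolFile has been stored yet.
    if (m_sym_file.compare_exchange_strong(expected, sym_file,
                                           std::memory_order_acq_rel,
                                           std::memory_order_acquire))
      return;
    // On failure `expected` holds the current value. Every writer is
    // supposed to store the same pointer, so anything else is a bug.
    assert(expected == sym_file &&
           "TypeSystem rebound to a different SymbolFile");
  }

private:
  std::atomic<SymbolFile *> m_sym_file{nullptr};
};
```

Repeated calls with the same pointer become no-ops after the first one, so
the per-lookup `SetSymbolFile(this)` call no longer performs an unsynchronized
write.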
[AggressiveInstCombine] Recognizing tail truncation in the popcount pattern (#198658)
We're currently able to recognize the following popcount pattern:
```
int popcnt(unsigned x) {
  x = x - ((x >> 1) & 0x55555555);
  x = x - 3*((x >> 2) & 0x33333333);
  x = (x + (x >> 4)) & 0x0F0F0F0F;
  x = x + (x >> 8);
  x = x + (x >> 16);
  return x & 0x0000003F;
}
```
but if a truncation follows right after the last AND instruction:
```
int16_t popcnt(unsigned x) {
  x = x - ((x >> 1) & 0x55555555);
  x = x - 3*((x >> 2) & 0x33333333);
  x = (x + (x >> 4)) & 0x0F0F0F0F;
```
[12 lines not shown]
Revert "[LLDB] Add a progress event to xcrun invocations (#198931)" (#198945)
This change requires Host link against Core, and it cannot do that; it
may only link in Utility. Reverting so Adrian can decide what to do.
This reverts commit 5c63509f4cc356639d9c4067e0812c2312689363.
[Github] Add timeouts to libc tests (#198934)
None of these jobs take anywhere close to the six-hour timeout
that Github uses by default. Set timeouts that are 2-3x the typical job
runtime so that if there is a test/build step that hangs indefinitely,
the job times out in a reasonable amount of time and does not hold any
resources that could be used elsewhere.
This should not impact any jobs that do not hang, will not change the
result of jobs that do hang, and means we can more effectively deal with
cases like today where tests were hanging, from a resource perspective.
This is also standard in some other workflows like the main premerge
workflow definition.
[AMDGPU] Implement -amdgpu-spill-cfi-saved-regs
These spills need special CFI anyway, so implementing them directly
where CFI is emitted avoids the need to invent a mechanism to track them
from ISel.
Change-Id: If4f34abb3a8e0e46b859a7c74ade21eff58c4047
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
[AMDGPU] Implement CFI for CSR spills
Introduce new SPILL pseudos to allow CFI to be generated for only CSR
spills, and to make the information accurate at the ISA-instruction level.
Other targets either generate slightly incorrect information or rely on
conventions for how spills are placed within the entry block. The
approach in this change produces larger unwind tables, with the
increased size being spent on additional DW_CFA_advance_location
instructions needed to describe the unwinding accurately.
Change-Id: I9b09646abd2ac4e56eddf5e9aeca1a5bebbd43dd
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
[AMDGPU] Implement CFI for non-kernel functions
This does not implement CSR spills other than those AMDGPU handles
during PEI. The remaining spills are handled in a subsequent patch.
Change-Id: I5e3a9a62cf9189245011a82a129790d813d49373
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
[Clang] Default to async unwind tables for amdgcn
To avoid codegen changes when enabling debug-info (see
https://bugs.llvm.org/show_bug.cgi?id=37240) we want to
enable unwind tables by default.
There is some pessimization in post-prologepilog scheduling, and a
general solution to the problem of CFI_INSTRUCTION-as-scheduling-barrier
should be explored.
Change-Id: I83625875966928c7c4411cd7b95174dc58bda25a
[AMDGPU] Emit entry function Dwarf CFI
Entry functions represent the end of unwinding, as they are the
outer-most frame. This implies they can only have a meaningful
definition for the CFA, which AMDGPU defines using a memory location
description with a literal private address space address. The return
address is set to undefined as a sentinel value to signal the end of
unwinding.
Change-Id: I21580f6a24f4869ba32939c9c6332506032cc654
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
[MC][Dwarf] Add custom CFI pseudo-ops for use in AMDGPU
While these can be represented with .cfi_escape, using these pseudo-cfi
instructions makes .s/.mir files more readable, and it is necessary to
support updating registers in CFI instructions (something that the
AMDGPU backend requires).
Change-Id: I763d0cabe5990394670281d4afb5a170981e55d0
[MIR] Error on signed integer in getUnsigned
Previously we effectively took the absolute value of the APSInt. Instead,
diagnose the unexpected negative value.
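A small sketch of the intended behavior, assuming a simplified signature:
the real helper works on an APSInt inside the MIR parser, whereas this
hypothetical `getUnsigned` takes an already-parsed `int64_t`, and the error
strings are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical stand-in for the parser helper. Returns true on error, in
// the style of LLVM parsers, and leaves Result untouched in that case.
bool getUnsigned(int64_t Parsed, unsigned &Result, std::string &Error) {
  if (Parsed < 0) {
    // Diagnose instead of silently using the magnitude of the value.
    Error = "expected unsigned integer, got " + std::to_string(Parsed);
    return true;
  }
  if (Parsed > static_cast<int64_t>(std::numeric_limits<unsigned>::max())) {
    Error = "integer too large for unsigned";
    return true;
  }
  Result = static_cast<unsigned>(Parsed);
  return false;
}
```

With this shape, an input such as `-42` produces a diagnostic rather than
being accepted as `42`.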
Change-Id: I4efe961e7b29fdf1d5f97df12f8139aac12c9219
[lldb] Ensure libunwind architecture matches test for TestLibUnwindRetInjection.py (#198884)
If the test is run as arm64e while the just-built libunwind is arm64 only,
the test will not function correctly.
[OpenMP][OMPIRBuilder] Refactor removeUnusedBlocksFromParent
This is essentially post-commit review for #198690, which was landed
quickly to fix nondeterminism in tests introduced in #197637.
Change-Id: Ib3603ef3c70dde5bb22d0fc04d9249e62ecccf0c
Co-authored-by: @Meinersbur
Co-authored-by: @chichunchen