[llvm-dwp] Fix incorrect ELF OS/ABI in DWP output (#198486)
I received a report internally that
https://github.com/llvm/llvm-project/pull/192112 caused issues with
lldb.
LLDB was not able to load the dwp files because of the OS/ABI mismatch
between the binary and the dwp file.
Investigating, it turns out that after the refactor DWPWriter sets the
output OS/ABI from `ELFObjectFileBase::getOS()`, but getOS() returns
`Triple::OSType`, not the raw `e_ident[EI_OSABI]` byte. These enums have
different numbering :( oops.
This caused certain tools that validate OS/ABI consistency between a
binary and its DWP to reject the debug info.
Fix by adding getEIdentOSABI() to ELFObjectFileBase (parallel to
getEIdentABIVersion()) and using it instead of getOS().
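For illustration, here is a minimal standalone sketch of the distinction: the value to propagate is the raw byte at index `EI_OSABI` (7) of `e_ident`, copied as-is rather than round-tripped through an OS enum. `readEIdentOSABI` and `copyOSABI` are hypothetical helpers that parse a raw header buffer; they are not the `ELFObjectFileBase::getEIdentOSABI()` API added by this patch, whose exact signature is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// e_ident layout constants from the ELF specification.
constexpr std::size_t kEI_NIDENT = 16;
constexpr std::size_t kEI_OSABI = 7;
constexpr std::uint8_t kELFOSABI_NONE = 0; // UNIX System V
constexpr std::uint8_t kELFOSABI_GNU = 3;  // GNU/Linux

// Hypothetical helper: return the raw e_ident[EI_OSABI] byte, or
// std::nullopt if the buffer is too short or lacks the ELF magic.
std::optional<std::uint8_t> readEIdentOSABI(const std::uint8_t *buf,
                                            std::size_t len) {
  if (len < kEI_NIDENT)
    return std::nullopt;
  if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
    return std::nullopt;
  return buf[kEI_OSABI];
}

// Hypothetical helper: copy the OS/ABI byte verbatim into an output e_ident,
// so the output matches the input object in this field.
void copyOSABI(const std::uint8_t *inIdent, std::uint8_t *outIdent) {
  assert(inIdent && outIdent);
  outIdent[kEI_OSABI] = inIdent[kEI_OSABI];
}
```

The point of the sketch is that the field is treated as an opaque byte; mapping it through a higher-level OS enum and back is what allows the numbering mismatch described above.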
Assisted-by: Claude
[lldb-dap] Add missing `arguments` field to LldbDapProcessEntry (#198597)
The TypeScript interface was missing the optional `arguments` field that
`parseListProcessesOutput` reads and `pick-process` displays, breaking
the extension build.
[lldb] Don't require a real target for `target modules list -g` (#198594)
The `-g` flag lists the global module list, which doesn't need a target.
Switch to eCommandAllowsDummyTarget and error out explicitly in
DoExecute on the non-global paths when no real target is selected.
Fixes a regression introduced by #198429.
[AMDGPU] Disable dpp src1 sgpr on gfx11 (#164241)
https://github.com/llvm/llvm-project/pull/67461 enabled SGPRs as src1 by
default for all dpp opcodes with manual checks for targets where this is
not supported. In that case, isOperandLegal checked if the second
operand is legal as src0.
https://github.com/llvm/llvm-project/pull/155595 disabled this check by
removing the calls to isOperandLegal, which resulted in SGPRs being used
as operands for src1 on gfx11. This PR reenables this check and fixes
the lit test.
---------
Co-authored-by: Paul Trojahn <paul.trojahn at amd.com>
[libc] Remove broken __builtin_aarch64_wsr fallback in set_thread_ptr (#197295)
The fallback used __builtin_aarch64_wsr (32-bit) instead of
__builtin_aarch64_wsr64, truncating the 64-bit thread pointer value and
causing non-deterministic runtime crashes.
Modern GCC correctly warns about it and -Werror=conversion catches it.
```
/var/tmp/portage/llvm-runtimes/libc-22.1.5/work/libc/startup/linux/aarch64/tls.cpp: In function ‘bool __llvm_libc_22_1_5_::set_thread_ptr(uintptr_t)’:
/var/tmp/portage/llvm-runtimes/libc-22.1.5/work/libc/startup/linux/aarch64/tls.cpp:90:38: error: conversion from ‘uintptr_t’ {aka ‘long unsigned int’} to ‘unsigned int’ may change value [-Werror=conversion]
90 | __builtin_aarch64_wsr("tpidr_el0", val);
| ^~~
cc1plus: all warnings being treated as errors
```
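The failure mode is plain integer narrowing. The sketch below uses hypothetical stand-ins `write_sysreg32` and `write_sysreg64` instead of the AArch64 builtins (which only compile for AArch64), and assumes a 32-bit `unsigned int`, to show how the upper half of a 64-bit thread-pointer value is silently dropped:

```cpp
#include <cassert>
#include <cstdint>

// Records the last value "written", so the effect of each call is observable.
static std::uint64_t last_written = 0;

// Hypothetical stand-ins for a 32-bit and a 64-bit system-register write.
void write_sysreg32(unsigned int v) { last_written = v; }
void write_sysreg64(std::uint64_t v) { last_written = v; }

// Mirrors the removed fallback: the 64-bit value is implicitly narrowed to
// 32 bits at the call, which is exactly what -Wconversion reports.
void set_thread_ptr_buggy(std::uint64_t val) { write_sysreg32(val); }

// Mirrors the 64-bit path: the full pointer value reaches the register.
void set_thread_ptr_fixed(std::uint64_t val) { write_sysreg64(val); }
```

Any pointer above 4 GiB loses its high bits in the buggy path, which is consistent with crashes that only show up depending on where the TLS block happens to be mapped.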
[CIR] Implement atomic cmp exchange with non-const 'weak' lowering (#198546)
This was left as NYI, but it shows up in a self-build!
This patch follows the existing solution in that we are doing the
branching of weak vs not-weak at the CIR level. This is necessary
because the LLVM intrinsics (and the CIR operations) take 'weak' as a
constant value.
Unlike classic-codegen, this patch uses an 'if' instead of a 'switch' on
the 'weak' value. This is mainly for readability (since it is a switch
over a bool!), but also because our 'switch' doesn't seem to support
'bool', so this would require an additional cast.
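As a source-level analogy (not the CIR lowering itself), the shape is the one you would write in C++ when the weakness is only known at run time: branch once on the flag so that each arm uses an operation whose weak/strong choice is fixed. `cmpxchgDynamicWeak` is a hypothetical name; `compare_exchange_weak` and `compare_exchange_strong` are the standard `std::atomic` members.

```cpp
#include <atomic>
#include <cassert>

// Runtime 'weak' flag, but each arm calls an operation whose weakness is a
// fixed property, mirroring the if/else emitted at the CIR level.
bool cmpxchgDynamicWeak(std::atomic<int> &obj, int &expected, int desired,
                        bool weak) {
  if (weak)
    return obj.compare_exchange_weak(expected, desired); // may fail spuriously
  return obj.compare_exchange_strong(expected, desired);
}
```

Because the weak form may fail spuriously, callers typically retry it in a loop; the strong form only fails when the stored value differs from `expected`.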
As a future direction, we may wish to modify the CIR operations to take
'weak' and 'failure' values (both are constants in the LLVM intrinsics!) as
non-constants, and handle the switch/if statement during lowering. This
would give us an opportunity to optimize the value out without having to
collapse the if/switch/etc, and minimize the size of the CIR. However,
as that is a larger direction, this patch skips that for now.
[CIR] Fix assumption that 'curFn' is always a function in direct-call (#197766)
The code that performs some checks on a builtin call tried to tell
whether it is being called inside a function of the same name. It
assumed we are always inside a function, which isn't necessarily true,
since we also generate 'global' ops as curFn. This patch removes the
assumption and only performs the check when we're in a function.
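A minimal standalone sketch of the corrected guard, assuming the "current function" slot can hold either a function or a global. `FuncOpStub`, `GlobalOpStub`, `CurFn`, and `isSelfCall` are hypothetical stand-ins (modelled with `std::variant`) for the MLIR operations and the real CIRGen check, which in LLVM-style code would more likely be a `dyn_cast` on the operation.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical stand-ins for the ops that can be the current "function"
// during CIR generation.
struct FuncOpStub { std::string name; };
struct GlobalOpStub { std::string name; };
using CurFn = std::variant<FuncOpStub, GlobalOpStub>;

// Only compare names when we are actually inside a function; a global
// never counts as "calling itself".
bool isSelfCall(const CurFn &curFn, const std::string &calleeName) {
  if (const auto *fn = std::get_if<FuncOpStub>(&curFn))
    return fn->name == calleeName;
  return false;
}
```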
[AMDGPU] Gate `S_LSHL[1-4]_ADD_U32` patterns on uniform results (#198508)
Like the other SOP2 patterns in this file, these scalar instructions
require the result to be uniform. Wrap them in `UniformBinFrag` so
divergent shl/add chains use `V_LSHL_ADD_U32`.
[mlir][AMDGPU] Extend amdgpu.transpose_load for gfx1250 (#198354)
This commit adds support for gfx1250's ds_load_tr* instructions to
`amdgpu.transpose_load` since they're pretty close to the gfx950 ones.
---------
Co-authored-by: Codex <codex at openai.com>
[dsymutil] Add missing --linker {classic,parallel} in tests (#198568)
As I'm preparing to toggle the default, I found another set of tests
that don't explicitly pass the linker to dsymutil.
[IR] Explicitly note C standard library UB (#198562)
This language is, to my understanding, a bit outdated: in a freestanding
environment we should already be handling things fine, or at least I'm
not aware of any outstanding issues reported by people compiling for
freestanding environments or other languages, which are fairly prominent
at this point. The language here dates back to
68f971b1d67d51272f5c141fc9e4740e27e279f4, with some minor modifications
in 722212d1a0672ae18a23db58c4cfb7e38073abfa. Explicitly note the UB
aspect, as this came up recently when working on llubi in #190147, and I
do not think it hurts to note it explicitly.
[AMDGPU] Implement -amdgpu-spill-cfi-saved-regs
These spills need special CFI anyway, so implementing them directly
where CFI is emitted avoids the need to invent a mechanism to track them
from ISel.
Change-Id: If4f34abb3a8e0e46b859a7c74ade21eff58c4047
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
[AMDGPU] Implement CFI for CSR spills
Introduce new SPILL pseudos so that CFI is generated only for CSR
spills, and so that the information is accurate at the ISA-instruction level.
Other targets either generate slightly incorrect information or rely on
conventions for how spills are placed within the entry block. The
approach in this change produces larger unwind tables, with the
increased size being spent on additional DW_CFA_advance_location
instructions needed to describe the unwinding accurately.
Change-Id: I9b09646abd2ac4e56eddf5e9aeca1a5bebbd43dd
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
[AMDGPU] Emit entry function Dwarf CFI
Entry functions represent the end of unwinding, as they are the
outer-most frame. This implies they can only have a meaningful
definition for the CFA, which AMDGPU defines using a memory location
description with a literal private address space address. The return
address is set to undefined as a sentinel value to signal the end of
unwinding.
Change-Id: I21580f6a24f4869ba32939c9c6332506032cc654
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
[AMDGPU] Implement CFI for non-kernel functions
This does not implement CSR spills other than those AMDGPU handles
during PEI. The remaining spills are handled in a subsequent patch.
Change-Id: I5e3a9a62cf9189245011a82a129790d813d49373
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
[Clang] Default to async unwind tables for amdgcn
To avoid codegen changes when enabling debug-info (see
https://bugs.llvm.org/show_bug.cgi?id=37240) we want to
enable unwind tables by default.
There is some pessimization in post-prologepilog scheduling, and a
general solution to the problem of CFI_INSTRUCTION-as-scheduling-barrier
should be explored.
Change-Id: I83625875966928c7c4411cd7b95174dc58bda25a
[MC][Dwarf] Add custom CFI pseudo-ops for use in AMDGPU
While these can be represented with .cfi_escape, using these pseudo-cfi
instructions makes .s/.mir files more readable, and it is necessary to
support updating registers in CFI instructions (something that the
AMDGPU backend requires).
Change-Id: I763d0cabe5990394670281d4afb5a170981e55d0
[MIR] Error on signed integer in getUnsigned
Previously we effectively took the absolute value of the APSInt; instead,
diagnose the unexpected negative value.
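The behavioral change can be sketched without the MIR parser: reject a negative value with a diagnostic instead of silently using its magnitude. `parseUnsigned` and its message text are hypothetical; the real code works on an `APSInt` and reports through the parser's own error mechanism.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>

// Either a parsed value or a diagnostic message.
using UnsignedOrError = std::variant<std::uint64_t, std::string>;

// Hypothetical helper: accept only non-negative values.
UnsignedOrError parseUnsigned(std::int64_t value) {
  if (value < 0)
    return std::string("expected an unsigned integer, got ") +
           std::to_string(value);
  // The old behaviour effectively yielded the magnitude (e.g. 5 for -5).
  return static_cast<std::uint64_t>(value);
}
```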
Change-Id: I4efe961e7b29fdf1d5f97df12f8139aac12c9219
[flang][OpenMP] Remove ompFlagsRequireMark from symbol resolution
The `ompFlagsRequireMark` set was there to make sure that we put the flags
from it on symbols even when no new symbols needed to be created.
Instead of doing that, we can just put the flag on the symbol every time.
There is no harm in having these flags, it's just extra information.
[MC] Add -sched-model-reservation-station-scale-factor option (#195638)
This patch adds a new CLI option to the MC layer called
`-sched-model-reservation-station-scale-factor` that scales
the buffer size of all reservation stations (RS) for resources in the
scheduling model by a positive `float` factor. It only scales
out-of-order (OOO) resources, leaving the special buffer sizes (-1, 0, 1)
untouched, and likewise it may only produce OOO resources.
This can be used, for example, to look for headroom in post-RA instruction
scheduling for OOO cores: e.g. scale the RS size by 2 and observe the IPC;
if it improves, the code may be sensitive to the schedule and we may be
able to do better.
Note: Currently, BufferSize for LSU resources defines the reservation
station (RS) size and, if present, also the ld/st queue size, since the
ld/st queue just points to the provided LSU resource. Thus, we currently
scale both in lockstep until we have an independent ld/st queue model.
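A minimal sketch of the scaling rule as described, assuming LLVM's `BufferSize` conventions (-1, 0 and 1 are special values; a value above 1 denotes an OOO resource with its own reservation station). The function name, the round-to-nearest choice, and clamping to 2 so the result stays OOO are assumptions, not the option's actual implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical: scale a resource's BufferSize by a positive factor.
// Special values (-1, 0, 1) are returned unchanged; any value > 1 is scaled,
// rounded to nearest, and clamped so it still denotes an OOO resource.
int scaleBufferSize(int bufferSize, float factor) {
  assert(factor > 0.0f && "scale factor must be positive");
  if (bufferSize <= 1)
    return bufferSize;
  int scaled = static_cast<int>(std::lround(bufferSize * factor));
  return std::max(scaled, 2);
}
```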
[18 lines not shown]
[SmallPtrSet] Drop tombstones in large mode (#197637)
SmallPtrSet uses quadratic probing with tombstone deletion in large
mode. Tombstones occupy a third bucket state and hurt lookup performance.
Switch to linear probing with deletion implemented using Knuth TAOCP 6.4
Algorithm R. `erase` opens a hole at the removed slot, then walks forward,
sliding back into the hole each following entry whose probe path crosses
it (the hole moves with each slide). The scan stops at the next empty
bucket, which is guaranteed to exist.
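To make the deletion step concrete, here is a minimal standalone sketch of linear probing with Algorithm R deletion. It is not `SmallPtrSet`'s implementation: the class name, the use of `0` as the empty marker, and the low-bits "hash" are assumptions chosen so collisions are easy to follow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Toy open-addressing set of non-zero keys with linear probing and
// tombstone-free deletion (Knuth, TAOCP 6.4, Algorithm R).
class LinearProbeSet {
public:
  explicit LinearProbeSet(std::size_t capacityPow2)
      : slots(capacityPow2, kEmpty), mask(capacityPow2 - 1) {
    assert(capacityPow2 >= 2 && (capacityPow2 & mask) == 0);
  }

  // Returns false if the key is present or the table is (nearly) full.
  bool insert(std::uint64_t key) {
    assert(key != kEmpty);
    // Keep at least one empty bucket so every probe sequence terminates.
    if (count + 1 >= slots.size())
      return false;
    std::size_t i = home(key);
    while (slots[i] != kEmpty) {
      if (slots[i] == key)
        return false;
      i = (i + 1) & mask;
    }
    slots[i] = key;
    ++count;
    return true;
  }

  bool contains(std::uint64_t key) const { return find(key).has_value(); }

  bool erase(std::uint64_t key) {
    std::optional<std::size_t> pos = find(key);
    if (!pos)
      return false;
    eraseAt(*pos);
    --count;
    return true;
  }

  std::size_t size() const { return count; }

private:
  static constexpr std::uint64_t kEmpty = 0;
  std::vector<std::uint64_t> slots;
  std::size_t mask;
  std::size_t count = 0;

  // Toy hash: the key's low bits, so e.g. 1, 9 and 17 collide in 8 buckets.
  std::size_t home(std::uint64_t key) const {
    return static_cast<std::size_t>(key) & mask;
  }

  std::optional<std::size_t> find(std::uint64_t key) const {
    for (std::size_t i = home(key); slots[i] != kEmpty; i = (i + 1) & mask)
      if (slots[i] == key)
        return i;
    return std::nullopt;
  }

  // Algorithm R: empty the slot, then scan forward to the next empty bucket,
  // moving back any entry whose probe path (home .. j) passes over the hole.
  void eraseAt(std::size_t hole) {
    slots[hole] = kEmpty;
    for (std::size_t j = (hole + 1) & mask; slots[j] != kEmpty;
         j = (j + 1) & mask) {
      std::size_t distFromHome = (j - home(slots[j])) & mask;
      std::size_t distFromHole = (j - hole) & mask;
      if (distFromHome >= distFromHole) {
        slots[hole] = slots[j];
        slots[j] = kEmpty;
        hole = j; // the hole moves with each slide
      }
    }
  }
};
```

Lookups never have to skip deleted markers; the cost moves into `erase`, which may shift entries within the cluster that follows the removed slot.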
`remove_if` clears matches in a single pass, then calls `Grow` at the
current size to restore the linear-probe invariant, for O(N) total.
(Per-match Algorithm R erase would be O(N * cluster).)
My DenseMap experiments suggest that Robin Hood hashing and the Abseil
Swiss Table family (not good at small keys) are actually worse than the
baseline.
[7 lines not shown]
[CUF] Fix `do concurrent` PFT navigation in `cuf.kernel` lowering
When lowering a cuf kernel directive with an explicit loop depth (e.g.
`do(2)`), the code navigated `nestedLoops-1` levels deeper in the PFT
expecting nested regular do-loops. For a `do concurrent` construct with
multiple index variables (e.g. `do concurrent(i=1:k,j=1:n)`), the PFT
represents the entire construct as a single node, so the extra
navigation stepped off into a plain statement with no nested
evaluations and triggered an assertion in `getNestedEvaluations()`.
Fix by skipping the depth navigation loop for `do concurrent`, which is
always a single flat PFT construct regardless of how many index
variables it declares.
Co-authored-by: Claude Sonnet 4.6 <noreply at anthropic.com>
[CUF] Fix `do concurrent` IV type in `cuf.kernel` lowering (#198584)
Induction variables in a `do concurrent` construct inside a cuf kernel
directive were allocated with the `index` type instead of the
Fortran-declared `integer` type. This caused a type conversion failure
when the index variable was used in a context requiring a different
integer or real type (e.g. `real(i)`).
Fix by using `genType(*name.symbol)` to derive the allocation type from
the symbol's Fortran declaration.
PR stack:
- https://github.com/llvm/llvm-project/pull/198584 ◀️
- https://github.com/llvm/llvm-project/pull/198585
Co-authored-by: Claude Sonnet 4.6 <noreply at anthropic.com>
[MLIR][XeGPU] Temporarily disable XeGPU peephole optimizer for CRI (#198031)
Temporarily disable the optimization pass until a fix is ready.
Support for CRI was added in a recent PR,
https://github.com/llvm/llvm-project/pull/197229,
but it had post-merge issues.