[Clang] Default to async unwind tables for amdgcn (#183148)
To avoid codegen changes when enabling debug-info (see
https://bugs.llvm.org/show_bug.cgi?id=37240) we want to
enable unwind tables by default.
There is some pessimization in post-prologepilog scheduling, and a
general solution to the problem of CFI_INSTRUCTION-as-scheduling-barrier
should be explored.
Change-Id: I83625875966928c7c4411cd7b95174dc58bda25a
Fix MSVC template parsing error in SerializationFormat (#196571)
This commit fixes a hard compilation error on Windows (when building with
Clang's MSVC compatibility mode) and a subsequent access violation that
occurred during Windows CI testing.
Root Causes:
1. When compiling with `-fms-compatibility`, Clang's two-phase template
lookup fails to resolve function-local static variables (`SavedSerialize`
and `SavedDeserialize`) captured by a local class (`ConcreteCodec`) inside
an uninstantiated template. It incorrectly assumes they are members of a
dependent base class.
2. Originally, `TypedSerializerFn` and `DeserializerFn` were typed as
`llvm::function_ref`. Storing these in static variables created dangling
pointers, as `function_ref` is a non-owning wrapper that only referenced
the temporaries decaying on the constructor's stack, causing a 0xC0000005
access violation on x64 Windows.
The Fix:
[11 lines not shown]
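The second root cause can be illustrated with a minimal sketch. `FnRef` below is a hypothetical stand-in for a non-owning callable reference (it is not `llvm::function_ref` itself), and `registerCallback`, `savedCallback`, `runSaved`, and `callThroughRef` are illustrative names. The owning `std::function` storage is shown only as one common way to keep a callable alive in a static; the visible part of this message does not say which remedy the commit actually uses.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical non-owning wrapper: it stores only a pointer to the callable,
// so it must never outlive the object it refers to.
template <typename Ret, typename Arg> class FnRef {
  Ret (*Thunk)(void *, Arg) = nullptr;
  void *Callable = nullptr;

public:
  template <typename F>
  FnRef(F &Fn)
      : Thunk([](void *C, Arg A) -> Ret { return (*static_cast<F *>(C))(A); }),
        Callable(&Fn) {}

  Ret operator()(Arg A) const { return Thunk(Callable, A); }
};

// Safe use of the non-owning wrapper: the callable outlives the reference.
inline int callThroughRef(int X) {
  int Offset = 10;
  auto AddOffset = [Offset](int V) { return V + Offset; };
  FnRef<int, int> Ref(AddOffset); // valid only while AddOffset is alive
  return Ref(X);
}

// Owning storage: std::function copies the callable, so a function-local
// static holding it stays valid after the caller's temporaries are gone.
inline std::function<int(int)> &savedCallback() {
  static std::function<int(int)> Saved;
  return Saved;
}

inline void registerCallback(std::function<int(int)> Fn) {
  savedCallback() = std::move(Fn);
}

inline int runSaved(int X) { return savedCallback()(X); }
```

Storing a `FnRef` built from a temporary in a static would leave `Callable` pointing at a dead object, which is the kind of dangling reference described above.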
[LifetimeSafety] Expand diagnostic list that enables analysis (#198599)
Now, when any lifetime safety related diagnostic is not ignored, we run
the analysis.
No tests were added since this does not add new functionality.
[NVPTX] Constant fold clusterDim when reqnctapercluster is specified (#195967)
This is a follow-up of https://github.com/llvm/llvm-project/pull/191575.
Currently, NVPTX cannot fold the `cluster_nctaid.x/y/z` and
`cluster_nctarank` intrinsic calls into const values when
`reqnctapercluster` is specified, which prevents the code from further
optimization.
Therefore, in this change, we extend the `NVVMIntrRange` pass to:
- Tighten `cluster_nctaid.x/y/z` intrinsic calls to a single-value range,
which a later InstCombine pass can constant-fold
- Tighten `cluster_nctarank` intrinsic calls to a single-value range when
`cluster_dim` is specified
- Tighten `cluster_ctaid.x/y/z` range attributes to use per-dimension
`cluster_dim` bounds
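The range tightening listed above can be sketched roughly as follows. This is a standalone model, not the `NVVMIntrRange` implementation: `Range`, `ClusterDim`, and the helper functions are hypothetical names, and the sketch assumes that `cluster_nctarank` equals the product of the three cluster dimensions.

```cpp
#include <cassert>
#include <cstdint>

// Half-open unsigned range [Lo, Hi), similar in spirit to a range attribute.
struct Range {
  uint32_t Lo;
  uint32_t Hi;
};

// Hypothetical description of the required cluster shape (CTAs per cluster).
struct ClusterDim {
  uint32_t X, Y, Z;
};

// cluster_nctaid.<dim> is exactly the required dimension: [Dim, Dim + 1).
inline Range nctaidRange(uint32_t Dim) { return {Dim, Dim + 1}; }

// cluster_nctarank is assumed here to be X * Y * Z: [N, N + 1).
inline Range nctarankRange(const ClusterDim &D) {
  uint32_t N = D.X * D.Y * D.Z;
  return {N, N + 1};
}

// cluster_ctaid.<dim> is bounded per dimension: [0, Dim).
inline Range ctaidRange(uint32_t Dim) { return {0, Dim}; }

// A range holding exactly one value can be replaced by that constant.
inline bool foldToConstant(const Range &R, uint32_t &Out) {
  if (R.Hi - R.Lo != 1)
    return false;
  Out = R.Lo;
  return true;
}
```

A range that still spans several values, such as `ctaidRange(2)`, is left alone, while a single-value range is what lets a later pass substitute the constant.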
[clang-format] Harden annotation of operator keywords (#196768)
The star was already annotated as TT_PointerOrReference; just overwrite
it so that we do not crash. Also remove the annotation above, since
that would always be overwritten (or at least I don't see when it would
not be, and no test fails).
Fixes #196054.
[libc][mathvec] Add exhaustive tester for SIMD math routines (#189488)
Adds an exhaustive tester based on the scalar version.
It uses the libc scalar math routines as a reference rather than MPFR.
Also corrects a missed 1 ULP value in expf when the target doesn't
support FMAs.
[CIR] Fix get_method callee type for member pointer calls (#198358)
Member-pointer calls through `cir.get_method` were lowering to an indirect
callee type that still listed the member function's implicit `this`
parameter after `createGetMethod` had already prepended the adjusted
`void*` receiver. A call like `(obj->*pmf)(arg)` therefore carried a
three-parameter `var_callee_type` but only two argument operands, and
`-fclangir -emit-llvm` failed LLVM's variadic-call verifier with
`expected var_callee_type to have at most N parameters`. Classic codegen
emits `(ptr, …)` for the same pattern.
The libc++ sweep had one remaining `frontend-crash-other` bucket hit on
`F_nullptr.pass.cpp`, which boils down to `__builtin_invoke` on a
varargs member function pointer — the same callee/operand mismatch in a
minimal repro.
The fix skips the implicit-`this` slot when cloning the member signature
into the callee function type in `createGetMethod`, and tightens
[3 lines not shown]
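The parameter-count mismatch described above can be modelled with a small sketch. It treats a signature as a list of type names and is not the `createGetMethod` code; `Signature`, `buggyCalleeParams`, and `fixedCalleeParams` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Signature = std::vector<std::string>;

// MemberSig lists the member function's parameters with the implicit `this`
// at index 0. The buggy variant prepends the adjusted receiver but keeps
// `this`, so the callee type has one parameter too many.
inline Signature buggyCalleeParams(const Signature &MemberSig) {
  Signature Out;
  Out.push_back("void*");
  Out.insert(Out.end(), MemberSig.begin(), MemberSig.end());
  return Out;
}

// The fixed variant skips the implicit `this` slot when cloning.
inline Signature fixedCalleeParams(const Signature &MemberSig) {
  assert(!MemberSig.empty() && "member signature must start with `this`");
  Signature Out;
  Out.push_back("void*");
  Out.insert(Out.end(), MemberSig.begin() + 1, MemberSig.end());
  return Out;
}
```

For a call like `(obj->*pmf)(arg)` there are two operands (receiver and `arg`), so only the fixed variant yields a callee type with a matching parameter count.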
[CIR] Guard union ABI alignment when getLargestMember is empty (#198340)
Padding-only unions (an empty union lowered as a single `!u8i`
padding member) leave `getLargestMember()` null when CIRGen walks
record layout through MLIR's DataLayout API. `RecordType::getABIAlignment`
then passed that null `Type` into `getTypeABIAlignment` and crashed.
This showed up compiling libc++ types such as
`std::__variant_detail::__union` nested under `common_iterator`.
Return ABI alignment `1` when there is no largest member, matching a
byte-padded empty union. This parallels how empty unions are already
handled for size (`getTypeSizeInBits` uses zero size in that situation).
Regression coverage adds a nested-union global in `empty-union.cpp`.
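A minimal sketch of the guard, using hypothetical stand-in types (`Member`, `UnionLayout`) rather than the real CIR `RecordType` and MLIR DataLayout classes:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a union member's layout information.
struct Member {
  uint64_t AbiAlign;
  uint64_t SizeInBits;
};

// Hypothetical union layout: Largest is null for a padding-only union.
struct UnionLayout {
  const Member *Largest;
};

// With no largest member, fall back to byte alignment instead of
// dereferencing a null member.
inline uint64_t unionAbiAlignment(const UnionLayout &U) {
  if (!U.Largest)
    return 1;
  return U.Largest->AbiAlign;
}

// The size query handles the same situation by reporting zero.
inline uint64_t unionSizeInBits(const UnionLayout &U) {
  if (!U.Largest)
    return 0;
  return U.Largest->SizeInBits;
}
```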
[mlir][LLVMIR] Allow address-of-global as a leaf in array constants (#198424)
Large `llvm.mlir.global` initializers built as nested `llvm.insertvalue`
chains make `LLVMModuleTranslation::convertGlobalsAndAliases` call
`ConstantFoldInsertValueInstruction` on every step, rebuilding the
whole `ConstantArray` each time. That is O(N²) in the number of
elements and shows up as multi-minute compiles on translation units with
huge pointer tables (SPEC CPU 2026 `gcc/insn-automata.cc` is the
motivating case; Eric Keane's `convertOperationImpl` profile matches
this path).
This change lets `llvm.mlir.constant` carry an `ArrayAttr` of
`FlatSymbolRefAttr` leaves that name globals (not just functions), adds
a name-keyed global map beside the existing op-keyed map, and resolves
those refs in `getLLVMConstant`. A translate test checks the resulting
single LLVM constant array initializer.
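The quadratic cost can be illustrated with a toy model that counts element copies. It uses `std::vector` in place of LLVM constants, and all names here (`insertValue`, `copiesViaInsertChain`, `copiesViaSingleConstant`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Immutable-style insert: returns a fresh aggregate with one slot changed,
// which copies every element of the old aggregate.
inline std::vector<int> insertValue(const std::vector<int> &Agg,
                                    std::size_t Idx, int V,
                                    std::size_t &Copies) {
  std::vector<int> Out(Agg);
  Copies += Agg.size();
  Out[Idx] = V;
  return Out;
}

// Building N elements through an insert chain copies N elements N times.
inline std::size_t copiesViaInsertChain(std::size_t N) {
  std::vector<int> Agg(N, 0);
  std::size_t Copies = 0;
  for (std::size_t I = 0; I < N; ++I)
    Agg = insertValue(Agg, I, static_cast<int>(I), Copies);
  return Copies;
}

// Building the aggregate once from a list of leaves touches each element once.
inline std::size_t copiesViaSingleConstant(std::size_t N) {
  std::vector<int> Agg;
  Agg.reserve(N);
  for (std::size_t I = 0; I < N; ++I)
    Agg.push_back(static_cast<int>(I));
  return Agg.size();
}
```

In this model a 100-element table costs 10000 element copies through the insert chain and 100 when emitted as a single constant, which is the gap the change targets for large pointer tables.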
[flang] Recognize effects on non-addressable resources in opt-bufferization (#198051)
opt-bufferization has been only handling `fir::DebuggingResource`
explicitly. This patch adds support for other non-addressable resources,
such as `fir::VolatileMemoryResource`. This allows merging
elemental/assign for the `volatile_src_nonvolatile_dst` example in the
updated LIT test.
[Docs] Recommend Container Pinning (#198572)
Add some info to CI Best Practices about pinning container images to a
specific image SHA, which we agreed was a best practice in #197315 (and
maybe somewhere else, but I cannot find anything).
This updates the best practices but does not currently attempt to
actually fix all the cases where we are using unpinned container images.
[llvm-dwp] Fix incorrect ELF OS/ABI in DWP output (#198486)
I received a report internally that
https://github.com/llvm/llvm-project/pull/192112 caused issues with
lldb.
LLDB was not able to load the dwp files because of the OS mismatch
between the binary and dwp file.
Investigating, it turns out that the refactor caused DWPWriter to call
`ELFObjectFileBase::getOS()` which sets the output OS/ABI, but getOS()
returns `Triple::OSType`, not the raw `e_ident[EI_OSABI]` byte. These
enums have different numbering :( oops.
This caused certain tools that validate OS/ABI consistency between a
binary and its DWP to reject the debug info.
Fix by adding getEIdentOSABI() to ELFObjectFileBase (parallel to
getEIdentABIVersion()) and using it instead of getOS().
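The enum mismatch can be sketched as follows. The `OsAbi*` constants use the values the ELF specification assigns to `e_ident[EI_OSABI]`, while `HypotheticalOS` is an invented stand-in for a triple-level OS enum whose numbering is illustrative only and does not reproduce `Triple::OSType`.

```cpp
#include <cassert>
#include <cstdint>

// e_ident[EI_OSABI] values as assigned by the ELF specification.
enum ElfOsAbi : uint8_t { OsAbiNone = 0, OsAbiLinux = 3, OsAbiFreeBSD = 9 };

// Index of the OS/ABI byte within e_ident.
constexpr unsigned OsAbiIndex = 7;

// Hypothetical triple-level OS enum; the numbering is made up for this sketch.
enum class HypotheticalOS : uint8_t { Unknown = 0, FreeBSD = 1, Linux = 2 };

// Buggy approach: reuse the OS enum's numeric value as the OS/ABI byte.
inline uint8_t osAbiFromOsEnum(HypotheticalOS OS) {
  return static_cast<uint8_t>(OS);
}

// Fixed approach: read the raw byte from the input file's e_ident.
inline uint8_t osAbiFromIdent(const unsigned char *Ident) {
  return Ident[OsAbiIndex];
}
```

Copying the raw byte preserves whatever the input binary declared, so a consistency check between the binary and its DWP sees matching values.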
Assisted-by: Claude
[lldb-dap] Add missing `arguments` field to LldbDapProcessEntry (#198597)
The TypeScript interface was missing the optional `arguments` field that
`parseListProcessesOutput` reads and `pick-process` displays, breaking
the extension build.
[lldb] Don't require a real target for `target modules list -g` (#198594)
The `-g` flag lists the global module list, which doesn't need a target.
Switch to eCommandAllowsDummyTarget and error out explicitly in
DoExecute on the non-global paths when no real target is selected.
Fixes a regression introduced by #198429.
[AMDGPU] Disable dpp src1 sgpr on gfx11 (#164241)
https://github.com/llvm/llvm-project/pull/67461 enabled SGPRs as src1 by
default for all dpp opcodes with manual checks for targets where this is
not supported. In that case, isOperandLegal checked if the second
operand is legal as src0.
https://github.com/llvm/llvm-project/pull/155595 disabled this check by
removing the calls to isOperandLegal, which resulted in SGPRs being used
as operands for src1 on gfx11. This PR reenables this check and fixes
the lit test.
---------
Co-authored-by: Paul Trojahn <paul.trojahn at amd.com>
[libc] Remove broken __builtin_aarch64_wsr fallback in set_thread_ptr (#197295)
The fallback used __builtin_aarch64_wsr (32-bit) instead of
__builtin_aarch64_wsr64, truncating the 64-bit thread pointer value and
causing non-deterministic runtime crashes.
Modern GCC correctly warns about it and -Werror=conversion catches it.
```
/var/tmp/portage/llvm-runtimes/libc-22.1.5/work/libc/startup/linux/aarch64/tls.cpp: In function ‘bool __llvm_libc_22_1_5_::set_thread_ptr(uintptr_t)’:
/var/tmp/portage/llvm-runtimes/libc-22.1.5/work/libc/startup/linux/aarch64/tls.cpp:90:38: error: conversion from ‘uintptr_t’ {aka ‘long unsigned int’} to ‘unsigned int’ may change value [-Werror=conversion]
90 | __builtin_aarch64_wsr("tpidr_el0", val);
| ^~~
cc1plus: all warnings being treated as errors
```
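The truncation itself can be shown with plain integer conversions, without the AArch64 builtins. The helper names and the sample address below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Models passing a 64-bit value through a 32-bit parameter.
inline uint32_t narrowTo32(uint64_t V) { return static_cast<uint32_t>(V); }

// True only when the value has no bits set above bit 31.
inline bool survives32BitWrite(uint64_t Ptr) { return narrowTo32(Ptr) == Ptr; }
```

A thread pointer such as `0x0000007fb7a01000` would lose its upper bits, while a value that happens to fit in 32 bits would pass through unchanged. That dependence on the address may be one reason the crashes looked non-deterministic.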
[CIR] Implement atomic cmp exchange with non-const 'weak' lowering (#198546)
This was left as an NYI, but appears in self build!
This patch follows the existing solution in that we are doing the
branching of weak vs not-weak at the CIR level. This is necessary
because the LLVM intrinsics (and the CIR operations) take 'weak' as a
constant value.
Unlike classic-codegen, this patch uses an 'if' instead of a 'switch' on
the 'weak' value. This is mainly for readability (since it is a switch
over a bool!), but also because our 'switch' doesn't seem to support
'bool', so this would require an additional cast.
As a future direction, we may wish to modify the CIR operations to take
'weak' and 'failure' value (both are constants in LLVM intrinsics!) as
non-constants, and handle the switch/if statement during lowering. This
would give us an opportunity to optimize the value out without having to
collapse the if/switch/etc, and minimize the size of the CIR. However,
as that is a larger direction, this patch skips that for now.
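The branching strategy has a rough source-level analogue in standard C++, sketched below with `std::atomic`. This is not the CIR lowering; `compareExchange` is a hypothetical helper that only shows how a runtime `weak` flag selects between two operations whose weakness is fixed.

```cpp
#include <atomic>
#include <cassert>

// Dispatch on a runtime `Weak` flag with an `if`, so each underlying
// operation sees a fixed weak/strong choice.
inline bool compareExchange(std::atomic<int> &Obj, int &Expected, int Desired,
                            bool Weak) {
  if (Weak)
    return Obj.compare_exchange_weak(Expected, Desired,
                                     std::memory_order_seq_cst,
                                     std::memory_order_seq_cst);
  return Obj.compare_exchange_strong(Expected, Desired,
                                     std::memory_order_seq_cst,
                                     std::memory_order_seq_cst);
}
```

Because the flag is a `bool`, a two-way `if` covers every case that a `switch` would.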
[CIR] Fix assumption that 'curFn' is always a function in direct-call (#197766)
The code that performs some checks on a builtin function tried to tell
whether the call appears inside a function of the same name. This
assumed that the call is always inside a function, which isn't
necessarily true, since we also generate 'global' ops as a curFn. This
patch removes the assumption and only performs the check when we're in
a function.
[AMDGPU] Gate `S_LSHL[1-4]_ADD_U32` patterns on uniform results (#198508)
Like the other SOP2 patterns in this file, these scalar instructions
require the result to be uniform. Wrap them in `UniformBinFrag` so
divergent shl/add chains use `V_LSHL_ADD_U32`.
[mlir][AMDGPU] Extend amdgpu.transpose_load for gfx1250 (#198354)
This commit adds support for gfx1250's ds_load_tr* instructions to
`amdgpu.transpose_load` since they're pretty close to the gfx950 ones.
---------
Co-authored-by: Codex <codex at openai.com>
[dsymutil] Add missing --linker {classic,parallel} in tests (#198568)
As I'm preparing to toggle the default, I found another set of tests
that don't explicitly pass the linker to dsymutil.
[IR] Explicitly note C standard library UB (#198562)
This language is to my understanding a bit outdated (if we're in a
freestanding environment, we should be handling things fine to my
knowledge, or at least I'm not aware of any outstanding issues reported
by people compiling for freestanding environments/different languages
which are somewhat prominent at this point). The language here dates
back to 68f971b1d67d51272f5c141fc9e4740e27e279f4 with some minor
modifications in 722212d1a0672ae18a23db58c4cfb7e38073abfa. Explicitly
note the UB aspect, as this came up recently when working on llubi in
#190147 and I do not think it hurts to note it explicitly.
[AMDGPU] Implement -amdgpu-spill-cfi-saved-regs
These spills need special CFI anyway, so implementing them directly
where CFI is emitted avoids the need to invent a mechanism to track them
from ISel.
Change-Id: If4f34abb3a8e0e46b859a7c74ade21eff58c4047
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
[AMDGPU] Implement CFI for CSR spills
Introduce new SPILL pseudos to allow CFI to be generated for only CSR
spills, and to make the information accurate at the ISA-instruction level.
Other targets either generate slightly incorrect information or rely on
conventions for how spills are placed within the entry block. The
approach in this change produces larger unwind tables, with the
increased size being spent on additional DW_CFA_advance_location
instructions needed to describe the unwinding accurately.
Change-Id: I9b09646abd2ac4e56eddf5e9aeca1a5bebbd43dd
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
[AMDGPU] Emit entry function Dwarf CFI
Entry functions represent the end of unwinding, as they are the
outer-most frame. This implies they can only have a meaningful
definition for the CFA, which AMDGPU defines using a memory location
description with a literal private address space address. The return
address is set to undefined as a sentinel value to signal the end of
unwinding.
Change-Id: I21580f6a24f4869ba32939c9c6332506032cc654
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>