[AMDGPU] Implement -amdgpu-spill-cfi-saved-regs
These spills need special CFI anyway, so implementing them directly
where CFI is emitted avoids the need to invent a mechanism to track them
from ISel.
Change-Id: If4f34abb3a8e0e46b859a7c74ade21eff58c4047
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
[AMDGPU] Implement CFI for CSR spills
Introduce new SPILL pseudos so that CFI is generated only for CSR
spills and is accurate at the ISA-instruction level.
Other targets either generate slightly incorrect information or rely on
conventions for how spills are placed within the entry block. The
approach in this change produces larger unwind tables, with the
increased size being spent on additional DW_CFA_advance_location
instructions needed to describe the unwinding accurately.
Change-Id: I9b09646abd2ac4e56eddf5e9aeca1a5bebbd43dd
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
[AMDGPU] Emit entry function Dwarf CFI
Entry functions represent the end of unwinding, as they are the
outer-most frame. This implies they can only have a meaningful
definition for the CFA, which AMDGPU defines using a memory location
description with a literal private address space address. The return
address is set to undefined as a sentinel value to signal the end of
unwinding.
Change-Id: I21580f6a24f4869ba32939c9c6332506032cc654
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
[AMDGPU] Implement CFI for non-kernel functions
This does not implement CSR spills other than those AMDGPU handles
during PEI. The remaining spills are handled in a subsequent patch.
Change-Id: I5e3a9a62cf9189245011a82a129790d813d49373
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
[MC][Dwarf] Add custom CFI pseudo-ops for use in AMDGPU
While these can be represented with .cfi_escape, using these pseudo-cfi
instructions makes .s/.mir files more readable, and it is necessary to
support updating registers in CFI instructions (something that the
AMDGPU backend requires).
Change-Id: I763d0cabe5990394670281d4afb5a170981e55d0
[Clang] Default to async unwind tables for amdgcn
To avoid codegen changes when enabling debug-info (see
https://bugs.llvm.org/show_bug.cgi?id=37240) we want to
enable unwind tables by default.
There is some pessimization in post-prologepilog scheduling, and a
general solution to the problem of CFI_INSTRUCTION-as-scheduling-barrier
should be explored.
Change-Id: I83625875966928c7c4411cd7b95174dc58bda25a
[MIR] Error on signed integer in getUnsigned
Previously we effectively took the absolute value of the APSInt.
Instead, diagnose the unexpected negative value.
Change-Id: I4efe961e7b29fdf1d5f97df12f8139aac12c9219
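The behavior change can be illustrated with a minimal standalone sketch. It does not use LLVM's APSInt or the MIR parser; `ParsedInt` and `getUnsignedSketch` are hypothetical names invented here, and the diagnostic strings are placeholders rather than the parser's actual messages.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <string>

// Hypothetical stand-in for a parsed integer token: a magnitude plus a sign.
struct ParsedInt {
  std::uint64_t Magnitude;
  bool IsNegative;
};

// Returns the token as an unsigned value. On failure, returns std::nullopt
// and stores a diagnostic in Error instead of silently dropping the sign.
std::optional<unsigned> getUnsignedSketch(const ParsedInt &Tok,
                                          std::string &Error) {
  if (Tok.IsNegative && Tok.Magnitude != 0) {
    Error = "expected unsigned integer, got a negative value";
    return std::nullopt;
  }
  if (Tok.Magnitude > std::numeric_limits<unsigned>::max()) {
    Error = "integer does not fit in unsigned";
    return std::nullopt;
  }
  return static_cast<unsigned>(Tok.Magnitude);
}
```

With this shape, a token such as `-7` yields an error rather than being read as 7.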
[llvm-profgen] Fix assertion condition for the top-level probes address range checks (#198674)
Top-level functions may contain multiple disjoint address ranges, and
the pseudo probes may not be stored in sorted order. The original
assertion is problematic because the first pseudo probe does not
necessarily fall into the current function range, even though both are
part of the same function.
Update the assertion condition so that the assertion does not fire as
long as at least one pseudo probe falls in the checked function range.
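The difference between the old and new conditions can be sketched as follows. This is an illustration only, not the llvm-profgen code: `AddrRange`, `firstProbeInRange`, and `anyProbeInRange` are hypothetical names, and ranges are assumed to be half-open `[start, end)`.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical half-open address range [first, second).
using AddrRange = std::pair<std::uint64_t, std::uint64_t>;

static bool contains(const AddrRange &R, std::uint64_t Addr) {
  return Addr >= R.first && Addr < R.second;
}

// Old-style condition: only the first stored probe is checked, which fails
// when probes are unsorted or belong to another range of the same function.
bool firstProbeInRange(const std::vector<std::uint64_t> &Probes,
                       const AddrRange &Checked) {
  return !Probes.empty() && contains(Checked, Probes.front());
}

// Relaxed condition: at least one probe, in any position, is in the range.
bool anyProbeInRange(const std::vector<std::uint64_t> &Probes,
                     const AddrRange &Checked) {
  return std::any_of(Probes.begin(), Probes.end(),
                     [&](std::uint64_t A) { return contains(Checked, A); });
}
```

For probes stored as `{0x2000, 0x1010}` and a checked range of `[0x1000, 0x1100)`, the first condition is false while the relaxed one is true.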
rk_gpio: implement PIC masking methods and mask unhandled IRQs
The Rockchip GPIO controller implements PIC operations for the INTRNG
framework but is missing four masking methods that INTRNG calls during
the filter/ithread handoff: pic_disable_intr, pic_enable_intr,
pic_pre_ithread, pic_post_ithread.
Without them, level-sensitive interrupt sources connected to a
Rockchip GPIO pin re-fire continuously while their ithread runs. On
a RockPro64 with a FUSB302B Type-C controller (i2c) attached to
gpio1 INT_N, the system enters a ~210 kHz interrupt storm the moment
the fusb302 driver attaches and INT_N goes low.
Two complementary changes:
1. Add the four pic_disable_intr/pic_enable_intr/pic_pre_ithread/
pic_post_ithread method bodies. Each toggles the pin's
RK_GPIO_INTMASK bit so the source is masked during the in-flight
ithread window and unmasked on return, honouring the generic
[17 lines not shown]
[CUDA][HIP] Defer device diagnostics in implicit H+D explicit instantiations (#197214)
When clang explicitly instantiates a class template, it must emit
device-side bodies for the implicit `__host__ __device__` members so the
vtable and instantiation symbols resolve. Some of those members chain
into host-only calls (for example libstdc++ destructors that eventually
call ::operator delete). If no device code actually uses the class, the
user still sees errors about calling a `__host__` function from device
code, even though they wrote no device code that touches it. Overload
ambiguity in the same context behaves the same way.
This patch defers device-side errors in implicit `__host__ __device__`
functions reached only via an explicit template instantiation. At the
end of the translation unit, clang checks whether a real device caller
exists.
[8 lines not shown]
bcm2835_sdhci: Clean up DMA resources on attach failure
bcm_sdhci_attach() allocates a DMA channel with bcm_dma_allocate()
before creating the bus_dma tag and map. If a later initialization
step fails, the common error path releases the interrupt and memory
resources, but leaves the DMA channel allocated.
Call bcm_dma_free() for cleanup, as it already performs the required
internal checks and can therefore be invoked directly.
Signed-off-by: Haoxiang Li <lihaoxiang at isrc.iscas.ac.cn>
Reviewed by: mhorne
MFC after: 3 days
Pull Request: https://github.com/freebsd/freebsd-src/pull/2241
[CIR] Emit globals for declarations that force externally visible defs
CIRGenModule::emitGlobal reported NYI for forward declarations that
force an externally visible definition under C99 inline rules (inline
definition in the TU, then a later extern declaration). Classic
CodeGenModule already materializes these with GetOrCreateLLVMFunction;
wire the same path through getAddrOfFunction, including the AArch64
multiversion guard from OGCG.
This pattern shows up in SPEC CPU 2026 berkeley-abc `if.h` when
compiling `abcIf.c` for 729.abc_r / 829.abc_s (`If_CutCopy` at line
527).
Test: `inline-extern-force-codegen.c` with CIR, LLVM, and OGCG checks.
---------
Co-authored-by: Claude Sonnet 4.6 <noreply at anthropic.com>
[CIR] Handle throwing an exception from a cleanup scope (#199121)
The CIR FlattenCFG pass had been ignoring any ThrowOp that occurred
inside a cleanup scope or try operation, which led to the thrown
exception not triggering local cleanups and bypassing local catch
handlers.
This change introduces a new CIR operation, TryThrowOp, which is
analogous to the existing TryCallOp. The TryThrowOp (as well as the
ThrowOp) will eventually be lowered to a function call, but which
function gets called is a target-dependent detail, so we need an
abstract operation before EHABI lowering.
The Flatten CFG pass replaces any ThrowOp inside a cleanup scope or try
operation with a TryThrowOp that has an unreachable normal destination
and unwinds to the appropriate cleanup or catch dispatch block.
Assisted-by: Cursor / claude-opus-4.7-thinking-xhigh
Require a selector when lowering WHEN
This patch ensures that a missing selector is not treated as
unconditional during lowering, since `WHEN` requires a context selector.
Add a negative test to replace the positive test that exercised a
missing selector.
Use selected variants in metadirective construct context
An enclosing selected begin/end metadirective variant can introduce
a construct that is not represented by the PFT parent chain. For example,
an inner metadirective inside target and an outer-selected parallel must
be able to match construct={target, parallel}.
Collect construct traits from already-lowered enclosing OpenMP operations,
which represent both ordinary enclosing constructs and constructs introduced
by selected variants.
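A minimal sketch of the matching this enables, assuming the construct selector matches when its traits appear in the enclosing construct context in the same relative order. The function name and the use of plain strings for traits are hypothetical; the actual lowering works on OpenMP operations rather than strings.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns true if every trait in Selector appears in Context in the same
// relative order, i.e. Selector is a subsequence of Context. Context is
// assumed to be ordered from the outermost enclosing construct inward.
bool constructSelectorMatches(const std::vector<std::string> &Context,
                              const std::vector<std::string> &Selector) {
  std::size_t Next = 0; // index of the next selector trait still to be found
  for (const std::string &Trait : Context) {
    if (Next < Selector.size() && Trait == Selector[Next])
      ++Next;
  }
  return Next == Selector.size();
}
```

If the context only records `target` from the parent chain, `construct={target, parallel}` does not match. Once the outer-selected `parallel` is also collected, the context becomes `{target, parallel}` and the selector matches.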
[clang][lit] Add option to skip clang-repl checks (#199255)
Whenever lit or llvm-lit is invoked to run clang tests, clang-repl is
run at least once to check for host jit capabilities, and possibly
several more times to probe related capabilities. This adds a noticeable
delay before testing starts, especially for debug builds.
This change adds a lit parameter (clang_skip_clang_repl_checks) and an
environment variable check (CLANG_LIT_SKIP_CLANG_REPL_CHECKS) to allow
the clang-repl probes to be skipped. When this option is used, any tests
that rely on jit execution will be reported as unsupported.
This option is intended only to allow quicker targeted testing during
development. It should not be used for comprehensive verification before
submitting a patch.
On my local test system, executing `ninja check-clang-cir-codegen` with
a previously completed debug build took 18 seconds to run 354 tests with
this option and 53 seconds without it. This is the sort of use case I am
targeting: lit test runs where the clang-repl overhead constitutes a
significant portion of the total time to execute the tests.
[SandboxVec][SeedCollection] Iterate over all seeds (#195964)
Even though load seeds can already be collected by the seed collector,
the seed collection pass was not iterating over them. This patch fixes
that: the pass now iterates over both store and load seeds.
[bzl] Reduce the `deps` size of libc's shared_math_header library. (#200006)
There were ~500 of them, which can cause build analysis/metric issues.
Glob the private headers in use, retaining only the support libraries
that have source code.
Make it a cc_library instead of a libc_header_library. Rename it
"apfloat_shared_math_headers" to clarify its limited use case.
[CIR] Handle the 'before case' block of a switch statement. (#199752)
Before this patch, we would fail with a verification error any time
there was a block with entry/exit (in this case, one with successors
thanks to a label). This patch adds special handling for that first
block.
This patch deliberately does not trim such blocks, however. Unless there
is a label inside the block, there is no way to reach it, and it is dead
code. I've opted not to do that optimization, as I suspect the block
might be valuable to future passes, or be something we may wish to warn
about in some sort of CFG analysis.
Additionally, there are some minor changes to FlattenCFG: first, to make
sure we skip the switch only if it is truly empty, and second, to make
sure we transform any 'break' in the pre-case region.