[ELF] Fix /DISCARD/ .eh_frame regression after #179089
When .eh_frame is discarded while .eh_frame_hdr is not, #179089 caused a
crash for `/DISCARD/ : { *(.eh_frame) }`.
Simplify the PT_GNU_EH_FRAME condition from https://reviews.llvm.org/D30885 (2017).
[ELF] Add target-specific relocation scanning for x86 (#178846)
Implement scanSection/scanSectionImpl for i386 and x86-64 to
* enable devirtualization of getRelExpr calls
* eliminate abstraction overhead for the PLT-to-PCRel optimization and TLS
relocations
* optimize for R_X86_64_PC32 and R_X86_64_PLT32: together they account for
95% of the relocations in `lld/ELF/**/*.o` files.
at the cost of more code.
TLS relocation handling is inlined into scanSectionImpl. Also,
- Remove getTlsGdRelaxSkip
- Replace TLS-optimization-specific expressions:
- R_RELAX_TLS_GD_TO_LE, R_RELAX_TLS_LD_TO_LE, R_RELAX_TLS_IE_TO_LE →
R_TPREL
- R_RELAX_TLS_GD_TO_IE → R_GOT_PC
[13 lines not shown]
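For illustration, a minimal sketch of the devirtualization idea; the type, member, and helper names below are hypothetical and are not LLD's actual interfaces:

```cpp
// Minimal sketch only: hypothetical names, not LLD's actual interfaces.
// A target-specific scan loop dispatches on the relocation type directly,
// so the hot cases (R_X86_64_PC32/R_X86_64_PLT32) and TLS handling are
// inlined instead of going through virtual getRelExpr calls.
#include <cstdint>
#include <vector>

enum RelType : uint32_t { R_X86_64_PC32 = 2, R_X86_64_PLT32 = 4, R_X86_64_TLSGD = 19 };

struct Reloc {
  RelType type;
  uint64_t offset;
};

struct X86_64Scanner {
  void scanSectionImpl(const std::vector<Reloc> &rels) {
    for (const Reloc &r : rels) {
      switch (r.type) {
      case R_X86_64_PC32:
      case R_X86_64_PLT32:
        handlePCRel(r); // hot path: no virtual dispatch
        break;
      case R_X86_64_TLSGD:
        handleTlsGd(r); // TLS handling inlined into the scan
        break;
      default:
        handleGeneric(r); // everything else takes the generic path
      }
    }
  }
  void handlePCRel(const Reloc &) {}
  void handleTlsGd(const Reloc &) {}
  void handleGeneric(const Reloc &) {}
};
```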
[llvm-ir2vec] Adding BB Embeddings Map API to ir2vec python bindings (#180135)
Returns a BB Embedding Map based on the input function name:
`getBBEmbMap(funcName) -> Map<BB name, Embedding>`
[AIX] [PowerPC] Auto-enable modern-aix-as feature by default with integrated assembler. (#180778)
**Issue**:
Certain instruction aliases (e.g. `mfsprg`) defined as InstAlias in the
PowerPC tablegen are gated behind the `ModernAs` predicate.
Without the `+modern-aix-as` target feature enabled, the `ModernAs`
predicate is not satisfied and these instruction aliases are
unavailable. This caused assembly failures on AIX unless the user
manually specified the following options:
`-Xclang -target-feature -Xclang +modern-aix-as`
**Solution**:
Automatically enable the `+modern-aix-as` target feature when:
- The target triple is AIX.
- The integrated assembler is being used (the default, or `-fintegrated-as`).
This feature is not enabled when `-fno-integrated-as` is specified.
Co-authored-by: Riyaz Ahmad <riyaz.ahmad at ibm.com>
Revert "[MC/DC] Make covmap tolerant of nested Decisions (#125407)" (#181069)
This reverts commit 8f690ec7ffd8d435a0212a191634b544b0741c4f because it
caused errors in collecting coverage.
[C++20] [Modules] Support to generate reduced BMI only (#181081)
Introduced `--precompile-reduced-bmi`. This allows users to generate a
reduced BMI only.
Previously, users could only generate the reduced BMI as a by-product of
another process (e.g., generating an object file or a full BMI). This is
not ideal.
[mlir][AMDGPU] Update gather_to_lds with explicit-async support
This commit takes advantage of the new `load.async.to.lds` intrinsic
in order to add an `async` mode to `gather_to_lds`. In this mode,
completion of the load needs to be managed with `asyncmark` and
`wait.asyncmark` intrinsics instead of being implicitly derived by
alias analysis.
This commit adds the flag, a lowering for it, and updates tests.
Co-authored-by: Claude Opus 4.5 <noreply at anthropic.com>
[mlir][ROCDL] Add async variants of pre-gfx12 LDS load intrinsics
These are MLIR wrappers around #180466.
-----
Co-authored-by: Claude Opus 4.5 <noreply at anthropic.com>
[mlir][ROCDL] Wrap asyncmark and wait.asyncmark intrinsics (#181054)
(see op-level and LLVM documentation for details so I'm not repeating
myself, but these are the general operations for compiler-operated
asynchronous operation tracking, which frees programmers from having to
deal with all the different counters, allows certain optimizations, and
doesn't require precise alias analysis)
-----
Co-authored-by: Claude Opus 4.5 <noreply at anthropic.com>
[CIR][LoweringPrepare] Emit guard variables for static local initialization
This implements the lowering of static local variables with the Itanium C++ ABI
guard variable pattern in LoweringPrepare.
When a GlobalOp has the static_local attribute and a ctor region, this pass:
1. Creates a guard variable global (mangled name from AST)
2. Inserts the guard check pattern (sketched below) at each GetGlobalOp use site:
- Load guard byte with acquire ordering
- If zero, call __cxa_guard_acquire
- If acquire returns non-zero, inline the ctor region code
- Call __cxa_guard_release
3. Clears the static_local attribute and ctor region from the GlobalOp
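For reference, a minimal C++ sketch of the emitted guard pattern, assuming the usual Itanium `__cxa_guard_*` entry points; the variable and function names here are illustrative, and the real guard global uses the mangled name taken from the AST:

```cpp
// Illustrative names only; the ctor region is inlined where run_ctor_region
// is called here.
#include <cstdint>

extern "C" int __cxa_guard_acquire(uint64_t *guard);
extern "C" void __cxa_guard_release(uint64_t *guard);

static uint64_t guard_var;   // emitted guard variable global
static int the_static_local; // the static local being initialized

static void run_ctor_region() { the_static_local = 42; }

int *get_static_local() {
  // Load the guard byte with acquire ordering; skip when already initialized.
  if (__atomic_load_n(reinterpret_cast<unsigned char *>(&guard_var),
                      __ATOMIC_ACQUIRE) == 0) {
    if (__cxa_guard_acquire(&guard_var) != 0) {
      run_ctor_region();
      __cxa_guard_release(&guard_var);
    }
  }
  return &the_static_local;
}
```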
[clang-tidy][NFC] Update broken HICPP documentation links (#180525)
The original links to the High Integrity C++ Coding Standard now
redirect to an [irrelevant page](https://www.perforce.com/resources)
because Perforce made the document private (access now requires applying
by email).
This PR updates all HICPP-related documentation links to point to the
application form, ensuring users can still find the official source for
these rules.
[libcxxabi] Make fallback malloc heap size configurable via CMake
The emergency fallback heap used during exception handling when
malloc() fails (e.g. under OOM) was hardcoded to 512 bytes, which
is only enough for a few in-flight exceptions. In contrast, libstdc++
reserves ~73 KiB for this purpose.
Add a LIBCXXABI_FALLBACK_MALLOC_HEAP_SIZE CMake option (default 512)
so that builds targeting programs that need to survive OOM conditions
can increase the pool size. Validate the value at both CMake configure
time and C++ compile time, since the heap's unsigned short offsets
limit it to ~256 KiB.
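As an illustration, the compile-time side of that validation boils down to something like the sketch below; the assumption here is that offsets count 4-byte units, which is consistent with the ~256 KiB ceiling mentioned above:

```cpp
// Illustrative only: the fallback heap tracks blocks with unsigned short
// offsets, assumed here to count 4-byte units, which caps the usable size
// at roughly 256 KiB.
#include <climits>

#ifndef LIBCXXABI_FALLBACK_MALLOC_HEAP_SIZE
#define LIBCXXABI_FALLBACK_MALLOC_HEAP_SIZE 512
#endif

static_assert(LIBCXXABI_FALLBACK_MALLOC_HEAP_SIZE > 0,
              "fallback heap size must be positive");
static_assert(LIBCXXABI_FALLBACK_MALLOC_HEAP_SIZE / 4 <= USHRT_MAX,
              "fallback heap size exceeds what unsigned short offsets can address");
```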
Use setExprNeedsCleanups in BuildCXXNew and avoid breaking C++98
This approach is much cleaner, but broke CheckICE reporting in C++98.
Stepping through a debugger shows that this happened because the
static_assert test did not recognize ExprWithCleanups as transparent to
constant evaluation. To address this, we update CheckICE to recurse
into the sub-expression and keep the old behavior.
[clang] Use uniform lifetime bounds under exceptions
To do this, we have to slightly modify how some expressions are handled
in Sema. Principally, we need to ensure that calls to new for
non-trivial types still have their destructors run. Generally this isn't
an issue, since these just get sunk into the surrounding scope. With
more lifetime annotations being produced for the expressions, we found
that some calls to `new` in an unreachable switch arm would not be
wrapped in ExprWithCleanups. As a result, they remain on the EHStack
when processing the default label, and since the dead arm doesn't
dominate the default label, we can end up with a case where the def-use
chain is broken (e.g. the def doesn't dominate all uses). Technically
this path would be impossible to reach due to the active bit, but it
still failed to satisfy a dominance relationship.
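A rough illustration of the kind of pattern described (not the exact reproducer; the names are made up):

```cpp
struct S {
  S() {}
  ~S() {} // non-trivial destructor, so 'new S()' needs an EH cleanup
};

int f() {
  switch (0) {
  case 1: { // dead arm
    // Without an ExprWithCleanups wrapper, the cleanup pushed for this
    // new-expression stays on the EH stack when the default label below
    // is processed, even though this block does not dominate it.
    S *p = new S();
    delete p;
    break;
  }
  default:
    return 1;
  }
  return 0;
}
```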
With that in place, we can remove the constraint on only using tighter
lifetimes when exceptions are disabled.
[clang] Use tighter lifetime bounds for C temporary arguments
In C, consecutive statements in the same scope are under
CompoundStmt/CallExpr, while in C++ they typically fall under
CompoundStmt/ExprWithCleanups. This leads to different behavior with
respect to where pushFullExprCleanup inserts the lifetime end markers
(e.g., at the end of scope).
For these cases, we can track and insert the lifetime end markers right
after the call completes, allowing the stack space to be reused
immediately. This partially addresses #109204 and #43598 by improving
stack usage.
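For illustration, a hypothetical example of the pattern this helps (written so it is valid as both C and C++):

```cpp
struct Big { char buf[4096]; };

static struct Big make_big(void) { struct Big b = {{0}}; return b; }
static void consume(struct Big b) { (void)b; }

void caller(void) {
  // Each temporary's lifetime end marker can be emitted right after its
  // call returns, so the two 4 KiB temporaries can share one stack slot
  // instead of both staying live until the end of the enclosing scope.
  consume(make_big());
  consume(make_big());
}
```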