[orc-rt] Tidy up some SPS tag types. NFC. (#205038)
Replaces class definitions with decls for tag types that don't need a
body, and moves the SPSError tag down to just above its
serialization-traits class.
[LoongArch] Custom scalar UINT_TO_FP and FP_TO_UINT with LSX instructions (#200901)
Use `vftintrz.lu.d` to convert scalar double/float values to
unsigned 64-bit integers, and `vffint.d.lu` for the reverse conversion.
[AMDGPU] Improve the description of asyncmark semantics (#202579)
- The semantics of asyncmarks is now defined purely in terms of
sequences, without referring to the implementation.
- The examples incorrectly used (post)dominance. Fixed that with wording
in terms of asyncmark sequences.
[ProfileData] Lazy-load fixed-length MD5 name table (#202014)
When reading extensible binary format profiles with fixed-length MD5
name tables, the reader eagerly allocates and populates a
std::vector<FunctionId> to store the name table. This eager loading
is particularly wasteful when ProfileIsCS is false, as we populate the
entire name table just to support lookups during profile ingestion,
even though we may only use a subset of the profile. Since FunctionId
is 16 bytes on 64-bit systems, a name table containing 10 million MD5
hash values would consume 160MB of heap memory.
This patch implements lazy loading for the name table in extensible
binary format profiles when the fixed-length MD5 layout is used.
Specifically, this patch introduces SampleProfileNameTable to
encapsulate the name table representation, supporting both lazy
loading (pointing directly to the memory-mapped buffer) and eager
loading (using a vector). Eager loading is retained as a fallback for
layouts that do not support O(1) random access (such as
[11 lines not shown]
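The lazy/eager split described above can be sketched as follows. This is a minimal illustration, not the actual SampleProfileNameTable: the class name `NameTableSketch`, its methods, the 8-byte entry size, and the little-endian encoding are all assumptions made for the example, and the eager mode here decodes the same fixed-length layout only so the two modes can be compared.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of a name table with lazy and eager modes.
// Assumes fixed-length 8-byte little-endian MD5 entries.
class NameTableSketch {
public:
  // Lazy mode: keep a pointer into the (memory-mapped) buffer and decode
  // an entry only when it is looked up. No per-entry heap allocation.
  static NameTableSketch lazy(const uint8_t *Data, size_t NumEntries) {
    NameTableSketch T;
    T.Raw = Data;
    T.NumEntries = NumEntries;
    return T;
  }

  // Eager mode: decode every entry up front into a vector.
  static NameTableSketch eager(const uint8_t *Data, size_t NumEntries) {
    NameTableSketch T;
    T.NumEntries = NumEntries;
    T.Decoded.reserve(NumEntries);
    for (size_t I = 0; I != NumEntries; ++I)
      T.Decoded.push_back(decode(Data + I * EntrySize));
    return T;
  }

  // O(1) lookup in both modes.
  uint64_t operator[](size_t I) const {
    assert(I < NumEntries && "index out of range");
    return Raw ? decode(Raw + I * EntrySize) : Decoded[I];
  }

  size_t size() const { return NumEntries; }

private:
  static constexpr size_t EntrySize = sizeof(uint64_t);

  // Decode one little-endian 64-bit value byte by byte (endian-portable).
  static uint64_t decode(const uint8_t *P) {
    uint64_t V = 0;
    for (size_t B = 0; B != EntrySize; ++B)
      V |= static_cast<uint64_t>(P[B]) << (8 * B);
    return V;
  }

  const uint8_t *Raw = nullptr;  // set only in lazy mode
  std::vector<uint64_t> Decoded; // filled only in eager mode
  size_t NumEntries = 0;
};
```

In lazy mode the table costs a pointer and a count regardless of the number of entries, which is the saving the commit message describes for large fixed-length tables.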
[FIR] Route embox+projected slice through shapeVec in FIRToMemRef
The descriptor-strides path iterates source-rank dims but queries the
rank-reduced embox result box, miscompiling slices that collapse dims
(e.g. complex %re/%im on b(:,k)). For embox-derived boxes the underlying
storage is contiguous, so the shape-derived layout is both correct and
the natural place to encode "static shape information is available."
Drop the `|| hasProjectedSlice` carve-out from boxNeedsDescriptorStrides
so projection cases also take the shapeVec path. Non-embox boxes
(rebox, assumed-shape) still go through fir.box_dims because their
storage may be non-contiguous.
Fixes the SIGSEGV at -O0 -lro and miscompile at -O1 -lro on the Fujitsu
0086_0019 reproducer (complex(:,k)%re inside WHERE).
Co-Authored-By: Claude Sonnet 4.6 <noreply at anthropic.com>
[orc-rt] Rename scope_exit header, add nodiscard attribute. (#205030)
The rename brings the scope_exit type's header name into alignment with
other ORC runtime snake_case types.
The [[nodiscard]] attribute should help to prevent accidental misuse of
the type.
[clang] Avoid assertion on invalid member template specialization (#201506)
Fixes #201490.
It would be possible to have `PrevClassTemplate == false` when `SS` was
invalid.
Since it is already invalid, it would be safe to skip
`setMemberSpecialization` for `NewTemplate`. When the qualified scope
specifier is invalid, Sema may have already diagnosed the declaration
and marked it invalid. In that case there may be no previous class
template declaration, so the assertion is too strong. Avoid marking the
new declaration as a member specialization unless the previous class
template exists.
[llvm][RISCV] Revise xsfmm intrinsic interface. (#201527)
This patch does 2 things:
1. Change the matmul interface to use the newly defined OFP8 RVV types.
2. Change all overloaded matmul interfaces to keep only the widen
information and drop the type information.
[clang][RISCV] Handle VLS CC on unsupported primitive type in aggregate type (#203898)
We handled this for pure vector types before but missed aggregate
types. This patch applies the same mechanism to them: unsupported
vector types are converted to i8 vector types of the same size.
[lld-macho] Relax safe ICF's keepUnique for ld64-coalesced data sections (#193125)
#188400 regressed data-section folding under --icf=safe{,_thunks}:
no-addrsig fallback, and over-broad compiler-emitted addrsig entries
covering data symbols, both caused markSymAsAddrSig to set keepUnique on
data sections, after which foldIdenticalSections refused to fold them.
ld64 coalesces __cfstring, __objc_classrefs and __objc_selrefs
unconditionally regardless of addrsig, so ignore keepUnique for them as
a workaround for the imprecise addrsig payload.
[orc-rt] Align scope-exit with LLVM (rename to scope_exit, use CTAD) (#205020)
This renames the orc_rt::detail::ScopeExitRunner class to
orc_rt::scope_exit and adds a class template argument deduction guide.
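As a rough illustration of what a deduction guide buys callers, here is a minimal sketch of a scope-exit guard. It is a stand-in written for this example, not the orc-rt source: the member names, the `release()` method, and the `runWithCleanup` helper are assumptions.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for a scope_exit type; not the actual orc-rt class.
// [[nodiscard]] makes a discarded temporary guard produce a warning.
template <typename Fn> class [[nodiscard]] scope_exit {
public:
  explicit scope_exit(Fn F) : F(std::move(F)) {}
  scope_exit(const scope_exit &) = delete;
  scope_exit &operator=(const scope_exit &) = delete;

  // Run the callable on scope exit unless it was released.
  ~scope_exit() {
    if (Engaged)
      F();
  }

  // Cancel the pending call (assumed API, for illustration only).
  void release() { Engaged = false; }

private:
  Fn F;
  bool Engaged = true;
};

// Deduction guide: callers can write `scope_exit Guard(lambda);` without
// spelling out the lambda's type or going through a factory function.
template <typename Fn> scope_exit(Fn) -> scope_exit<Fn>;

// Helper for the example: returns how many cleanups ran.
inline int runWithCleanup(bool Cancel) {
  int Cleanups = 0;
  {
    scope_exit Guard([&] { ++Cleanups; }); // type deduced via CTAD
    if (Cancel)
      Guard.release();
  }
  return Cleanups;
}
```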
[AtomicExpand] Add bitcasts when expanding store atomic vector (#197862)
AtomicExpand fails for aligned `store atomic <n x T>` because it
does not find a compatible library call. This change adds appropriate
ptrtoint + bitcast so that the call can be lowered, mirroring the
load-side handling.
Store-side counterpart to #148900. Stacked on top of
https://github.com/llvm/llvm-project/pull/201566.
[orc-rt] Add InProcessControllerAccess class. (#204976)
Adds a Session::ControllerAccess implementation for in-process JIT
setups, where the controller (LLVM-side) and the executor (orc-rt) live
in the same address space.
The two sides communicate through a refcounted C-ABI struct (Connection)
of function pointers. The C-only interface avoids assuming a common C++
ABI between the two sides and supports symmetric, graceful disconnect:
when either side calls Connection::Disconnect, in-flight cross-calls are
drained and pending continuations are surfaced as out-of-band errors,
after which further cross-calls fail cleanly.
This is intended to be paired with a new ExecutorProcessControl
implementation (llvm::orc::InProcessEPC) on the LLVM side, landing in a
follow-up commit. Unit tests are included covering construction without
connect, attach via Session, OnConnect-failure detach, successful and
out-of-band-error call cases, and the disconnect-drains-pending
behavior.
[libc][math] Extend iscanonical macro to _Float16 and float128
iscanonical is a C23 type-generic macro, so the f16/f128 variants are
surfaced through it rather than as functions in the generated math.h.
float128 is only listed when distinct from long double (LDBL_MANT_DIG !=
113) to avoid two _Generic associations with compatible types.
[libc][math] Fix aarch64 Darwin fenv implementation for full builds
A full build replaces the system (Apple) <fenv.h> with libc's headers, so
fenv_darwin_impl.h no longer found an 8-byte fenv_t, FE_FLUSHTOZERO, or the
__fpcr_* masks it relied on. Size FPState to the fenv_t in scope, alias
FE_FLUSHTOZERO to FE_DENORM, and define the FPCR trap masks locally.
[VPlan] Use pattern matching in isUsedByLoadStoreAddress (NFC) (#205008)
Replace the hand-written check for a VPReplicateRecipe load/store using
the value as its address with VPlan pattern matching via
m_Unary/m_Binary, which also handle masked recipes uniformly.
[VPlan] Add VPReplicateRecipe::getNumOperandsWithoutMask (NFC) (#205004)
Add a getNumOperandsWithoutMask helper to VPReplicateRecipe, mirroring
the existing VPInstruction::getNumOperandsWithoutMask, and use it to
replace some hand-rolled code.
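A helper of this kind might look roughly like the sketch below. The struct is a simplified stand-in, not the real VPReplicateRecipe, and it assumes that a predicated recipe stores its mask as the last operand.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for a replicate recipe.
struct ReplicateRecipeSketch {
  std::vector<int> Operands; // stand-in for the recipe's operand list
  bool IsPredicated = false; // assumption: if true, the last operand is the mask

  unsigned getNumOperands() const {
    return static_cast<unsigned>(Operands.size());
  }

  // Number of operands excluding the trailing mask, if any.
  unsigned getNumOperandsWithoutMask() const {
    return getNumOperands() - (IsPredicated ? 1u : 0u);
  }
};
```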
[flang][OpenMP] Move unique clauses to allowedOnceClauses in OMP.td
Many unique clauses were listed in "allowedClauses", which turned off
the single-occurrence check in flang. Move these clauses to the right
category to enable this check.
One exception to this is the IF clause: the IF clause is unique for
all non-compound directives, but is repeatable on compound ones with
the restriction that at most one IF clause can apply to any of the
constituents. This restriction is currently not enforced correctly
in flang, and so the IF clause was left unchanged.
Although this change is applied to a file shared between flang and
clang, clang does not use these categories for its checks, and hence
is not affected by this patch.
Revert "[Legalizer] Add support for promoting integers for s/ucmp (#198554)" (#204978)
This reverts commit 91edd87a801fc5c9d12c7f5c6863edd50327cef8.
It was causing CI failures for Linux.
[ARM] Use lo tCMPr opcode when expanding CMP_SWAP (#204567)
We were always generating tCMPhir even when both registers were
low, which is an unpredictable instruction. Generate tCMPr instead
when both registers are low.
Fixes #204519.
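The selection rule can be sketched as below. This is an illustration only: the enum and function names are hypothetical, and registers are modeled as plain numbers with r0-r7 as the Thumb low registers rather than as LLVM register classes.

```cpp
#include <cassert>

// Hypothetical names; the real code works on MachineInstr opcodes.
enum class CmpOpcode { tCMPr, tCMPhir };

// Thumb low registers are r0-r7.
constexpr bool isLowReg(unsigned RegNum) { return RegNum < 8; }

// Use the low-register compare when both operands are low registers,
// and the high-register form otherwise.
constexpr CmpOpcode selectThumbCmp(unsigned Lhs, unsigned Rhs) {
  return (isLowReg(Lhs) && isLowReg(Rhs)) ? CmpOpcode::tCMPr
                                          : CmpOpcode::tCMPhir;
}
```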