[clang][bytecode] Get the right definition before compiling functions (#201105)
This broke libc++'s
std/ranges/range.adaptors/range.concat/iterator/arithmetic.pass.cpp.
The (reduced via cvise but not enough) function looks like this:
```c++
friend constexpr unsigned
operator-(const __iterator &__x, const __iterator &__y)
{
(void)-(__y - __x);
return 0;
}
```
When evaluating the binary operator for overflow, we will compile the
operator- (_this_ function) to bytecode. At that point,
::isThisDeclarationADefinition() will return true and ::getDefinition()
[7 lines not shown]
[RISCV][P-ext] Add zero/sign extend support between 32-bit and 64-bit vectors. (#201694)
Still need to improve sext on RV64.
Assisted-by: Claude Sonnet 4.6
[X86] Remove shouldCastAtomicLoadInIR; use DAG combine instead
Remove X86's shouldCastAtomicLoadInIR override that cast FP atomic
loads to integer at the IR level. Instead, handle this in a pre-legalize
DAG combine (combineAtomicLoad) that rewrites FP/FP-vector atomic loads
to integer atomic loads plus a bitcast.
This depends on #199310 which adds the necessary cmpxchg support for
non-integer atomic loads in AtomicExpand.
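As a source-level analogy for the "integer atomic load plus a bitcast" shape described above, the following sketch loads a float through an integer atomic and then reinterprets the bits. It is not the DAG combine itself; the helper names `loadFloatViaIntBits` and `floatToBits` are hypothetical and exist only for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Hypothetical helper: an integer atomic load followed by a bitcast to float.
float loadFloatViaIntBits(const std::atomic<std::uint32_t> &Bits) {
  std::uint32_t Raw = Bits.load(std::memory_order_seq_cst); // integer atomic load
  float Result;
  std::memcpy(&Result, &Raw, sizeof(Result)); // bitcast i32 -> f32
  return Result;
}

// Hypothetical helper for the opposite direction, used to prepare a value.
std::uint32_t floatToBits(float Value) {
  std::uint32_t Raw;
  std::memcpy(&Raw, &Value, sizeof(Raw));
  return Raw;
}
```

The atomicity comes entirely from the integer load; the bitcast is a plain value conversion afterwards.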
[ExprConstant] Treat `&*p` as not a dereference in C constant initializers (#201483)
In C, [C11 6.5.3.2p3] specifies that when the operand of unary `&` is
the result of a unary `*` operator, neither operator is evaluated and
the result is as if both were omitted. So `&*p` yields the pointer value
`p` without performing a dereference, and forming it is well-defined
even when `p` is null (e.g. `&*(int *)0`).
The constant evaluator did not honor this: it evaluated the `*` as a
real lvalue access and diagnosed a null dereference as undefined
behavior. This went unnoticed for ordinary scalar initializers, which
use the relaxed `Expr::isConstantInitializer()` check, but a bit-field
initializer is evaluated via `EvaluateAsInt()` with `SE_NoSideEffects`,
so the same expression was rejected there with "initializer element is
not a compile-time constant":
```
struct S { long v : 8; };
const struct S s = { .v = (long)&*(int *)0 }; // error
[9 lines not shown]
[RISCV][P-ext] Select scalar asub/asubu and mulhr/mulhru/mulhrsu on RV32 (#201540)
The truncate combine only formed these nodes for packed vectors; extend
it to scalar i32 on RV32 and add the matching isel patterns.
[DenseMap] Store occupancy in a packed used-bit array (#201281)
Track bucket occupancy in a packed 1-bit-per-bucket "used" array (uint32
words) instead of an `Empty` sentinel key. The buckets and the used array
share one allocation. The probing scheme is unchanged.
(uint64_t words lead to a slightly larger clang binary.)
Because occupancy is a packed bit instead of an in-band sentinel, probing
and iteration test a dense bit rather than loading each bucket key. This
helps find-miss and iteration (the empty terminus and the empty buckets
become a bit test, not a bucket load; for large keys it also skips the
structural compare against the empty key) and large-bucket insert. It
costs find-hit (the matched
[19 lines not shown]
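To illustrate the general idea of a packed used-bit array, here is a minimal sketch of an open-addressing integer set that keeps one occupancy bit per bucket in uint32 words. It is not the llvm::DenseMap code: the class name `TinyIntSetSketch` is hypothetical, there is no erase or tombstone handling, and for simplicity the buckets and the bit array live in two separate vectors rather than one shared allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration only; not the llvm::DenseMap implementation.
class TinyIntSetSketch {
  std::vector<int> Buckets;            // keys; no value is reserved as "empty"
  std::vector<std::uint32_t> UsedBits; // one occupancy bit per bucket
  std::size_t NumEntries = 0;

  bool isUsed(std::size_t I) const { return (UsedBits[I / 32] >> (I % 32)) & 1u; }
  void setUsed(std::size_t I) { UsedBits[I / 32] |= std::uint32_t(1) << (I % 32); }

  // Returns the bucket holding Key, or the first unused bucket on its probe
  // sequence. Occupancy is decided by the bit test, not by comparing keys
  // against a sentinel.
  std::size_t probe(int Key) const {
    std::size_t Size = Buckets.size();
    assert(Size != 0 && (Size & (Size - 1)) == 0 && "size must be a power of two");
    std::size_t Mask = Size - 1;
    std::size_t I = (static_cast<std::uint32_t>(Key) * 2654435761u) & Mask;
    for (std::size_t Step = 1; isUsed(I) && Buckets[I] != Key; ++Step)
      I = (I + Step) & Mask; // triangular probing visits every bucket
    return I;
  }

  void grow() {
    std::vector<int> Old;
    for (std::size_t I = 0; I < Buckets.size(); ++I)
      if (isUsed(I)) // iteration skips empty buckets with a bit test only
        Old.push_back(Buckets[I]);
    std::size_t NewSize = Buckets.size() * 2;
    Buckets.assign(NewSize, 0);
    UsedBits.assign((NewSize + 31) / 32, 0);
    NumEntries = 0;
    for (int Key : Old)
      insert(Key);
  }

public:
  TinyIntSetSketch() : Buckets(64, 0), UsedBits(2, 0) {}

  bool insert(int Key) {
    if ((NumEntries + 1) * 4 > Buckets.size() * 3) // keep load below 3/4
      grow();
    std::size_t I = probe(Key);
    if (isUsed(I))
      return false; // already present
    Buckets[I] = Key;
    setUsed(I);
    ++NumEntries;
    return true;
  }

  bool contains(int Key) const { return isUsed(probe(Key)); }
  std::size_t size() const { return NumEntries; }
};
```

Because occupancy is out of band, every `int` value, including 0 and -1, is a valid key in this sketch.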
[lld][WebAssembly] Simplify many-functions.ll test (#201711)
Remove superfluous checks (function bodies, data section, symbol table,
and segment info) from the test.
The primary purpose of this test is to verify that relocations within
the CODE section are handled correctly when linking objects with many
functions (requiring multi-byte LEB128 for function count).
Checking the entire symbol table, segment info, data section, and all
129 function bodies is superfluous and adds unnecessary noise (over 1000
lines of expectations) to the test. These features are covered by other,
more targeted tests. Reducing these checks makes the test much easier to
read and maintain.
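For context on why the function count matters, the sketch below shows unsigned LEB128 encoding: values up to 127 fit in one byte, while 129 needs two. The function name `encodeULEB128Sketch` is hypothetical and this is a standalone illustration, not the encoder used by lld.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical standalone encoder for unsigned LEB128.
std::vector<std::uint8_t> encodeULEB128Sketch(std::uint64_t Value) {
  std::vector<std::uint8_t> Out;
  do {
    std::uint8_t Byte = Value & 0x7f; // low 7 bits of the remaining value
    Value >>= 7;
    if (Value != 0)
      Byte |= 0x80; // continuation bit: more bytes follow
    Out.push_back(Byte);
  } while (Value != 0);
  return Out;
}
```

With this encoding, 129 becomes the two bytes 0x81 0x01, so the function count in the CODE section header no longer fits in a single byte.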
Reland HIP offload PGO runtime support as a separate opt-in library (#201606)
This mostly relands the compiler-rt part of #177665 (approved and
merged, then reverted in #201416). The first commit restores it as
merged.
It was reverted because of a Windows problem: the ROCm runtime needs the
sanitizer interception library, which is built /MD on Windows, so
putting it in clang_rt.profile forced that library to /MD and broke
users linking it with the static CRT (/MT).
The second commit fixes this by building the ROCm support as a separate,
opt-in library clang_rt.profile_rocm, a /MD superset of
clang_rt.profile. The base library is left unchanged (/MT, no ROCm). The
driver links clang_rt.profile_rocm first, so it resolves all profile
symbols and the base library stays inert.
clang_rt.profile_rocm is off by default. The compiler-side change and
driver wiring are in a separate PR.
[CIR] Implement Direct+canFlatten in CallConvLowering
CallConvLowering previously ignored the canFlatten flag on Direct
classifications: a Direct arg with a multi-field struct coerced type was
passed as a single struct argument rather than N scalar register arguments.
This is the register-passing pattern the x86-64 SysV ABI uses for structs
like struct { long a, b; }.
A new helper getFlattenedCoercedType centralizes the detection (Direct,
multi-field struct coercedType, canFlatten set). The three lowering sites
are updated: buildNewArgTypes pushes one wire type per field; insertArgCoercion
reassembles the coerced struct from N scalar block args then coerces to the
original type if the two differ; rewriteCallSite extracts each field via
cir.extract_member. The existing coerce-record-to-record-via-memory.cir
test gains can_flatten = false to opt into the single-arg path.
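A source-level model of the flattening may help. The sketch below hand-writes what the lowering does conceptually for struct { long a, b; }: the call site splits the struct into two scalars and the callee reassembles it. All names (`TwoLongs`, `sumFlattenedCallee`, `callFlattened`) are hypothetical, and real lowering operates on CIR, not on C++ source.

```cpp
#include <cassert>

struct TwoLongs {
  long a, b;
};

// Original source-level function taking the struct by value.
long sumStruct(TwoLongs S) { return S.a + S.b; }

// Hypothetical model of the flattened callee: one scalar argument per field,
// reassembled into the struct before the original body runs.
long sumFlattenedCallee(long Field0, long Field1) {
  TwoLongs Reassembled{Field0, Field1};
  return sumStruct(Reassembled);
}

// Hypothetical model of the rewritten call site: each field is extracted and
// passed as its own argument.
long callFlattened(const TwoLongs &S) { return sumFlattenedCallee(S.a, S.b); }
```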
[CIR] Lower byval/byref args in CallConvLowering
ArgKind::Indirect arguments were hitting an errorNYI in
CIRABIRewriteContext. Add the lowering: in the callee the block argument
type changes to !cir.ptr<T>, a load is inserted at entry so the body sees
the original value type, and llvm.byval or llvm.byref is attached based on
ownership. At call sites, both byval and byref are lowered by allocating a
stack slot, copying the value in, and passing the pointer.
For byval, llvm.noalias and llvm.noundef are also added — llvm.noalias
because the call-site rewrite always produces a fresh alloca+store
(equivalent to -fpass-by-value-is-noalias), and llvm.noundef because the
copy is always fully defined. byref carries only llvm.byref and llvm.align
since it does not assert exclusive ownership.
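The byval shape described above can be modeled in plain C++ as follows. This is only an analogy under assumed names (`Big`, `bumpAndSumLowered`, `callLowered` are hypothetical): the caller copies the value into a fresh stack slot and passes its address, and the callee loads from that pointer at entry.

```cpp
#include <cassert>

struct Big {
  long V[8];
};

// Hypothetical model of the lowered callee: the argument arrives as a
// pointer, and a load at entry recovers the original value type.
long bumpAndSumLowered(Big *ByValSlot) {
  Big Local = *ByValSlot; // load inserted at function entry
  Local.V[0] += 100;      // mutations stay private to the callee
  long Sum = 0;
  for (long X : Local.V)
    Sum += X;
  return Sum;
}

// Hypothetical model of the rewritten call site: allocate a slot, copy the
// value in, pass the pointer.
long callLowered(const Big &Original) {
  Big Slot = Original; // fresh stack slot, fully initialized copy
  return bumpAndSumLowered(&Slot);
}
```

Since the slot is freshly created for each call and nothing else refers to it, treating the pointer as non-aliased is consistent with the reasoning in the commit message.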
[CIR] Lower sret returns in CallConvLowering
Functions that return an aggregate by value classify their return as
ArgKind::Indirect, but CallConvLowering reached an errorNYI for that
case, so the whole CallConv pass refused to lower any struct-returning
function.
rewriteFunctionDefinition now recognizes an Indirect return: the wire
return type becomes void, a hidden sret pointer is prepended as block
argument 0, and every cir.return is routed through that pointer. Rather
than storing the loaded return value through the sret pointer (a
byte-copy that breaks non-trivially-copyable types -- libstdc++'s SSO
std::string keeps a _M_p pointer into its own _M_local_buf, so a
byte-copy leaves the destination aliasing the source's dying stack
storage), insertSRetStores rewires the __retval alloca to the sret
pointer so construction flows directly into the caller's slot, matching
classic CodeGen's "construct into %agg.result" pattern. CIRGen emits one
cir.load __retval / cir.return pair per return statement, all reading the
single __retval alloca, so the alloca is rewired once and every return is
[18 lines not shown]
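The byte-copy hazard mentioned above can be reproduced with a small self-referential struct. This is a sketch with hypothetical names (`SelfRefSketch`, `constructInto`); it stands in for an SSO string but is deliberately a trivially copyable struct so that the `memcpy` itself is well-defined and only the resulting aliasing is on display.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for an SSO string: Data points into the object's own
// Local buffer.
struct SelfRefSketch {
  char *Data;
  char Local[16];
};

// Construct directly in the given slot, as the sret rewiring does.
void constructInto(SelfRefSketch *Slot, const char *Text) {
  std::strncpy(Slot->Local, Text, sizeof(Slot->Local) - 1);
  Slot->Local[sizeof(Slot->Local) - 1] = '\0';
  Slot->Data = Slot->Local;
}

// Byte-copying a constructed object leaves the copy pointing at the source.
bool byteCopyAliasesSource() {
  SelfRefSketch Source;
  constructInto(&Source, "hi");
  SelfRefSketch Dest;
  std::memcpy(&Dest, &Source, sizeof(Source));
  return Dest.Data == Source.Local; // Dest refers to Source's buffer
}

// Constructing in the caller's slot keeps the internal pointer self-contained.
bool inPlaceConstructionIsSelfContained() {
  SelfRefSketch CallerSlot;
  constructInto(&CallerSlot, "hi");
  return CallerSlot.Data == CallerSlot.Local;
}
```

If the source in the first case were a local about to go out of scope, the copy would be left with a dangling pointer, which is the failure mode the in-place approach avoids.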
[CIR] Implement ArgKind::Expand in CallConvLowering
ArgKind::Expand classifies a struct argument for flattening: each field
becomes a separate scalar argument at the ABI level. Classic CodeGen
calls this "struct expansion" — used on targets like MIPS and some ARM
calling conventions.
CIRABIRewriteContext previously emitted errorNYI at both classification
sites. The replacement covers three call paths. In buildNewArgTypes,
the original struct type is replaced by one wire type per field. In
insertArgCoercion, the single struct block argument is replaced by N
scalar block arguments and an alloca+get_member+store+load sequence at
the entry block reassembles them for body uses; a running block-argument
index (rather than classIdx + sretOffset) correctly tracks the expanded
slot count when multiple Expand args or sret+Expand combinations appear.
The Ignore-drop loop gains a classToBlockArg pre-computation so that
Ignore args following Expand args are erased at the correct index. In
rewriteCallSite, cir.extract_member decomposes the struct operand into
its constituent fields, which become separate call arguments.
[3 lines not shown]
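The running block-argument index can be sketched as a simple prefix computation. The types and the function `computeClassToBlockArg` below are hypothetical, and the sketch assumes that a Direct or Ignore argument occupies one block-argument slot (before any Ignore slot is erased) while an Expand argument occupies one slot per field.

```cpp
#include <cassert>
#include <vector>

enum class ArgKindSketch { Direct, Expand, Ignore };

struct ArgClassSketch {
  ArgKindSketch Kind;
  unsigned NumFields; // only meaningful for Expand
};

// Hypothetical helper: maps each classification index to the index of its
// first block argument, accounting for a leading sret slot and for Expand
// arguments that take several slots.
std::vector<unsigned>
computeClassToBlockArg(const std::vector<ArgClassSketch> &Classes, bool HasSRet) {
  std::vector<unsigned> Map;
  unsigned Next = HasSRet ? 1 : 0; // block argument 0 is the sret pointer
  for (const ArgClassSketch &C : Classes) {
    Map.push_back(Next);
    Next += C.Kind == ArgKindSketch::Expand ? C.NumFields : 1;
  }
  return Map;
}
```

For an sret function with arguments (Expand of 3 fields, Ignore, Direct), this yields slots 1, 4 and 5, whereas a fixed classIdx + sretOffset would place the Ignore argument at slot 2, in the middle of the expanded fields.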
[Webkit Checkers][SaferCpp] Detect base-to-derived downcasts laundered through void* in MemoryUnsafeCastChecker (#200294)
Adds a matcher for static_cast<Derived*>(static_cast<void*>(base)),
which previously evaded detection because the outer cast's immediate
source expression is void*, not Base*.
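A minimal instance of the cast pattern named above looks like the sketch below. The type and function names are hypothetical; the types are kept standard-layout so the example itself is well-defined when the object really is a `DerivedSketch`.

```cpp
#include <cassert>

struct BaseSketch {
  int Id = 0;
};

struct DerivedSketch : BaseSketch {
  int twice() const { return Id * 2; }
};

// The outer cast's immediate operand has type void*, so a check that only
// looks for Base* -> Derived* does not see the downcast.
DerivedSketch *launderedDowncast(BaseSketch *Base) {
  return static_cast<DerivedSketch *>(static_cast<void *>(Base));
}
```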
rdar://173770143
---------
Co-authored-by: Balázs Benics <benicsbalazs at gmail.com>
[WebAssembly] Fix crash combining complex numbers and multivalue (#200514)
This fixes a crash in Clang when the `experimental-mv` ABI is used on
WebAssembly targets in conjunction with complex numbers as arguments.
There's no strict definition for what the multivalue ABI is at this
time, so the main goal is to just not crash for now.
Closes #70402
Closes #153567
[libc][rwlock] fix the race condition in waiter queue (#201629)
Fix #201615.
Fix a data race where non-atomic operations on the waiter queue raced
with each other, causing missed futex wakeup signals.
Confirmed by TSAN:
```
==================
WARNING: ThreadSanitizer: data race (pid=388518)
Write of size 4 at 0x7ffd21cf98e4 by thread T23:
#0 __llvm_libc_23_0_0_git::RawRwLock::notify_pending_threads() ./libc/src/__support/threads/raw_rwlock.h:443:44
#1 __llvm_libc_23_0_0_git::RawRwLock::unlock() ./libc/src/__support/threads/raw_rwlock.h:520:5
#2 randomized_thread_operation(SharedData*) ./libc/test/integration/src/__support/threads/tsan_full_rwlock.cpp:104:18
#3 thread_runner(void*) ./libc/test/integration/src/__support/threads/tsan_full_rwlock.cpp:148:5
Previous atomic read of size 4 at 0x7ffd21cf98e4 by thread T4:
[20 lines not shown]
[flang][OpenMP] Separate checks for type-parameter inquiry and subobject (#201324)
This will make it possible to diagnose these situations independently.
This isn't perfect, but will be improved gradually in the future.
[libc++] Remove ios_base::__xindex_ from the ABI (#198994)
`__xindex_` is only ever used from the dylib from a single function. We
can simplify the code a bit by making the variable function-local and
avoiding exposing it to the ABI at all. This also fixes a TODO about
whether it's safe to use `atomic` with the GCC ABI: yes, since it's not
actually part of our ABI.
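The general technique of a function-local atomic counter can be sketched as follows. The function name `allocateIndexSketch` is hypothetical and this is not the libc++ source, only an illustration of keeping the counter out of any class layout.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical illustration: the counter is a function-local static, so it is
// not a member of any class and does not appear in a class's ABI.
int allocateIndexSketch() {
  static std::atomic<int> NextIndex{0};
  return NextIndex.fetch_add(1); // returns the previous value atomically
}
```

Each call returns a distinct index, and the only symbol other code needs to know about is the function itself.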
[RISCV] Clang flags for controlling zilsd alignment (#181439)
The new flags are `-mzilsd-word-align` and `-mzilsd-strict-align`. They
interact with scalar/strict alignment, hopefully in a reasonable way.
They produce errors on rv64, where zilsd is not available.
[CI][AMDGPU] Create scriptedbuilder for libc build (#201687)
Introduce a new scriptedbuilder for the libc build. It enables
developers to conveniently reproduce the same build as our bot:
https://lab.llvm.org/buildbot/#/builders/10
Tested locally, tests passed.
[HLSL][CBuffer][Matrix] Honor row_major/column_major keyword in cbuffer layout (#201671)
Fixes #201668
A per-declaration `row_major`/`column_major` keyword on a cbuffer matrix
was being dropped when building the cbuffer layout, so the layout struct
and the buffer-layout copy fell back to the translation-unit
`-fmatrix-memory-layout=` setting.
The desugaring had to be fixed in two places:
* HLSLBufferLayoutBuilder::layOutMatrix took a `const ConstantMatrixType *`
and called ConvertTypeForMem(QualType(MT, 0)), discarding the sugar.
It now takes the sugared QualType.
* SemaHLSL's host-layout struct construction called
getUnqualifiedDesugaredType() on each field, erasing the orientation
attribute. A getHostLayoutFieldType() helper now keeps the sugared type
for constant matrices while desugaring everything else.
[dsymutil] Make the Parallel DWARF linker the default (#200971)
This commit toggles the default linker in dsymutil from the classic
linker to the parallel linker. This means that we have parity between
the two implementations, at least for everything we have test coverage
for in LLVM and LLDB.
I expect we'll continue to uncover more differences in the future.
However I don't think that necessitates holding off on toggling the
default. By making the parallel linker the default, we get maximum
exposure to users living on upstream, even if that audience is
comparatively small.
Fixes #195390
[DWARFLinker] Emit .debug_names entries for type-unit DIEs in parallel linker (#201215)
The default tag arm of AcceleratorRecordsSaver::save returned early when
a DIE was cloned into the artificial type unit, so class-static const
data members (DW_AT_const_value, no out-of-class definition) never got
an accelerator entry. As a result, `target var A::int_val` in LLDB found
nothing.
The HasLiveAddress / HasRanges guard already decides whether a DIE
carries enough information of its own to warrant a name record; the
output unit is just doing the routing. Drop the early return and thread
the TypeEntry through saveNameRecord / saveObjCNameRecord / saveObjC so
they emit into the type-unit accel storage when appropriate, the same
way saveTypeRecord and saveNamespaceRecord already do.
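For reference, the kind of declaration affected looks like the sketch below: a class-static const data member initialized in-class with no out-of-class definition. The name `A::int_val` comes from the commit message; the value 42 and the helper `readIntVal` are hypothetical.

```cpp
#include <cassert>

struct A {
  // In-class initializer only; there is no out-of-class definition, so the
  // member has a constant value but no storage of its own unless odr-used.
  static const int int_val = 42;
};

// Reading the value does not odr-use the member, so no definition is needed.
int readIntVal() { return A::int_val; }
```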
Revert "[clang][lex] Store `HeaderFileInfo` in a `DenseMap`" (#201702)
Reverts llvm/llvm-project#200968
This is causing some non-determinism in PCM files in the
`clang/test/Modules/rebuild.m` test.
[RISCV] Support Qualcomm Access Relocations (#188671)
These QUALCOMM vendor relocations mark 16-bit compressed and 32-bit
load/store instructions as candidates for relaxation from a QC_E_LI +
Load/Store sequence.
This change adds support for assembling instructions with these
relocations. These relocations are documented in
https://github.com/quic/riscv-elf-psabi-quic-extensions
Revert "[OpenMP] Use ext linkage for kernels handles and globals handles keep…" (#201698)
Reverts llvm/llvm-project#200964
This patch breaks flang declare target on a common block.