[libc++] Use _LIBCPP_NO_UNIQUE_ADDRESS for the new vector layout (#207149)
We use `_LIBCPP_NO_UNIQUE_ADDRESS`, since a plain
`[[no_unique_address]]` doesn't work on Windows.
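A minimal sketch of the idea, assuming the macro picks a vendor-specific spelling of the attribute on MSVC-compatible compilers. `MY_NO_UNIQUE_ADDRESS`, `EmptyAlloc`, `VecLayout`, and `PlainLayout` are hypothetical names for illustration, not libc++'s actual definitions.

```cpp
#include <cstddef>

// Hypothetical macro: use the MSVC-specific attribute where the standard
// spelling is accepted but has no effect, and the standard one elsewhere.
#if defined(_MSC_VER)
#  define MY_NO_UNIQUE_ADDRESS [[msvc::no_unique_address]]
#else
#  define MY_NO_UNIQUE_ADDRESS [[no_unique_address]]
#endif

// An empty, stateless allocator-like type.
struct EmptyAlloc {};

// Three-pointer layout where the empty member may overlap other storage.
struct VecLayout {
  int* begin_ = nullptr;
  int* end_ = nullptr;
  int* cap_ = nullptr;
  MY_NO_UNIQUE_ADDRESS EmptyAlloc alloc_;
};

// Same members without the attribute: the empty member needs its own byte,
// which padding usually rounds up to a full pointer.
struct PlainLayout {
  int* begin_ = nullptr;
  int* end_ = nullptr;
  int* cap_ = nullptr;
  EmptyAlloc alloc_;
};
```

With GCC or Clang on common 64-bit targets, `VecLayout` is expected to stay at three pointers while `PlainLayout` grows past that.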
[LegalizeType] Fix VECTOR_DEINTERLEAVE widening with incorrect insert_subvector (#207245)
Partially address #207136
The associated issue really has two parts: (1) incorrect type-widening
logic that emits `insert_subvector` with indices that are not a
multiple of the sub-vector's minimum number of elements, and (2)
incorrect RISC-V lowering logic for fixed vectors.
This PR addresses the first part. It turns out that, to build a
widened, packed concat vector, we don't need any insert_subvector
that involves widened operands: `concat_vectors` on the
_original_ (narrow) operands (before adjusting to the size of the
desired widened concat vector) is enough.
[AArch64] Lower cttz(bitcast <Nxi1> to iN) with shrn-based compressed movemask (#199081)
The existing lowering in vectorToScalarBitmask() creates a 1-bit-per-lane
movemask using a powers-of-2 reduction (and+addv with a constant
pool entry).
This patch adds a DAG combine on ISD::CTTZ that recognizes cttz(bitcast
<N x i1> to iN) and produces a compressed movemask with shrn (for i8
lanes) or xtn (for wider lanes), then runs scalar cttz on a 64- or
128-bit value. Dividing by the number of bits per lane gives the lane index.
Supports lane counts {2, 4, 8, 16, 32} (one or two NEON registers).
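The arithmetic can be modeled in scalar C++. This is only a sketch for the 16 x i8 case, assuming each lane is 0x00 or 0xFF (as a vector compare produces) and that the compressed mask keeps 4 bits per lane. `compressedMask` and `firstSetLane` are hypothetical helper names, not part of the patch.

```cpp
#include <array>
#include <cstdint>

// Scalar model of a compressed movemask: 4 bits per lane, 16 lanes, 64 bits.
// Lanes are assumed to be all-zeros (0x00) or all-ones (0xFF).
inline std::uint64_t compressedMask(const std::array<std::uint8_t, 16>& lanes) {
  std::uint64_t mask = 0;
  for (int i = 0; i < 16; ++i)
    mask |= static_cast<std::uint64_t>(lanes[i] & 0x0F) << (4 * i);
  return mask;
}

// Index of the first set lane: count trailing zeros, then divide by the
// number of mask bits per lane. Returns 16 when no lane is set, matching
// a 64-bit cttz of zero (64) divided by 4.
inline int firstSetLane(const std::array<std::uint8_t, 16>& lanes) {
  std::uint64_t mask = compressedMask(lanes);
  if (mask == 0)
    return 16;
  int trailingZeros = 0;
  while ((mask & 1) == 0) {
    mask >>= 1;
    ++trailingZeros;
  }
  return trailingZeros / 4;
}
```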
For the example in the issue (`<16 x i8> -> i16`):
Before:
```asm
adrp x8, .LCPI0_0
cmlt v0.16b, v0.16b, #0
```
[34 lines not shown]
[PGO][HIP][NFC] Fix hipModuleGetGlobal -Wunused-function warning (#207293)
The functions trigger the warning on Windows (without elf.h), which is
fatal under -Werror.
Fix by adding [[maybe_unused]]. Alternatively, it could be moved inside
the existing __has_include(<elf.h>) block; however, that would trigger
-Wunused-but-set-global on pHipModuleGetGlobal.
Current fix is minimal and can be removed once hipModuleGetGlobal is
supported without elf.h.
[clang] fix redeclarations of the injected class name
The declaration used to represent an injected class name should never
be part of any redeclaration chain.
This is a regression since Clang 22, and this will be backported, so no release notes.
Fixes #202320
[clangd] Invalidate preamble when new module imports are added (#199460)
When using `SkipPreambleBuild`, adding a new `import` statement to a
file did not invalidate the existing preamble because
`isPreambleCompatible` only checked whether existing prerequisite
modules were up-to-date, not whether the set of required modules itself
had changed.
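The two conditions can be sketched as below. This is an illustration only: `PreambleModules` and `modulesCompatible` are hypothetical names, and the real `isPreambleCompatible` works on clangd's own data structures.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for what was recorded when the preamble was built.
struct PreambleModules {
  std::set<std::string> required;  // modules the preamble was built against
  bool allUpToDate = true;         // result of the existing freshness check
};

// Compatible only if the set of required modules is unchanged AND the
// previously required modules are still up-to-date.
inline bool modulesCompatible(const PreambleModules& old,
                              const std::set<std::string>& requiredNow) {
  if (old.required != requiredNow)
    return false;  // an import was added or removed: invalidate
  return old.allUpToDate;
}
```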
Fixes: #199389
Partially addresses: #126350
[flang-rt] Use posix_memalign instead of std::aligned_alloc (#207248)
MallocWrapper called std::aligned_alloc for over-aligned requests, but
that C11 function is only available on macOS 10.15 and newer. flang-rt
builds with a Darwin deployment target of 10.7 (set in
AddFlangRT.cmake), so the build failed under
-Werror=unguarded-availability-new.
Use posix_memalign instead, as it is available on all supported POSIX
targets.
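A minimal sketch of an over-aligned allocation through posix_memalign, assuming a POSIX target. `alignedMalloc` is a hypothetical helper, not the actual MallocWrapper code.

```cpp
#include <cstddef>
#include <stdlib.h>  // posix_memalign and free (POSIX)

// Allocate `size` bytes aligned to `alignment`, or return nullptr on failure.
// posix_memalign requires the alignment to be a power of two and a multiple
// of sizeof(void*), so small alignments are rounded up to sizeof(void*).
inline void* alignedMalloc(std::size_t alignment, std::size_t size) {
  if (alignment < sizeof(void*))
    alignment = sizeof(void*);
  void* ptr = nullptr;
  if (posix_memalign(&ptr, alignment, size) != 0)
    return nullptr;
  return ptr;  // release with free()
}
```

Unlike aligned_alloc, posix_memalign reports failure through its return value and leaves errno untouched, which is why the sketch checks the result against zero.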
Fix CODEOWNERS error, remove Lanza from ClangIR owners
The github project reports:
Unknown owner on line 39: make sure <name> exists and has write access to the repository
I assume Nathan's commit access lapsed and he has the `triage` role now.
I added a comment saying he is an emeritus owner. This is reversible,
and I assume if he needs or wants write access, we can revisit this in
the future.
[docs] Rename LangRef.{rst|md}
Tracking issue: #201242
This commit does not use valid markdown, so the docs will not build, but they will be fixed in an immediate follow-up commit that does the migration.
[Offload] Guard __llvm_write_custom_profile null check on non-Windows (#207170)
On Windows __llvm_write_custom_profile is defined as a strong stub (MSVC
lacks proper weak symbol support) by 09a51b2818e2, so its address is a
compile-time constant that is never null. The `if
(!__llvm_write_custom_profile)` check therefore triggers
-Wpointer-bool-conversion, which is fatal under -Werror.
Assisted-by: Claude
[Clang] Fix offsetof sign-extending unsigned array indices >= 128 (#204139)
When evaluating __builtin_offsetof with an unsigned integer array index
(e.g. uint8_t, uint16_t) whose value has the high bit set, Clang was
calling getSExtValue() on the APSInt index, which sign-extends the value
and produces a large bogus offset.
Fix this to use the correct kind of extension to extend smaller values, and to check for overflow in conversions of larger values.
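The effect of the wrong extension can be shown with plain integers. This is a sketch of the arithmetic only, using hypothetical helper names; it is not the ExprConstant.cpp code.

```cpp
#include <cstdint>

// Widen an 8-bit index as if it were signed (the buggy behaviour):
// values with the high bit set become negative.
inline std::int64_t signExtendIndex(std::uint8_t index) {
  return static_cast<std::int64_t>(static_cast<std::int8_t>(index));
}

// Widen an 8-bit unsigned index correctly: the value is preserved.
inline std::int64_t zeroExtendIndex(std::uint8_t index) {
  return static_cast<std::int64_t>(index);
}

// Byte offset of element `index` in an array of `elemSize`-byte elements.
inline std::int64_t elementOffset(std::int64_t index, std::int64_t elemSize) {
  return index * elemSize;
}
```

For a uint8_t index of 200 and 4-byte elements, the zero-extended path gives 800, while the sign-extended path gives a negative offset that turns into a huge value once converted to size_t.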
Fixes #199319
AI Tool Use: GitHub Copilot (Claude Sonnet 4.6) was used to assist in
identifying the root cause of the bug in ExprConstant.cpp and drafting
the fix. The fix was reviewed, tested, and validated manually.
[BOLT] Stop materializing .dwo DIE vectors early in the pipeline
Summary: preprocessDWODebugInfo() eagerly force-extracted every .dwo
compile unit's DIE tree (getNonSkeletonUnitDIE(false)) very early in
the BOLT pipeline, well before DWARFRewriter kicked in. Those vectors then
sit in memory throughout the entire rewrite pipeline, directly
contributing to BOLT's RSS peak. I did a fair amount of digging and
didn't find any reason as to why we need to keep all DIEs of DWO CU
materialized at all, since DWARFRewriter won't even read this vector
(the #197359 concurrency fix did use that, but that is unnecessary).
The problem is that these DIE trees contribute massively to RSS when
processing large binaries with tens of thousands of .dwo files, since a
complete tree is stored for each processed .dwo.
This diff changes the #197359 concurrency fix to not rely on the DIE
sibling/children structure. It parses DWP type units selectively per
compile unit (DIEBuilder::buildDWPTypeUnitsForUnit ->
collectReferencedTypeSignatures) by finding the DW_FORM_ref_sig8
references in a unit's DIEs to decide which type units belong in that
[23 lines not shown]
[BOLT] Fix shifted DWARF inline-scope ranges; track scope boundaries
Summary:
BOLT updated DWARF lexical-scope ranges (DW_TAG_inlined_subroutine /
lexical_block low_pc/high_pc and DW_AT_ranges) via
translateInputToOutputRange(), which mapped a boundary using its input
offset relative to the start of the containing basic block:
OutAddr = BB.getOutputAddressRange().first + (InputOffset - BB.getOffset())
This assumes intra-block byte offsets are preserved input->output. Any
pass that changes instruction sizes within a block ahead of a scope
boundary breaks that assumption. With --plt=all, each `call foo@PLT`
(5 bytes, e8+rel32) is rewritten to `call *foo@GOT(%rip)` (6 bytes,
ff 15+rel32); N such calls before a boundary shift its emitted low_pc/
high_pc N bytes too early, onto the preceding instruction. The range
stays within the parent so `llvm-dwarfdump --verify` does not catch it;
symbolizers then attribute samples on those instructions to the wrong
inlined frames.
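A small worked model of the shift, under assumed numbers. `BlockInfo`, `translateAssumingFixedOffsets`, and `translateWithGrowth` are hypothetical names used to show the arithmetic; they are not BOLT's API or its fix.

```cpp
#include <cstdint>

// Hypothetical description of one basic block.
struct BlockInfo {
  std::uint64_t inputOffset;  // block start in the input function
  std::uint64_t outputStart;  // block start address in the output
};

// The mapping quoted above: assumes intra-block byte offsets are preserved.
inline std::uint64_t translateAssumingFixedOffsets(const BlockInfo& bb,
                                                   std::uint64_t inputOffset) {
  return bb.outputStart + (inputOffset - bb.inputOffset);
}

// Where the boundary really lands if the instructions ahead of it grew by
// `grownBytesBefore` bytes in total (one byte per rewritten PLT call).
inline std::uint64_t translateWithGrowth(const BlockInfo& bb,
                                         std::uint64_t inputOffset,
                                         std::uint64_t grownBytesBefore) {
  return translateAssumingFixedOffsets(bb, inputOffset) + grownBytesBefore;
}
```

For a block at input offset 0x100 emitted at 0x2000, a boundary at input offset 0x10A that follows two rewritten calls maps to 0x200A under the fixed-offset assumption, but the instruction actually starts at 0x200C, two bytes later.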
[43 lines not shown]
[VPlan] Strip early-bail in noalias-check (#203936)
canHoistOrSinkWithNoAliasCheck currently bails eagerly when the
candidate memory location doesn't have a scope. This is unnecessary,
because the alias check automatically handles this: stripping this check
allows us to run the loop, which would never get to the alias check if
none of the recipes write to memory. The end result is that read-only
FirstBB to LastBB ranges are determined not to alias with anything, even
if the scope metadata is absent, leading to licm-load-store
improvements.