[Passes][LoopRotate] Move minsize handling fully into pass (#189956)
Make this dependent only on the minsize attribute and drop the pipeline
handling.
Rename the enable-loop-header-duplication option to
enable-loop-header-duplication-at-minsize to clarify that it controls
header duplication at minsize only (in other cases it is enabled by
default, independently of this option).
[Passes][FuncSpec] Move optsize/minsize handling into pass (#189952)
Instead of using the Os/Oz level during pass pipeline construction,
query the optsize/minsize attribute on the function to determine whether
specialization is allowed to take place. This ensures consistent
behavior for per-function attributes.
It's worth noting that FuncSpec *already* checks for minsize, but at the
call-site level.
WholeProgramDevirt: Import/export the CVP byte directly in the summary (#188979)
rather than using absolute symbol constants on ELF/x86.
This leads to better codegen as the absolute symbol constants were not
resolved until link time (see bug for example).
Fixes #188470
[RISCV] Fix stackmap shadow trimming NOP size for compressed targets (#189774)
The shadow trimming loop in LowerSTACKMAP hardcoded a 4-byte decrement
per instruction, but when Zca is enabled NOPs are 2 bytes. Use NOPBytes
instead of the hardcoded 4 so the shadow is correctly trimmed on
compressed targets.
Co-authored-by: Claude Opus 4.6 <noreply at anthropic.com>
(cherry picked from commit 3d7eedce5658c41a1b22775938359bfafac47fc9)
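A minimal sketch of the trimming arithmetic described in the RISCV stackmap entry above, not the actual LowerSTACKMAP code. The function names and the `TrimmableInstrs` parameter are hypothetical; the sketch assumes the shadow size is a multiple of `NOPBytes` and models the scan over following instructions as a simple count.

```cpp
#include <cstdint>

// Hypothetical model of the shadow-trimming loop: every instruction that
// follows the stackmap and may overlap the shadow removes one NOP's worth
// of bytes. NOPBytes is 2 when compressed instructions (Zca) are
// available and 4 otherwise.
// Assumption: NumNOPBytes is a multiple of NOPBytes, so the unsigned
// subtraction below never wraps around.
uint32_t remainingShadowBytes(uint32_t NumNOPBytes, uint32_t NOPBytes,
                              uint32_t TrimmableInstrs) {
  while (NumNOPBytes > 0 && TrimmableInstrs > 0) {
    --TrimmableInstrs;
    NumNOPBytes -= NOPBytes; // the fix: previously a hardcoded 4
  }
  return NumNOPBytes;
}

// Number of NOP instructions still to emit after trimming.
uint32_t nopsToEmit(uint32_t NumNOPBytes, uint32_t NOPBytes,
                    uint32_t TrimmableInstrs) {
  return remainingShadowBytes(NumNOPBytes, NOPBytes, TrimmableInstrs) /
         NOPBytes;
}
```

With a hardcoded decrement of 4 and 2-byte NOPs, a shadow such as 6 bytes would step 6 -> 2 and then wrap around on the next unsigned subtraction, which is the kind of mismatch the change avoids.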
[flang] Update Flang Extension doc to reflect previous change (#188088)
Update Flang Extension doc to remove note about a warning that was
removed in a previous PR (PR #178088). It is an oversight that this doc
change was not made in that previous PR. The oversight was only recently
discovered and has led to this PR.
(cherry picked from commit 45b932a2d452c997d98b57e1aa31bc4951c5e9f4)
[ELF] Parallelize --gc-sections mark phase (#189321)
Add `markParallel` using level-synchronized `parallelFor`. Each BFS
level is processed in parallel; newly discovered sections are collected
in per-thread queues and merged for the next level.
The parallel path is used when `!TrackWhyLive && partitions.size()==1`.
`parallelFor` naturally degrades to serial when `--threads=1`.
Uses depth-limited inline recursion (depth<3) and optimistic
load-then-exchange dedup for best performance.
Linking a Release+Asserts clang (--gc-sections, --time-trace) on an old
x86-64:
8 threads: markLive 315ms -> 82ms (-234ms). Total 1562ms -> 1350ms
(1.16x).
16 threads: markLive 199ms -> 50ms (-149ms). Total 1017ms -> 862ms
(1.18x).
[2 lines not shown]
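The level-synchronized mark described in the --gc-sections entry above could be sketched roughly as follows. This is an illustration under stated assumptions, not lld's `markParallel`: it uses plain `std::thread` instead of `parallelFor`, models sections as integer node ids in a hypothetical `Graph` type, and omits the depth-limited inline recursion. `NumWorkers` is assumed to be at least 1.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the section graph: node -> referenced nodes.
using Graph = std::vector<std::vector<std::size_t>>;

// Load-then-exchange dedup: a cheap relaxed load filters out nodes that
// are already marked before paying for the atomic read-modify-write.
// Returns true only for the one caller that actually marks the node.
static bool tryMark(std::vector<std::atomic<bool>> &Live, std::size_t N) {
  if (Live[N].load(std::memory_order_relaxed))
    return false;
  return !Live[N].exchange(true, std::memory_order_relaxed);
}

std::vector<bool> markLevelSync(const Graph &G,
                                const std::vector<std::size_t> &Roots,
                                unsigned NumWorkers) {
  std::vector<std::atomic<bool>> Live(G.size());
  for (auto &L : Live)
    L.store(false, std::memory_order_relaxed);

  std::vector<std::size_t> Frontier;
  for (std::size_t R : Roots)
    if (tryMark(Live, R))
      Frontier.push_back(R);

  while (!Frontier.empty()) {
    // One queue per worker, so discovering a node needs no locking.
    std::vector<std::vector<std::size_t>> Next(NumWorkers);
    auto Work = [&](unsigned W) {
      // Strided split of the current BFS level across workers.
      for (std::size_t I = W; I < Frontier.size(); I += NumWorkers)
        for (std::size_t Succ : G[Frontier[I]])
          if (tryMark(Live, Succ))
            Next[W].push_back(Succ);
    };
    if (NumWorkers == 1) {
      Work(0); // serial fallback, no threads spawned
    } else {
      std::vector<std::thread> Pool;
      for (unsigned W = 0; W < NumWorkers; ++W)
        Pool.emplace_back(Work, W);
      for (std::thread &T : Pool)
        T.join(); // joining acts as the barrier between levels
    }
    // Merge the per-worker queues into the next level's frontier.
    Frontier.clear();
    for (const auto &Q : Next)
      Frontier.insert(Frontier.end(), Q.begin(), Q.end());
  }

  std::vector<bool> Result(G.size());
  for (std::size_t I = 0; I < G.size(); ++I)
    Result[I] = Live[I].load(std::memory_order_relaxed);
  return Result;
}
```

Spawning threads per level keeps the sketch short; a real implementation would presumably reuse a thread pool, as `parallelFor` does.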
[AArch64][GISel] Widen non-power2 element sizes for ctlz. (#189371)
This fixes an illegal mutation kind that caused GISel to hit an
assert. Vector elements whose size is not a power of 2, or is smaller
than i8, are now widened to a power of 2.
A fix to handle vector types correctly was needed in LegalizerHandler.
Fixes #185411
[ELF] Move Symbol::used to atomic flags field (#190117)
Move the `used` bitfield into the existing `std::atomic<uint16_t> flags`,
making it safe for concurrent access from parallel GC mark (#189321).
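As a rough illustration of why folding a bit into an atomic flags word helps: adjacent bitfields share a memory location, so concurrent writes to neighbouring bits are a data race, whereas an atomic read-modify-write on the whole word is not. The struct, flag names, and accessors below are hypothetical and do not mirror lld's `Symbol` layout.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical symbol with all of its boolean state packed into one
// atomic word. The flag names are illustrative only.
struct SymbolSketch {
  enum : uint16_t { Used = 1 << 0, OtherFlag = 1 << 1 };

  std::atomic<uint16_t> flags{0};

  // Atomically set one bit without disturbing the others, so several
  // marker threads may call this on the same symbol concurrently.
  void setFlag(uint16_t Bit) {
    flags.fetch_or(Bit, std::memory_order_relaxed);
  }

  bool hasFlag(uint16_t Bit) const {
    return (flags.load(std::memory_order_relaxed) & Bit) != 0;
  }
};
```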
[clang-doc] Update lookup routines for consistency (#190043)
When filtering is enabled, it's possible an Info doesn't have a
Parent USR. Use `find()` to safely handle that case.
Additionally, I noticed the comparison code for the index
poorly reimplemented the existing comparison from StringRef.
We can just use the one from ADT.
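The `find()` pattern mentioned in the clang-doc entry above, sketched with `std::map` as a stand-in for whatever container clang-doc actually uses. `InfoSketch` and `lookupParent` are hypothetical names for illustration.

```cpp
#include <map>
#include <string>

// Hypothetical record keyed by a USR string.
struct InfoSketch {
  std::string Name;
};

// Look up a parent by USR without assuming it exists. Unlike operator[],
// find() neither inserts a default entry nor requires a non-const map;
// a missing key is reported as nullptr for the caller to handle.
const InfoSketch *lookupParent(const std::map<std::string, InfoSketch> &Index,
                               const std::string &ParentUSR) {
  auto It = Index.find(ParentUSR);
  return It == Index.end() ? nullptr : &It->second;
}
```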