[clang-tidy] Fix alphabetical order check for multiline doc entries and whitespace handling (#186950)
The `check_alphabetical_order.py` script previously only scanned the
first line of each bullet point in `ReleaseNotes.rst`, causing sorting
failures when a `:doc:` tag was split across multiple lines.
In addition, when sorting the last entry of a section, the script
inserted unnecessary whitespace.
This PR fixes both problems.
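The multiline fix can be illustrated with a minimal sketch. This is not the code from `check_alphabetical_order.py`; the regex and the names `DOC_RE`, `entry_key`, and `sort_entries` are hypothetical, and it assumes each bullet is already split into its list of lines. The point is only that the sort key is taken from the whole entry rather than from its first line.

```python
import re

# Hypothetical pattern: a :doc: role whose label may be separated from its
# <target> by a line break (represented as a space after joining the lines).
DOC_RE = re.compile(r":doc:`(?P<label>[^<`]+?)\s*<")


def entry_key(entry_lines):
    """Return a sort key for one bullet entry, scanning all of its lines."""
    # Join every line of the entry so a :doc: tag split across lines is
    # seen as one piece of text.
    text = " ".join(line.strip() for line in entry_lines)
    match = DOC_RE.search(text)
    if match:
        return match.group("label").strip().lower()
    # Fall back to the full text when the entry has no :doc: tag.
    return text.lower()


def sort_entries(entries):
    """Sort bullet entries (each a list of lines) by their :doc: label."""
    return sorted(entries, key=entry_key)
```

With this key, an entry whose `:doc:` label sits on the first line and whose `<target>` sits on the second sorts the same way as a single-line entry.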
[LoopRotate] Use SCEV exit counts to improve rotation profitability (#187483)
Most loop transformations, like unrolling and vectorization, expect the
latch branch to be countable. Allow rotation if it turns the latch from
uncountable to countable.
This uses SCEV to check for countable exits if CheckExitCount is set.
Currently it is not set for the LPM1 run (where SCEV is not used by
other passes), only in LPM.
With that, compile-time impact is mostly neutral:
https://llvm-compile-time-tracker.com/compare.php?from=eba342d0ba930a404a026c80aada51c43974f0db&to=2e676337b45fae63ce9498116d8e6e43772363c5&stat=instructions:u
ClamAV is consistently slower (~+0.15%) and 7zip is faster in most cases
(~-0.13%).
Across a large test set based on C/C++ workloads, this rotates ~0.8%
more loops with ~2.68M rotated loops.
[16 lines not shown]
[SPIR-V] Support global variable annotations in llvm.global.annotations (#187241)
The SPIR-V backend previously supported only function annotations in
llvm.global.annotations and crashed with a fatal error when encountering
global variable entries.
[AMDGPU] Introduce ASYNC_CNT on GFX1250 (#185810)
Async operations transfer data between global memory and LDS. Their
progress is tracked by the ASYNC_CNT counter on GFX1250 and later
architectures. This change introduces the representation of that counter
in SIInsertWaitCnts. For now, the programmer must manually insert
s_wait_asyncnt instructions. Later changes will add compiler assistance
for generating the waits by including this counter in the asyncmark
instructions.
Assisted-by: Claude Sonnet 4.5
This is part of a stack:
- #185813
- #185810
[AArch64][GlobalISel] Remove fallback for scalar usqadd/suqadd intrinsics (#187513)
Previously, GlobalISel failed to select these intrinsics when given
scalar operands, because RegBankSelect placed the operands on GPR banks.
Fixing this lets GlobalISel select them correctly, since during instruction
selection the intrinsic matches the SIMD patterns in AArch64InstrInfo.td.
[clang-tidy] Fix "effective" -> "efficient". (#187536)
"Effective" is the wrong word: Both overloads are effective; they do
what they're supposed to do. But the character overload does less work.
[LV] Simplify `matchExtendedReductionOperand()` (NFCI) (#185821)
This updates `matchExtendedReductionOperand` so the simple case of
`UpdateR(PrevValue, ext(...))` is matched first as an early exit. The
binop matching is then flattened to remove the extra layer of the
`MatchExtends` lambda.
Reapply "[clang][bytecode] Allocate local variables in `InterpFrame` tail storage" (#187410) (#187644)
This reverts commit bf1db77fc87ce9d2ca7744565321b09a5d23692f.
Avoid using an `InterpFrame` member after calling its destructor this
time. I hope that was the only problem.
[X86] Perform i128/i256/i512 BITREVERSE on the FPU (#187502)
Bitcast the large scalar integer to a vXi64 vector, reverse the elements,
and then perform a per-element vXi64 bitreverse.
If we have SSSE3 or later, BITREVERSE expansion using PSHUFB is always
more efficient than performing it as a scalar sequence (no need for
mayFoldIntoVector check).
Fixes #187353
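The decomposition described above (reverse the order of the 64-bit elements, then bit-reverse each element) can be checked with a small Python model. This is only an arithmetic sketch of why the two steps compose into a full-width bit reversal; it is not the X86 lowering code, and the names `bitreverse64` and `bitreverse_wide` are hypothetical.

```python
def bitreverse64(x):
    """Reverse the bit order of a single 64-bit element."""
    return int(format(x, "064b")[::-1], 2)


def bitreverse_wide(value, width):
    """Bit-reverse a width-bit integer using 64-bit elements.

    Assumes width is a multiple of 64 (e.g. 128, 256, 512).
    """
    assert width % 64 == 0
    mask = (1 << 64) - 1
    # Split into 64-bit elements; element 0 holds the lowest bits.
    lanes = [(value >> (64 * i)) & mask for i in range(width // 64)]
    # Step 1: reverse the order of the elements.
    lanes.reverse()
    # Step 2: bit-reverse each element independently.
    lanes = [bitreverse64(lane) for lane in lanes]
    # Reassemble the wide integer from its elements.
    return sum(lane << (64 * i) for i, lane in enumerate(lanes))
```

For example, `bitreverse_wide(1, 128)` moves bit 0 to bit 127: the low element is swapped into the high position, and the per-element reversal then moves its bit 0 to bit 63 of that element.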
Windows release build: Add checksum verification for downloaded source archives (#187113)
Add checksum verification for libxml2, zlib, and zstd source archives
via `cmake -E *sum` and `cmake -E compare_files` commands.
This also adds the following minor changes:
* Factor out libxml2 version into variable.
* Check `tar` return code.
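For illustration, the same verify-before-extract idea can be sketched in Python. The actual change uses `cmake -E` commands in the Windows release build script; the sketch below swaps in Python's `hashlib` instead, assumes SHA-256 as the hash (the commit message does not name the algorithm), and uses the hypothetical helper names `sha256_of` and `verify_archive`.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected_hex):
    """Raise ValueError if the archive does not match the expected digest."""
    actual = sha256_of(path)
    # Compare case-insensitively, since published checksums vary in case.
    if actual != expected_hex.lower():
        raise ValueError(
            "checksum mismatch for {}: expected {}, got {}".format(
                path, expected_hex.lower(), actual
            )
        )
```

Calling `verify_archive` right after the download and before extraction means a corrupted or tampered archive stops the build early instead of failing later in a less obvious way.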
[llc] Add -mtune option (#186998)
This patch adds a Clang-compatible -mtune option to llc, to enable
decoupled ISA and microarchitecture targeting, which is especially
important for backend development. For example, it makes it easy to
test the effect of a subtarget feature or scheduling model on codegen
across a variety of workloads in the IR corpus benchmark:
https://github.com/dtcxzyw/llvm-codegen-benchmark.
The implementation adds an isolated generic codegen flag, to establish a
base for wider usage - the plan is to add it to `opt` as well in a
followup patch. Then `llc` consumes it, and sets `tune-cpu` attributes
for functions, which are further consumed by the backend.