[AArch64][SVE] Handle multi-vector load/store opcodes in frame-index elimination
Lowering a wide scalable load from a stack object produces an
LD1*_{2Z,4Z}_IMM[_PSEUDO] with a frame-index base. getMemOpInfo() and getLoadStoreImmIdx()
had no entries for these SME2/SVE2p1 multi-vector opcodes, so PEI crashed.
[clang-format] Recognize Verilog class item qualifiers (#199085)
old
```SystemVerilog
class Packet
extern protected virtual function int send
(int value);
endclass : Packet
```
new
```SystemVerilog
class Packet
extern protected virtual function int send
(int value);
endclass : Packet
```
[3 lines not shown]
[clang-format] Remove the blank line in the function try block (#199086)
old with config `{SeparateDefinitionBlocks: Always}`
```C++
void foo() try {
// do something
} catch (const std::exception &e) {
// handle exception
}
```
new
```C++
void foo() try {
// do something
} catch (const std::exception &e) {
[7 lines not shown]
[ARM] Fix some fp16 Shuffle lowering without +fullfp16 (#200688)
Without fullfp16, f16 is not a legal type, so we need to be careful with
how we legalize shuffle vector and buildvector operations that cannot be
treated more optimally using shuffles.
[AMDGPU] Use v_rsq_f32 for f16 rsqrt on targets without 16-bit insts (#200646)
On gfx6/gfx7 the f16 1.0/sqrt(x) pattern was not folded to a reciprocal
square root because performFDivCombine bailed out whenever f16 fsqrt was
not a legal operation. f16 fsqrt is Custom (promoted) on these targets,
so the combine never fired and the full f32 fdiv expansion was emitted.
Split the legality check: when same-type fsqrt is legal (gfx8+), keep
emitting the native rsq. For f16 without a legal fsqrt, compute the
reciprocal square root in f32 with v_rsq_f32 and round back. This is
accurate enough for f16, and needs no denormal scaling because every f16
value extends to a normal f32 and an f16 rsq result is never denormal.
bf16 is intentionally left expanded: it shares f32's exponent range, so
bf16 denormals would extend to f32 denormals that v_rsq_f32 does not
handle.
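The range argument above can be checked numerically. What follows is a minimal host-side sketch, not code from the patch: the constants are the standard IEEE binary16 limits under hypothetical names, and `rsqViaF32` is an assumed stand-in that uses `1.0f / std::sqrt` rather than the hardware instruction.

```cpp
#include <cmath>

// IEEE binary16 limits expressed as f32 values (names are hypothetical).
const float kF16MinSubnormal = std::ldexp(1.0f, -24); // smallest positive f16
const float kF16MinNormal = std::ldexp(1.0f, -14);    // smallest normal f16
const float kF16Max = 65504.0f;                       // largest finite f16

// Stand-in for the f32 reciprocal square root; this is not v_rsq_f32 itself.
inline float rsqViaF32(float x) { return 1.0f / std::sqrt(x); }

// True if x is a normal f32 value (not denormal, zero, infinity or NaN).
inline bool isNormalF32(float x) { return std::fpclassify(x) == FP_NORMAL; }

// True if a positive value lies within the normal f16 range.
inline bool inNormalF16Range(float x) {
  return x >= kF16MinNormal && x <= kF16Max;
}
```

The smallest f16 denormal, 2^-24, is far above the f32 normal threshold of 2^-126, and the results at both ends of the positive f16 input range stay inside the normal f16 range.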
Fixes #76948
Co-authored-by: Claude Opus 4.8 <noreply at anthropic.com>
[NFC][LLVM] Fix Intrinsics.td to adhere to 80 col limit (#199346)
Verified that there is no difference in the tablegen generated files for
intrinsics except line number changes in the comments in
IntrinsicEnums.inc.
[NFC][LLVM] Remove redundant verifier type checks for some intrinsics (#200658)
Remove the following redundant type checks:
* `[s|u]div_fix*` intrinsics: existing checks in `isSignatureValid` already
verify that arg0 and arg1 are int or int vectors (since they use
`llvm_anyint_ty`), and arg2 is declared as i32, so the checks related to it
are also redundant.
* For the `lrint` family, the result is `llvm_anyint_ty` and the argument
is `llvm_anyfloat_ty`, so one of the checks is redundant.
[TailCallElim] Drop poison-generating flags on reassociated accumulators (#200624)
For example, if you have recursion like
  int prod(int n) {
    if (n == 0) return 1;
    return prod(n-1) * f(n);
  }
then logically this computes (((f(1) * f(2)) * f(3)) * f(4)) * ... * f(n).
But TailCallElim reassociates this, computing instead
((f(n) * f(n-1)) * f(n-2)) * ...
If the operator (* in this case) had poison-generating flags like
nsw, those may not still apply after reassociation. (For example,
suppose in this example f(1) returns 0 -- in that case the original
multiplication cannot overflow, but the new one still might.)
Fix this by clearing the poison-generating flags after reassociating.
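The overflow difference can be reproduced outside the compiler. This is a small sketch under stated assumptions: `f` is a hypothetical function chosen so that f(1) is 0, and overflow is detected by widening to 64 bits instead of relying on signed overflow, which is undefined in C++.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical f for illustration: f(1) is 0, every other value is large.
inline std::int32_t f(int n) { return n == 1 ? 0 : 100000; }

// True if a * b does not fit in i32, i.e. a `mul nsw` would yield poison.
inline bool mulOverflowsI32(std::int32_t a, std::int32_t b) {
  const std::int64_t wide = static_cast<std::int64_t>(a) * b;
  return wide > std::numeric_limits<std::int32_t>::max() ||
         wide < std::numeric_limits<std::int32_t>::min();
}

// Source order: ((1 * f(1)) * f(2)) * ... * f(n).
inline bool sourceOrderOverflows(int n) {
  std::int32_t acc = 1;
  for (int i = 1; i <= n; ++i) {
    if (mulOverflowsI32(acc, f(i)))
      return true;
    acc *= f(i); // safe: the product was just checked to fit
  }
  return false;
}

// Reassociated order: ((1 * f(n)) * f(n-1)) * ... * f(1).
inline bool reassociatedOrderOverflows(int n) {
  std::int32_t acc = 1;
  for (int i = n; i >= 1; --i) {
    if (mulOverflowsI32(acc, f(i)))
      return true;
    acc *= f(i); // safe: the product was just checked to fit
  }
  return false;
}
```

With n = 3 the source order multiplies by zero first and never overflows, while the reassociated order computes 100000 * 100000 before it reaches f(1).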
[TableGen] Add !switch operator (#199659)
This patch adds a syntactic sugar operator to TableGen named `!switch`,
to simplify use cases where a user needs to conditionally select a value
based on an exact key match. It supports variadic case arguments (0 or
more) and requires a default value. Requiring the default makes the
grammar stricter and simpler to parse, and I think the flexibility cost
is negligible: it is considered a best practice in software design for
switch expressions (or statements) on arbitrary types to always provide
a default.
At parse time, after key and value type-checking, we reduce the
`!switch` expression to `!cond`, since the two share essentially all of
the downstream logic. The implementation also extracts the shared
pre-reduction type-checking for `!switch` and `!cond` into
`TGParser::resolveInitTypes`.
Motivation: switch-behaving `!cond` value selection in `llvm/lib/Target`
e.g. from `llvm/lib/Target/AArch64/AArch64InstrFormats.td`:
```
[11 lines not shown]
[clang-tidy] `use-ranges`: preserve iterator results with `.begin()` (#196036)
Preserve used iterator results for `remove`, `partition`,
`stable_partition`, and `rotate`-style replacements by appending
`.begin()` where the ranges algorithm returns a subrange.
Fix #124794
Assisted by Codex.
[lit] Add --check to run only selected RUN lines from a test
`llvm-lit --check=LIST <test>` keeps only the listed RUN directives in
the test and discards the rest. LIST is a comma-separated mix
of 0-indexed integers and ranges (e.g. `--check=0,2,4-6`). The
selection is applied to the parseIntegratedTestScript output.
Run tests via
`llvm-lit --check=0 llvm/utils/lit/tests/Inputs/check-filter/sample.ll`,
`llvm-lit --check=1 llvm/utils/lit/tests/Inputs/check-filter/sample.ll`,
`llvm/utils/lit/lit.py llvm/utils/lit/tests/check-filter.py`.
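The LIST syntax can be modeled in a few lines. lit itself is written in Python; this C++ sketch only illustrates the selection semantics and is not the patch's parser. It assumes ranges are inclusive at both ends, which the description does not state, and `parseCheckList` is a hypothetical name with no error handling for malformed input.

```cpp
#include <set>
#include <sstream>
#include <string>

// Expand a LIST such as "0,2,4-6" into the set of selected RUN-line indices.
// Assumption: "4-6" means 4, 5 and 6 (inclusive on both ends).
inline std::set<unsigned long> parseCheckList(const std::string &List) {
  std::set<unsigned long> Selected;
  std::istringstream In(List);
  std::string Item;
  while (std::getline(In, Item, ',')) {
    if (Item.empty())
      continue; // tolerate stray commas
    const auto Dash = Item.find('-');
    if (Dash == std::string::npos) {
      Selected.insert(std::stoul(Item)); // single index
    } else {
      const unsigned long Lo = std::stoul(Item.substr(0, Dash));
      const unsigned long Hi = std::stoul(Item.substr(Dash + 1));
      for (unsigned long I = Lo; I <= Hi; ++I)
        Selected.insert(I); // every index in the range
    }
  }
  return Selected;
}
```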
[clang-tidy] Avoid unsafe `use-default-member-init` fixes (#191607)
Suppress `modernize-use-default-member-init` diagnostics when moving a
constructor initializer into a default member initializer would
reference a declaration not visible from the field declaration.
Add `IgnoreNonVisibleReferences` to allow preserving the warning without
emitting unsafe fix-its, and document the new behavior.
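A minimal sketch of the situation, with hypothetical names: the constructor initializer refers to a constant declared after the class, so moving the initializer onto the field would not compile.

```cpp
struct Buffer {
  Buffer();
  // A fix-it rewriting this to `int Capacity = DefaultCapacity;` would be
  // invalid: DefaultCapacity has not been declared at this point.
  int Capacity;
};

// Declared after Buffer, so it is visible to the constructor definition
// below but not to the field declaration above.
constexpr int DefaultCapacity = 64;

Buffer::Buffer() : Capacity(DefaultCapacity) {}
```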
Fixes #156412
Assisted by Codex
[clang-tidy] `use-ranges`: avoid unsafe result fix-its
Preserve callable results with .fun, allow structured-binding-safe rewrites, and keep diagnostics while suppressing unsafe fix-its when ranges result objects do not match the original result shape.
Assisted by Codex.
[clang-tidy] `use-ranges`: preserve output results
Preserve used output iterator results for output algorithm replacements by appending .out where the ranges algorithm returns an algorithm result object.
Fix #110223
Assisted by Codex.
[clang-tidy] `use-ranges`: preserve remove iterator results
Preserve used iterator results for remove, partition, stable_partition, and rotate-style replacements by appending .begin() where the ranges algorithm returns a subrange.
Fix #124794
Assisted by Codex.
[clang-tidy] `use-ranges`: preserve used `unique` results (#196035)
Preserve iterator uses when replacing `std::unique` with
`std::ranges::unique` by appending `.begin()` in used-result contexts.
Fix #127658
Assisted by Codex.
[AArch64] Use ADDP tree for v16i8 to i16 bitmask extraction (#199812)
Re-land of #192974, reverted in 868aefd.
The original PR was reverted because the new lowering produced an
EXTRACT_VECTOR_ELT with an illegal i16 result type, which tripped the
operation legalizer when called from combineBoolVectorAndTruncateStore
on a `<32 x i1>` store split into two `<16 x i1>` halves. Returning i32
(handled by the caller's existing getZExtOrTrunc) avoids this.
Regression test added: bitmask_v32i8_split in
vec-combine-compare-to-bitmask.ll.
Note: in alias_mask.ll's whilewr_8_split2, the four halfword bitmask
results are now stored as separate `str h` × 4 rather than packed into
a d-register via ZIP1+EXT before a single store. This is functionally
equivalent and uses slightly fewer NEON arithmetic ops. It is a side
effect of the i32 return type; the store-merging combine does not match
the resulting shape.
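For readers unfamiliar with the technique, here is a scalar model of how a pairwise-add tree can turn a v16i8 compare result into a 16-bit mask. It is a sketch under assumptions, not the exact instruction sequence the patch emits: `addp` models a byte-wise ADDP, `bitmaskFromLanes` is a hypothetical name, and the result is widened to 32 bits to mirror the i32 return type mentioned above.

```cpp
#include <array>
#include <cstdint>

using V16 = std::array<std::uint8_t, 16>;

// Scalar model of a byte-wise pairwise add (ADDP Vd.16B, Vn.16B, Vm.16B):
// the low half holds the pair sums of A, the high half the pair sums of B.
inline V16 addp(const V16 &A, const V16 &B) {
  V16 R{};
  for (int I = 0; I < 8; ++I) {
    R[I] = static_cast<std::uint8_t>(A[2 * I] + A[2 * I + 1]);
    R[I + 8] = static_cast<std::uint8_t>(B[2 * I] + B[2 * I + 1]);
  }
  return R;
}

// Lanes are expected to be 0x00 or 0xFF, as produced by a vector compare.
inline std::uint32_t bitmaskFromLanes(const V16 &Lanes) {
  V16 V{};
  // Give each lane a distinct bit within its 8-lane half.
  for (int I = 0; I < 16; ++I)
    V[I] = static_cast<std::uint8_t>(Lanes[I] & (1u << (I % 8)));
  // Three pairwise adds reduce each half to one byte; the sums cannot
  // overflow because every lane contributes a different bit.
  V = addp(V, V);
  V = addp(V, V);
  V = addp(V, V);
  return static_cast<std::uint32_t>(V[0]) |
         (static_cast<std::uint32_t>(V[1]) << 8);
}
```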