[ADT] Reinstate "Refactor Bitset to Be More Constexpr-Usable" (#189497)
Reland of #172062 (a71b1d2), which was reverted in b0234d1.
This patch makes essential Bitset member functions constexpr (`set()`,
`any()`, `none()`, `count()`, `operator==`, `!=`, `<`, `~`) and adds a
new `all()` method. It also introduces a `maskLastWord()` invariant to
ensure unused high bits in the last word are always zero, which is
required for correctness of `operator~`, `set()`, `all()`, and
comparisons on non-word-aligned sizes (e.g., `Bitset<33>`).
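A minimal sketch of the `maskLastWord()` invariant, assuming 64-bit words. `MiniBitset` is a hypothetical stand-in, not `llvm::Bitset`'s actual code. Every operation that can set bits beyond `NumBits` (`set()`, `operator~`) clears them again, so `count()`, `all()` and `operator==` can work on whole words. The plain loops in `any()` and `operator==` mirror the reland's choice to avoid `llvm::any_of`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical illustration of the "unused high bits are zero" invariant.
template <std::size_t NumBits> class MiniBitset {
  static constexpr std::size_t WordBits = 64;
  static constexpr std::size_t NumWords = (NumBits + WordBits - 1) / WordBits;
  std::array<std::uint64_t, NumWords> Words{};

  // Clear the bits of the last word that lie beyond NumBits.
  constexpr void maskLastWord() {
    if constexpr (NumBits % WordBits != 0)
      Words[NumWords - 1] &= (std::uint64_t(1) << (NumBits % WordBits)) - 1;
  }

public:
  constexpr MiniBitset &set() {
    for (auto &W : Words)
      W = ~std::uint64_t(0);
    maskLastWord(); // Without this, MiniBitset<33> would report 64 set bits.
    return *this;
  }
  constexpr MiniBitset &set(std::size_t I) {
    assert(I < NumBits);
    Words[I / WordBits] |= std::uint64_t(1) << (I % WordBits);
    return *this;
  }
  constexpr MiniBitset operator~() const {
    MiniBitset R;
    for (std::size_t I = 0; I != NumWords; ++I)
      R.Words[I] = ~Words[I];
    R.maskLastWord(); // Flipped padding bits must be cleared again.
    return R;
  }
  // Plain loop instead of an any_of-style helper.
  constexpr bool any() const {
    for (auto W : Words)
      if (W != 0)
        return true;
    return false;
  }
  constexpr bool none() const { return !any(); }
  // Correct only because padding bits are always zero.
  constexpr bool all() const { return (~*this).none(); }
  constexpr std::size_t count() const {
    std::size_t N = 0;
    for (auto W : Words)
      for (; W; W &= W - 1) // Clear the lowest set bit each iteration.
        ++N;
    return N;
  }
  constexpr bool operator==(const MiniBitset &O) const {
    for (std::size_t I = 0; I != NumWords; ++I)
      if (Words[I] != O.Words[I])
        return false;
    return true;
  }
};

// The operations are usable in constant expressions.
static_assert(MiniBitset<33>().set().count() == 33);
static_assert(MiniBitset<33>().set().all());
static_assert((~MiniBitset<33>()).all());
```

Dropping `R.maskLastWord()` from `operator~` would leave padding bits set in the complement of a full `MiniBitset<33>`, so `all()` would wrongly return false.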
Changes from the original reverted PR:
- Replaced `llvm::any_of` with an inline loop to avoid depending on
constexpr `any_of`/`none_of` from `STLExtras` (#172536), which was also
reverted due to a GCC 15.2.1 bootstrap miscompile.
- The patch is now fully self-contained with no prerequisite changes.
Motivation: This is a prerequisite for making `LaneBitmask` a wrapper
around `Bitset`, enabling scalable lane bitmasks beyond 64 bits
(https://discourse.llvm.org/t/rfc-out-of-lanebitmask-bits-again/88613).
[LLD][ELF] Skip non-inputsections to avoid invalid cast in Arm BE8 handling (#188154)
This patch fixes https://github.com/llvm/llvm-project/issues/187033
In BE8 mode, instruction bytes are reversed for sections containing
code. This logic currently assumes that Arm mapping symbols (e.g. $a,
$t, $d) are always associated with InputSections.
However, mapping symbols can also be defined in other section types such
as mergeable sections (SHF_MERGE). These are not represented as
InputSection, and attempting to cast them using
cast_if_present<InputSection> results in an assertion failure. This
patch skips mapping symbols whose section is not an InputSection.
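A simplified sketch of the skip, using hypothetical stand-ins for lld's section classes and a hand-rolled `dynCastIfPresent` modeled on LLVM-style RTTI (`classof`). Unlike `cast_if_present`, a dyn-cast returns null for a non-matching kind, so symbols in mergeable sections can simply be skipped. This illustrates the idea and is not lld's actual code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-ins for lld's section classes.
struct SectionBase {
  enum Kind { Regular, Merge };
  Kind kind;
  explicit SectionBase(Kind k) : kind(k) {}
  virtual ~SectionBase() = default;
};
struct InputSection : SectionBase {
  InputSection() : SectionBase(Regular) {}
  static bool classof(const SectionBase *s) { return s->kind == Regular; }
};
struct MergeInputSection : SectionBase { // e.g. an SHF_MERGE section
  MergeInputSection() : SectionBase(Merge) {}
  static bool classof(const SectionBase *s) { return s->kind == Merge; }
};

// Minimal dyn_cast_if_present: null input or wrong kind yields nullptr
// instead of an assertion failure.
template <typename To, typename From> To *dynCastIfPresent(From *p) {
  return (p && To::classof(p)) ? static_cast<To *>(p) : nullptr;
}

struct MappingSymbol {
  std::string name; // e.g. "$a", "$t", "$d"
  SectionBase *section;
};

// Keep only mapping symbols defined in InputSections; others are skipped.
std::vector<const MappingSymbol *>
collectBE8Candidates(const std::vector<MappingSymbol> &syms) {
  std::vector<const MappingSymbol *> out;
  for (const MappingSymbol &sym : syms) {
    auto *isec = dynCastIfPresent<InputSection>(sym.section);
    if (!isec)
      continue; // A checked cast here would assert on MergeInputSection.
    out.push_back(&sym);
  }
  return out;
}
```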
[BOLT][AArch64] Strip unneeded labels from FEAT_CMPBR tests. (#189931)
Eliminates the temporary labels so that BOLT does not recognize them as
secondary entry points.
[MLIR][Linalg] Generic to category specialization for unary elementwise ops (#187217)
Handle specialization of `linalg.generic` ops representing a unary
elementwise computation to the `linalg.elementwise` category op. This
implements a previously absent path in the linalg morphism.
[RISCV][TTI] Update cost and prevent exceed m8 for vector.extract.last.active (#188160)
This patch contains two parts.
1. Update the costs to reflect the codegen changes. This is not fully
accurate, since the step vector can use a smaller element type when a
vscale_range attribute is present, but that information is not available
in the type-based query in TTI.
2. Return an invalid cost for vector.extract.last.active when the step
vector would need to be split (i.e. exceed m8). Such cases are currently
not handled correctly and hit an assertion.
To avoid blocking the FindLast reduction in LV
(https://github.com/llvm/llvm-project/pull/184931), we should land this
first and fix the SelectionDAG lowering of vector.extract.last.active
afterwards.
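A sketch of the "exceeds m8" test, assuming the step vector's register group size follows the usual RVV mapping of 64 bits per vscale, so `<vscale x N x iM>` spans N*M/64 registers. `extractLastActiveCost` and its one-unit-per-register cost are hypothetical placeholders, not the RISC-V TTI code; the step element width is passed in because, as noted above, TTI cannot see vscale_range.

```cpp
#include <cassert>
#include <optional>

// One RVV register holds 64 bits per unit of vscale (RVVBitsPerBlock).
constexpr unsigned RVVBitsPerBlock = 64;

// LMUL of <vscale x MinElts x iEltBits>; fractional groups count as 1.
constexpr unsigned lmulFor(unsigned MinElts, unsigned EltBits) {
  unsigned Bits = MinElts * EltBits;
  return Bits <= RVVBitsPerBlock ? 1 : Bits / RVVBitsPerBlock;
}

// Hypothetical cost hook: std::nullopt plays the role of an invalid cost
// when the step vector would need more than an m8 group (i.e. a split).
std::optional<unsigned> extractLastActiveCost(unsigned MinElts,
                                              unsigned StepEltBits,
                                              unsigned CostPerRegister) {
  assert(MinElts > 0 && StepEltBits > 0);
  unsigned LMUL = lmulFor(MinElts, StepEltBits);
  if (LMUL > 8)
    return std::nullopt;
  return LMUL * CostPerRegister; // Placeholder linear cost model.
}
```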
[AArch64][llvm] Some instructions should be `HINT` aliases (NFC)
Implement the following instructions as a `HINT` alias instead of a
dedicated instruction in separate classes:
* `stshh`
* `stcph`
* `shuh`
* `tsb`
Updated all their helper methods too, and updated the `stshh` pseudo
expansion for the intrinsic to emit `HINT #0x30 | policy`.
Code in AArch64AsmPrinter::emitInstruction identified an initial BTI using a
broad bitmask on the HINT immediate, which also matched shuh/stcph (50..52).
This could move the patchable entry label after a non-BTI instruction.
Replaced it with an exact BTI check using the BTI HINT range (32..63) and
AArch64BTIHint::lookupBTIByEncoding(Imm ^ 32).
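A sketch of the exact BTI check described above. `lookupBTIByEncoding` here is a hypothetical stand-in for `AArch64BTIHint::lookupBTIByEncoding`, assuming the four BTI variants are HINT #32, #34, #36 and #38 (`bti`, `bti c`, `bti j`, `bti jc`). `stshhHintImm` only restates the `HINT #0x30 | policy` expansion from the message.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in: BTI encodings once bit 5 (value 32) is removed.
std::optional<const char *> lookupBTIByEncoding(unsigned Enc) {
  switch (Enc) {
  case 0: return "bti";
  case 2: return "bti c";
  case 4: return "bti j";
  case 6: return "bti jc";
  default: return std::nullopt;
  }
}

// Exact check: the immediate must be in the BTI HINT range (32..63) and
// decode to a known BTI variant, so shuh/stcph (50..52) are rejected.
bool isBTIHint(unsigned Imm) {
  if (Imm < 32 || Imm > 63)
    return false;
  return lookupBTIByEncoding(Imm ^ 32).has_value();
}

// The stshh intrinsic's pseudo expansion: HINT #(0x30 | policy).
constexpr unsigned stshhHintImm(unsigned Policy) { return 0x30 | Policy; }
```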
A following change will remove duplicated code and simplify.
[2 lines not shown]
[CostModel] Move default expand cost for partial reductions to BasicTTIImpl (#189905)
This is a follow-up of the suggestion left here:
https://github.com/llvm/llvm-project/pull/181707#discussion_r2995733831
The override functions in AMDGPU/ARM/SystemZ/X86 are required to avoid
enabling partial reductions where they were previously disabled (I've
added this for all targets that implement getArithmeticReductionCost).
[lldb][AArch64][Linux] Qualify uses of user_sve_header (#190130)
Fixes #165413, where the following build failure was reported:
```
/b/s/w/ir/x/w/llvm-llvm-project/lldb/source/Plugins/Process/Linux/NativeRegisterContextLinux_arm64.cpp:1182:9: error: unknown type name 'user_sve_header'; did you mean 'sve::user_sve_header'?
1182 | user_sve_header *header =
| ^~~~~~~~~~~~~~~
| sve::user_sve_header
```
To fix this, qualify it with `sve::`, as we do for all other uses.
This is LLDB's copy of a structure that Linux also defines. I think the
build worked on some machines because Linux's version ended up being
included, but in a more isolated build it may not be.
We keep our own definition so that we know exactly what we're using,
even if Linux extends the structure later.
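A sketch of why the qualified name is robust, assuming a field layout that mirrors Linux's `struct user_sve_header` (size, max_size, vl, max_vl, flags, reserved). The namespace contents and the `readVectorLength` helper are illustrative, not LLDB's code: `sve::user_sve_header` always names the project's own copy, whether or not a global definition arrives through system headers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical project-local mirror of the kernel structure.
namespace sve {
struct user_sve_header {
  std::uint32_t size;
  std::uint32_t max_size;
  std::uint16_t vl;
  std::uint16_t max_vl;
  std::uint16_t flags;
  std::uint16_t reserved;
};
} // namespace sve

// Fully qualified: compiles without relying on a global ::user_sve_header.
std::uint16_t readVectorLength(const std::vector<std::uint8_t> &buffer) {
  sve::user_sve_header header{};
  assert(buffer.size() >= sizeof(header));
  // memcpy avoids alignment and aliasing issues with the raw byte buffer.
  std::memcpy(&header, buffer.data(), sizeof(header));
  return header.vl;
}
```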
[Clang][LoongArch] Align LSX/LASX built-in signatures with intrinsic types to avoid lax conversions (#189900)
Update the built-in signatures in BuiltinsLoongArchLSX.def and
BuiltinsLoongArchLASX.def to precisely match the vector types used in
the corresponding intrinsic headers (lsxintrin.h and lasxintrin.h).
This alignment ensures that these intrinsics can be compiled
successfully even when -flax-vector-conversions=none is specified, since
the built-in arguments no longer rely on implicit vector type
conversions.
Added new test cases to verify the macro-defined LSX/LASX
intrinsic interfaces under -flax-vector-conversions=none.
Fixes #189898
[clang][analyzer] Forward CTU-import failure conditions
Forward all CTU-import failures as diagnostics (remarks, warnings,
errors), except for `index_error_code::missing_definition` which has the
potential of generating too many diagnostics.
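A minimal sketch of the filtering rule. `IndexErrorCode` is a hypothetical subset of clang's `index_error_code`, and the helpers are illustrative. The real change turns forwarded failures into remarks, warnings or errors, which this sketch does not model.

```cpp
#include <cassert>
#include <vector>

// Hypothetical subset of CTU index/import failure codes.
enum class IndexErrorCode {
  MissingIndexFile,
  MultipleDefinitions,
  MissingDefinition,
  FailedImport,
};

// Forward every failure except MissingDefinition, which could
// otherwise produce too many diagnostics.
bool shouldForwardCTUFailure(IndexErrorCode Code) {
  return Code != IndexErrorCode::MissingDefinition;
}

std::vector<IndexErrorCode>
filterForwarded(const std::vector<IndexErrorCode> &Failures) {
  std::vector<IndexErrorCode> Out;
  for (IndexErrorCode C : Failures)
    if (shouldForwardCTUFailure(C))
      Out.push_back(C);
  return Out;
}
```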
--
CPP-7804
Move ExpandMemCmp and MergeIcmp to the middle end (#77370)
Moving these into the middle-end pipeline will allow for additional
optimization of the expansion result, such as CSE of redundant loads
(cf. https://godbolt.org/z/bEna4Md9r). For now, we conservatively place
the passes at the end of the middle-end pipeline, so we mostly don't
benefit from additional optimizations yet. The pipeline position will be
moved in a future change.
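For intuition, a C++ sketch of roughly what an equality-only `memcmp` expansion looks like for a 16-byte compare on a 64-bit target: wide loads combined with XOR/OR instead of a library call. Once the expansion happens in the middle end, these become ordinary IR loads that later passes could, in principle, CSE. The exact IR ExpandMemCmp emits depends on the target's legal load sizes, so treat this as an approximation.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// What the source program writes.
bool equal16(const unsigned char *a, const unsigned char *b) {
  return std::memcmp(a, b, 16) == 0;
}

// Approximate shape of the expansion: two 8-byte load pairs, XOR each
// pair, OR the results, and test for zero. memcpy is a portable
// unaligned load.
bool equal16Expanded(const unsigned char *a, const unsigned char *b) {
  std::uint64_t a0, a1, b0, b1;
  std::memcpy(&a0, a, 8);
  std::memcpy(&a1, a + 8, 8);
  std::memcpy(&b0, b, 8);
  std::memcpy(&b1, b + 8, 8);
  return ((a0 ^ b0) | (a1 ^ b1)) == 0;
}
```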
This builds on work done by legrosbuffle in
https://reviews.llvm.org/D60318.
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply at anthropic.com>
[libc] Indentation consistency in CMake (#190120)
This PR just fixes the indentation/style for the whole CMake file for
consistency.
No other changes.
c698f55b0245ffbaae55c7f854fadba33df16e9d
Reland "[CoroSplit] Erase trivially dead allocas after spilling (#189295)" (#190124)
The original PR contained a use-after-delete issue, which has been
resolved in #189521.
Reland #189295, which was reverted in #189311.
[Passes][LoopRotate] Move minsize handling fully into pass (#189956)
Make this dependent only on the minsize attribute and drop the pipeline
handling.
Rename the enable-loop-header-duplication option to
enable-loop-header-duplication-at-minsize to clarify that it controls
header duplication at minsize only (in other cases it is enabled by
default, independently of this option).
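A hypothetical summary of the decision after this change; the struct and function names are illustrative, not LLVM's. Header duplication depends only on the function's minsize attribute, and the renamed option only matters for minsize functions.

```cpp
#include <cassert>

// Hypothetical view of the relevant function attribute.
struct FunctionAttrs {
  bool MinSize = false;
};

// EnableAtMinSize models enable-loop-header-duplication-at-minsize.
bool allowHeaderDuplication(const FunctionAttrs &F, bool EnableAtMinSize) {
  if (!F.MinSize)
    return true; // Enabled by default, regardless of the option.
  return EnableAtMinSize;
}
```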
[Passes][FuncSpec] Move optsize/minsize handling into pass (#189952)
Instead of using the Os/Oz level during pass pipeline construction,
query the optsize/minsize attribute on the function to determine whether
specialization is allowed to take place. This ensures consistent
behavior for per-function attributes.
It's worth noting that FuncSpec *already* checks for minsize, but at the
call-site level.
WholeProgramDevirt: Import/export the CVP byte directly in the summary (#188979)
Import/export the CVP byte directly in the summary rather than using
absolute symbol constants on ELF/x86.
This leads to better codegen, as the absolute symbol constants were not
resolved until link time (see the linked issue for an example).
Fixes #188470
[RISCV] Fix stackmap shadow trimming NOP size for compressed targets (#189774)
The shadow trimming loop in LowerSTACKMAP hardcoded a 4-byte decrement
per instruction, but when Zca is enabled NOPs are 2 bytes. Use NOPBytes
instead of the hardcoded 4 so the shadow is correctly trimmed on
compressed targets.
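A simplified model of the shadow trimming. It assumes, as the fix does, that each instruction scanned after the stackmap consumes `NOPBytes` of the shadow, and that the shadow size is a multiple of `NOPBytes`. `nopsToEmit` and its parameters are hypothetical; `FollowingInstrs` stands in for however many subsequent instructions the real loop in `LowerSTACKMAP` is able to scan.

```cpp
#include <cassert>

// NOPBytes is 2 when Zca is enabled and 4 otherwise.
unsigned nopsToEmit(unsigned ShadowBytes, unsigned FollowingInstrs,
                    unsigned NOPBytes) {
  assert(NOPBytes == 2 || NOPBytes == 4);
  assert(ShadowBytes % NOPBytes == 0);
  unsigned Remaining = ShadowBytes;
  while (Remaining > 0 && FollowingInstrs > 0) {
    Remaining -= NOPBytes; // Previously a hardcoded 4.
    --FollowingInstrs;
  }
  // Pad whatever is left of the shadow with NOPs of the target's size.
  return Remaining / NOPBytes;
}
```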
Co-authored-by: Claude Opus 4.6 <noreply at anthropic.com>
(cherry picked from commit 3d7eedce5658c41a1b22775938359bfafac47fc9)
[flang] Update Flang Extension doc to reflect previous change (#188088)
Update Flang Extension doc to remove a note about a warning that was
removed in a previous PR (PR #178088). This doc change should have been
made in that PR; the oversight was only recently discovered.
(cherry picked from commit 45b932a2d452c997d98b57e1aa31bc4951c5e9f4)