[LV] Handle chained selects/blends when creating new rdx chain. (#199443)
Make sure we recursively clone chains of selects/blends when re-creating
a reduction chain with new types.
Fixes https://github.com/llvm/llvm-project/issues/199406.
[NFC][AArch64][Cyclone] Model WriteSTP with a local SchedWriteRes (#198844)
The Cyclone scheduling model uses a SchedAlias between two SchedWriteRes
definitions from AArch64Schedule.td.
This prevents other scheduling models from aliasing WriteSTP. This patch
addresses the issue by defining a new CyWriteSTP and using that instead.
[VPlan] Simplify VPSCEVExpander, clarify naming/comments (NFC). (#199423)
Address post-commit comments from
https://github.com/llvm/llvm-project/pull/189455,
removing an unneeded member and clarifying naming/comments to stress that
the current logic tries to expand a SCEV to VPInstructions, with only a
small subset of SCEV expressions supported.
[SLP] Ensure TreeCost is scaled for ordered fadd reductions (#199388)
Resolves #199267
Addresses an issue where `getScaleToLoopIterations()` can return 1 on
isolated SLP trees because `UserTreeIndex` is invalid. This prevents
`TreeCost` from scaling alongside `ReductionCost`, causing the cost
model to incorrectly treat an unprofitable vector reduction as
profitable.
This patch passes the reduction root instruction down into
`calculateTreeCostAndTrimNonProfitable` and the underlying scale
calculation so `getScaleToLoopIterations` can get the correct block
context.
[CodeGenPrepare] Report an error if ProfileSummaryAnalysis is not available (#199268)
CodeGenPreparePass can't declare ProfileSummaryAnalysis as required,
because PSA is a module-level analysis, but CGP is a function-level pass.
Therefore it accesses PSA using getCachedResult, and PSA might be null.
In practice this doesn't happen, because the CGP pass pipeline
preparation code ensures that PSA is present. But if you invoke
CGP via opt -passes=codegenprepare, then it's not
there, and we segfault.
Fix for https://github.com/llvm/llvm-project/issues/173360.
This bug was found by a large run of Opus 4.7 looking for bugs in LLVM.
[llvm,clang] Don't assume non-erased DenseMap entries remain valid after erase. NFC (#198982)
In preparation for switching DenseMap from tombstone deletion to
backward-shift deletion, update call sites that reuse an iterator or a
bucket reference after erasing another entry from the same map.
These work under tombstone deletion because unrelated buckets stay put,
but backward-shift deletion relocates entries to close the gap.
Add DenseMap::remove_if, similar to SmallPtrSet::remove_if, as
replacement for erase-while-iterating, and use it where applicable.
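
To illustrate why a bucket reference can go stale, here is a minimal
sketch of backward-shift deletion in a toy linear-probing table. It is
not DenseMap's implementation: `ToyTable`, `ToyCap`, and `ToyEmpty` are
hypothetical names, keys are assumed non-negative, and the table is
assumed never to be full.

```cpp
#include <cstddef>

constexpr std::size_t ToyCap = 8; // hypothetical fixed capacity
constexpr int ToyEmpty = -1;      // hypothetical empty-slot marker

struct ToyTable {
  int Slots[ToyCap];

  ToyTable() {
    for (std::size_t I = 0; I != ToyCap; ++I)
      Slots[I] = ToyEmpty;
  }

  static std::size_t home(int Key) {
    return static_cast<std::size_t>(Key) % ToyCap;
  }

  // Inserts Key by linear probing and returns the slot it landed in.
  std::size_t insert(int Key) {
    std::size_t I = home(Key);
    while (Slots[I] != ToyEmpty && Slots[I] != Key)
      I = (I + 1) % ToyCap;
    Slots[I] = Key;
    return I;
  }

  // Returns the slot holding Key, or ToyCap if Key is absent.
  std::size_t find(int Key) const {
    for (std::size_t I = home(Key); Slots[I] != ToyEmpty; I = (I + 1) % ToyCap)
      if (Slots[I] == Key)
        return I;
    return ToyCap;
  }

  // Backward-shift deletion: later entries of the probe run move back to
  // close the gap, so no tombstone is left behind.
  void erase(int Key) {
    std::size_t Hole = find(Key);
    if (Hole == ToyCap)
      return;
    Slots[Hole] = ToyEmpty;
    for (std::size_t J = (Hole + 1) % ToyCap; Slots[J] != ToyEmpty;
         J = (J + 1) % ToyCap) {
      std::size_t H = home(Slots[J]);
      // The entry must stay put if its home slot lies cyclically in (Hole, J].
      bool HomeBetween =
          Hole <= J ? (Hole < H && H <= J) : (Hole < H || H <= J);
      if (!HomeBetween) {
        Slots[Hole] = Slots[J];
        Slots[J] = ToyEmpty;
        Hole = J;
      }
    }
  }
};
```

In this toy, keys 1, 9, and 17 all hash to slot 1 and occupy slots 1, 2,
and 3. Erasing 9 moves 17 from slot 3 to slot 2, so an index or pointer
to 17's old slot no longer refers to it, even though 17 itself was never
erased.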
Aided by Claude Opus 4.7
[LifetimeSafety] Extend suggestions for `lifetimebound` to also warn on canonical declarations (#198784)
With this patch, we suggest adding the `clang::lifetimebound` attribute
on the canonical declaration and on the earliest redeclaration in each
other file, preserving diagnostics for declarations visible from other
translation units while avoiding duplicate suggestions within the same
file.
Fixes #198624
Fixes #198628
[X86][GISel] Fix carry-in for selectUAddSub. (#199261)
When G_UADDE/G_USUBE was chained off a previous G_UADDE/G_UADDO/
G_USUBE/G_USUBO, selectUAddSub re-materialized EFLAGS.CF from the
previous SETB byte using CMP r, 1. That computes (r - 1) and sets
CF iff r < 1 unsigned, i.e. CF = (r == 0) -- the inverse of the
desired carry. The following ADC/SBB then consumed the wrong CF and
produced an off-by-one upper word; e.g. `add i128 0xFF..FF, 1` under
-global-isel returned hi=0 lo=0 instead of hi=1 lo=0.
Emit NEG r instead: NEG sets CF iff its operand is non-zero, matching
the SETB byte. NEG is a two-address (tied) instruction, so emit it
into a fresh virtual register rather than redefining the carry-in
vreg.
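
The flag arithmetic above can be checked with a small model. This is a
sketch in plain C++, not the selector code: `cfAfterCmpOne`,
`cfAfterNeg`, `U128`, and `add128` are hypothetical names that only
model the carry flag each instruction would produce for a SETB byte.

```cpp
#include <cstdint>

// CMP r, 1 computes r - 1; CF is the unsigned borrow, i.e. r < 1.
constexpr bool cfAfterCmpOne(std::uint8_t R) { return R < 1; }

// NEG r computes 0 - r; CF is set iff the operand is non-zero.
constexpr bool cfAfterNeg(std::uint8_t R) { return R != 0; }

// A SETB byte is 1 when the previous operation carried, 0 otherwise.
static_assert(!cfAfterCmpOne(1), "CMP r, 1 drops a real carry");
static_assert(cfAfterCmpOne(0), "CMP r, 1 invents a carry");
static_assert(cfAfterNeg(1), "NEG keeps a real carry");
static_assert(!cfAfterNeg(0), "NEG keeps a clear carry");

struct U128 {
  std::uint64_t Hi, Lo;
};

// Models a two-word add whose carry is re-materialized from a SETB byte.
inline U128 add128(U128 A, U128 B, bool UseNeg) {
  U128 Result;
  Result.Lo = A.Lo + B.Lo;
  std::uint8_t Setb = Result.Lo < A.Lo ? 1 : 0; // carry out of the low word
  bool CF = UseNeg ? cfAfterNeg(Setb) : cfAfterCmpOne(Setb);
  Result.Hi = A.Hi + B.Hi + (CF ? 1 : 0); // models ADC
  return Result;
}
```

With a low word of all ones plus 1, the model yields hi=0 when the carry
is rebuilt with CMP and hi=1 when it is rebuilt with NEG.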
C reproducer (compile on x86_64-linux-gnu and run):
```
// clang -O2 -fglobal-isel repro.c -o repro && ./repro
[32 lines not shown]
```
[SLP][NFC] Add precommit test for unprofitable ordered fadd reductions (#199428)
Adds a test case reproducing a scenario where the cost model incorrectly
evaluates an unprofitable ordered fadd reduction chain as profitable.
Further details can be found on this issue:
https://github.com/llvm/llvm-project/issues/199267
[libc][math] Implement isnanf16 header-only function (#198115)
Adds `isnanf16`, the float16 variant of isnan, as part of issue
[#195400](https://github.com/llvm/llvm-project/issues/195400), which
tracks adding missing isnan variants for extended floating-point types.
The implementation follows the same pattern as the existing `isnanf`,
`isnan`, and `isnanl` functions.
---------
Co-authored-by: Victor Campos <github at victorcampos.me>
[VPlan] Simplify block deletion in VPlan dtor (NFC) (#199421)
Split the deletion loop into two simpler loops: the first replaces all
operands of each recipe with a dummy value, and the second deletes the
blocks.
This avoids RAUW unnecessarily and also removes the need to handle
region values explicitly.
[libc++] remove duplicate assertions for void/reference const any_cast
For test cases of the const overload of any_cast, such as:
```C++
#include <any>

void test() {
  std::any a = 0;
  const std::any& a2 = a;
  // Expected to be rejected: this is the case the assertions diagnose.
  (void)std::any_cast<int&>(&a2);
}
```
(and similarly for void), the same assertion is reported twice.
This happens because the assertions are implemented in both the const
and non-const any_cast overloads, and the const overload delegates to
the non-const one.
workflows/issue-release-workflow: Validate user input in /cherry-pick commands (#199249)
This protects against malicious inputs embedded in comments with
/cherry-pick commands.
[offload] Fix --libomptarget-nvptx-bc-path in tests (#199382)
PR #198622, which landed as 3383f0d6fe01, causes 272 `libomptarget ::
nvptx64-nvidia-cuda` test failures on my system with:
```
clang: error: bitcode library '/home/jdenny/llvm/build/\./lib/x86_64-unknown-linux-gnu/nvptx64-nvidia-cuda' does not exist
```
This patch fixes that.
[MC] Create new MCScheduleOptions cl::opt category (#198746)
This patch creates a new cl::opt category for MCSchedule options. It
enables tools to filter MCSchedule options based on category.
Specifically, llvm-mca now filters them in, and displays them under
`--help-hidden`, which wasn't the case before.
[InstCombine] Fix vector_reduce_mul(sext <n x i1>) for odd n. (#199401)
Before this patch, instcombine folded
vector_reduce_mul(sext (<n x i1> val))
to
zext(vector_reduce_and(<n x i1> val)).
But this is incorrect when n is odd: if every lane is true, the
reduction multiplies an odd number of -1 values, so the result is -1,
not 1.
After this patch we only do this fold when n is even.
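
The parity argument can be checked with a small model. This is a sketch
in plain C++, not InstCombine code: `reduceMulOfSext` and
`zextOfReduceAnd` are hypothetical helpers, and `int` stands in for the
extended element type.

```cpp
#include <vector>

// Models vector_reduce_mul(sext(<n x i1>)): a true lane sign-extends to -1
// and a false lane to 0.
int reduceMulOfSext(const std::vector<bool> &Lanes) {
  int Product = 1;
  for (bool Lane : Lanes)
    Product *= Lane ? -1 : 0;
  return Product;
}

// Models zext(vector_reduce_and(<n x i1>)): 1 if every lane is true, else 0.
int zextOfReduceAnd(const std::vector<bool> &Lanes) {
  bool All = true;
  for (bool Lane : Lanes)
    All = All && Lane;
  return All ? 1 : 0;
}
```

The two helpers agree whenever any lane is false (both give 0) and when
an even number of lanes are all true (both give 1). They differ only for
an odd number of all-true lanes, where the product is -1.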
This bug was found by a large run of Opus 4.7 looking for bugs in LLVM.
[ConstantFolding] Handle large exponents in ldexp (#199309)
Previously, if you passed llvm.ldexp a constant exponent that did not
fit in `int`, we would silently truncate it to `int` before using it in
scalbn, and thus generate an incorrect result.
We now clamp it to fit within int.
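
The difference between truncating and clamping can be sketched with
`std::ldexp`. This is an illustration, not the ConstantFolding code:
`truncateExponent` and `clampExponent` are hypothetical helpers, and the
sketch assumes a 32-bit `int`.

```cpp
#include <climits>
#include <cmath>
#include <cstdint>

// Keeps only the low 32 bits, as a silent narrowing to a 32-bit int would.
// (For values above INT_MAX the final conversion is implementation-defined
// before C++20; the example exponent below does not hit that case.)
int truncateExponent(std::int64_t Exp) {
  return static_cast<int>(static_cast<std::uint32_t>(Exp));
}

// Saturates to the range of int instead of wrapping.
int clampExponent(std::int64_t Exp) {
  if (Exp > INT_MAX)
    return INT_MAX;
  if (Exp < INT_MIN)
    return INT_MIN;
  return static_cast<int>(Exp);
}

// Example exponent: 2^32 + 1, which wraps to 1 when truncated.
const std::int64_t HugeExp = (std::int64_t(1) << 32) + 1;

double scaleWithTruncation() {
  return std::ldexp(1.0, truncateExponent(HugeExp));
}

double scaleWithClamp() {
  return std::ldexp(1.0, clampExponent(HugeExp));
}
```

With truncation, the exponent 2^32 + 1 becomes 1 and the result is a
finite 2.0. With clamping, the result overflows to infinity, which is
what such a large exponent should produce.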
This bug was found by a large run of Opus 4.7 looking for bugs in LLVM.
[VPlan] Remove special cost logic for loads predicated by header mask. (#196630)
Remove the special cost logic for loads predicated by the header mask,
as it does not accurately reflect the cost of the generated VPlan.
Unmasking the load can only be done in general if we don't unroll or if
the address is actually uniform-across-vf-and-uf. The former we cannot
really determine before selecting the VF, as UF is picked after VF. The
latter is not really useful in practice.
PR: https://github.com/llvm/llvm-project/pull/196630