[X86] Manage atomic store of fp -> int promotion in DAG (#197166)
When lowering `atomic store <1 x T>` vector types with floats (i.e.
during scalarization in the selection DAG), selection can fail since
this pattern is unsupported. To support this, floats can be cast to
an integer type of the same size.
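The same idea can be sketched at the source level. This is only an analogy, not the DAG lowering code from the patch: the helper names `atomicStoreFloatAsInt` and `atomicLoadFloatFromInt` are hypothetical, and the sketch assumes a 32-bit `float`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the patch): store a float through an atomic
// integer slot of the same size by reinterpreting its bits.
inline void atomicStoreFloatAsInt(std::atomic<std::uint32_t> &Slot, float Value) {
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "the integer type must have the same size as the float");
  std::uint32_t Bits;
  std::memcpy(&Bits, &Value, sizeof(Bits)); // bit-preserving cast, no conversion
  Slot.store(Bits, std::memory_order_release);
}

// Hypothetical helper: the matching load, casting the bits back to a float.
inline float atomicLoadFloatFromInt(const std::atomic<std::uint32_t> &Slot) {
  std::uint32_t Bits = Slot.load(std::memory_order_acquire);
  float Value;
  std::memcpy(&Value, &Bits, sizeof(Value));
  return Value;
}
```

The cast only changes the type the store is expressed in; the stored bit pattern is unchanged.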
Store-side counterpart to #148895. Stacked on top of #197165 and below
#197618.
Reland [VectorCombine] foldShuffleChainsToReduce - add support for partial vector reductions (#197659)
Reland of #195119, which was reverted in 2b26355 due to:
1. An assertion failure on AArch64 where
`getShuffleCost(SK_ExtractSubvector)` was called without the `SubTp`
parameter.
2. A miscompilation on non-power-of-2 vector sizes where parity-based
shuffle masks cause lane duplication in the reduction tree.
Fixes:
- Pass `ReduceVecTy` as `SubTp` to `getShuffleCost`.
- Restrict partial reductions to power-of-2 vector sizes.
---
Extend foldShuffleChainsToReduce to recognize partial reduction patterns
where only a subvector of the full vector is being reduced.
[11 lines not shown]
csh(1): Fix further warnings and bump WARNS from 1 to 6.
* Remove unneeded malloc_usable_size() prototype (there is one
in <stdlib.h>).
* ut_host is a member of struct utmpx, too, so expand the #ifdef for
the prototypes for utmphost() and utmphostsize() accordingly.
[LV] Add -epilogue-tail-folding-policy flag for tail-folded epilogue (#190697)
This is the first patch in a series implementing **tail-folding on the
epilogue loop** — a vectorization style that pairs an unpredicated
vector main loop with a predicated vector epilogue.
It adds a new flag, `-epilogue-tail-folding-policy`, that makes the
style opt-in. Subsequent patches will build out the implementation.
Motivation behind this work:
- The current vectorization styles force either tail-folding on the main
vector loop with no interleaving, or an unpredicated main vector loop with
interleaving.
The first style prevents us from exploiting high interleaving
when it’s beneficial/possible, and the second one prevents
tail-folding where it could be beneficial, especially for low trip counts.
- The proposed hybrid approach of having an unpredicated main vector loop
with tail-folded vector epilogue combines the strengths of both styles,
[7 lines not shown]
[SelectionDAG] Scalarize <1 x T> vector types for atomic store (#197165)
`store atomic <1 x T>` is not valid. This change legalizes
vector types of atomic store via scalarization in SelectionDAG
so that it can, for example, translate from `v1i32` to `i32`.
This is the store-side counterpart to #148894. Stacked on top of
#197372 and below #197166.
[X86][AtomicExpand] Remove X86's shouldCastAtomicLoadInIR override (added in #148899)
So that atomic floating-point and FP-vector loads are no longer bitcast to an integer
at the IR level by AtomicExpand.
[clang][Sema] Diagnose nested local classes defined in a different block scope than their parent (#197863)
Fixes #193472.
[[class.local]/3](https://eel.is/c++draft/class.local#3) says:
> A class nested within a local class is a local class. A member of a
local class `X` shall be declared only in the definition of `X` or, __if
the member is a nested class, in the nearest enclosing block scope of
X__.
In other words:
```cpp
void f() {
struct X { struct S; };
struct X::S {}; // okay
struct X { struct S; };
[17 lines not shown]
[VPlan] Add branch-on-cond false original unconditional latch (NFC). (#198539)
For loops where the latch does not exit, addInitialSkeleton adds the
middle block as additional successor, as early canonicalization.
But then we end up with a block without terminator and multiple
successors. Fix this by adding a branch-on-cond false as terminator.
This preserves the original behavior (backedge always taken) and
resolves the verifier issue.
PR: https://github.com/llvm/llvm-project/pull/198539
[OpenMP][OMPIRBuilder] Fix non-determinism in removeUnusedBlocksFromParent
The openmp-cli-fuse02.mlir test fails non-deterministically (~20%) when
unrelated patches add or reorder code, causing the linker to place
objects at different offsets and the heap allocator to return different
addresses at runtime. SmallPtrSet iteration order depends on these
pointer addresses (also randomized across runs by ASLR), so fuseLoops
sometimes leaves dead blocks in the function.
The root cause is erasing blocks from the set while iterating it to
check for external uses—removing one block mid-pass changes the result
for blocks checked later. Both the remove_if form and the earlier
make_early_inc_range version (pre-b6a94b6bfb2c) have this defect.
Fix by collecting all blocks to keep before erasing, so every block is
evaluated against the same snapshot of the set.
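The snapshot approach described above can be sketched in isolation. This is a simplified model under stated assumptions, not the OMPIRBuilder code: blocks are plain integers, `Users` maps a block to the blocks that reference it, and the names `hasUseOutside` and `dropBlocksWithOutsideUses` are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

using Block = int;
using UserMap = std::map<Block, std::vector<Block>>;

// True if some user of B is not itself a removal candidate.
inline bool hasUseOutside(Block B, const std::set<Block> &Candidates,
                          const UserMap &Users) {
  auto It = Users.find(B);
  if (It == Users.end())
    return false;
  for (Block U : It->second)
    if (!Candidates.count(U))
      return true;
  return false;
}

// One pass: first decide which blocks to keep while the candidate set is
// unchanged, then erase them. Every block is judged against the same
// snapshot, so the result does not depend on iteration order.
inline bool dropBlocksWithOutsideUses(std::set<Block> &Candidates,
                                      const UserMap &Users) {
  std::vector<Block> ToKeep;
  for (Block B : Candidates)
    if (hasUseOutside(B, Candidates, Users))
      ToKeep.push_back(B);
  for (Block B : ToKeep)
    Candidates.erase(B);
  return !ToKeep.empty();
}
```

Had the pass erased block 1 from `Candidates` as soon as it was found to have an outside user, a later check of a block used only by block 1 would see a different set than an earlier check would have, which is the order dependence the commit describes.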
[Offload] Fix build install directory and remove 'add_llvm_library' (#198622)
Summary:
The problem is that we do not correctly set the build directory output
for offload/. Normally, it's supposed to mirror the install pattern.
This is both because we have variants and so that people can use the
compiler from the build directory.
Currently, if you build more than one variant of the offload/ library
they will clobber each other in `<build>/lib/`, so cross-compiling is
not possible. Additionally, these will not be usable in the build directory
because the compiler will think that they are in the triple directory
when they are not.
The fix is relatively simple: copy the pattern every other runtime
uses and then remove the implicit handling we get from
`add_llvm_library`. The only thing it did for us was automatically map
component names to the libraries, which is easy enough to do ourselves.
[VPlan] Sink VPRecipeValue dtors. (#198623)
Currently (after https://github.com/llvm/llvm-project/pull/195483) the
VPRecipeValue accesses the defining value and removes it. This can cause
uninitialized memory reads, because the Def pointer held by the
VPMultiDefValue is destroyed before the super class destructor runs.
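The destruction-order rule behind this can be shown with a small standalone sketch. The types `Def`, `ValueBase` and `MultiDefValue` below are hypothetical stand-ins, not the VPlan classes; the sketch only records the order in which C++ runs a derived destructor body, member destructors, and the base destructor.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records destruction events in the order they happen.
inline std::vector<std::string> &eventLog() {
  static std::vector<std::string> Log;
  return Log;
}

// Hypothetical stand-in for a member owned by the derived class.
struct Def {
  ~Def() { eventLog().push_back("Def destroyed"); }
};

struct ValueBase {
  // Runs last: members of derived classes are already gone at this point,
  // so touching them from here would read destroyed state.
  virtual ~ValueBase() { eventLog().push_back("ValueBase dtor body"); }
};

struct MultiDefValue : ValueBase {
  Def TheDef;
  // Runs first: TheDef is still alive here, so cleanup that needs it
  // belongs in this destructor rather than in the base class.
  ~MultiDefValue() override { eventLog().push_back("MultiDefValue dtor body"); }
};
```

Destroying a `MultiDefValue` logs the derived destructor body first, then the member, then the base destructor, which is why cleanup that depends on a derived member has to move down into the derived destructor.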