[AtomicExpand] Add bitcasts when expanding store atomic vector
AtomicExpand fails for aligned `store atomic <n x T>` because it
does not find a compatible library call. This change adds appropriate
ptrtoint + bitcast so that the call can be lowered, mirroring the
load-side handling from #148900.
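
As a rough source-level analogy for the bitcast step (a sketch only, not the AtomicExpand code), the vector's bytes can be reinterpreted as one same-sized integer that is then stored atomically. `Float2`, `atomicStoreFloat2`, and `atomicLoadFloat2` are hypothetical names introduced for illustration, and `Float2` is assumed to stand in for an IR `<2 x float>`.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for an IR `<2 x float>` value.
struct Float2 {
  float Lanes[2];
};

static_assert(sizeof(Float2) == sizeof(std::uint64_t),
              "the vector must be exactly as wide as the integer");

// Reinterpret the vector's bytes as one 64-bit integer and store that
// integer atomically, so the whole vector is written in a single access.
inline void atomicStoreFloat2(std::atomic<std::uint64_t> &Slot, Float2 V) {
  std::uint64_t Bits;
  std::memcpy(&Bits, &V, sizeof(Bits)); // plays the role of the bitcast
  Slot.store(Bits, std::memory_order_release);
}

// Inverse operation, included only so the round trip can be checked.
inline Float2 atomicLoadFloat2(const std::atomic<std::uint64_t> &Slot) {
  std::uint64_t Bits = Slot.load(std::memory_order_acquire);
  Float2 V;
  std::memcpy(&V, &Bits, sizeof(V));
  return V;
}
```

The memory orderings here are arbitrary choices for the sketch; the IR transformation keeps whatever ordering the original `store atomic` carried.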
[X86] Cast atomic vectors in IR to support floats
Extend the X86 `alignedstore` PatFrag to also match `atomic_store`
with vector-size alignment, so existing MOVAPS/MOVAPD/MOVDQA-family
aligned-store patterns cover 128-bit aligned vector atomic stores on
SSE/AVX/AVX-512 without per-type duplicates. `<4 x float>`,
`<2 x double>`, `<2 x i64>`, `<4 x i32>`, `<8 x half>`, `<8 x bfloat>`
all codegen to a single `movaps`/`movapd` on AVX+ via this.
Adds v8f16/v8bf16 bitconvert variants to the widen-path
`atomic_store_32` / `atomic_store_64` patterns so `<2 x half>`,
`<2 x bfloat>`, `<4 x half>`, `<4 x bfloat>` atomic stores reaching
the PR4 widen path also collapse to a single instruction on AVX+
targets.
Vectors whose `getTypeAction` is split rather than widen still rely
on PR6's `SplitVecOp_ATOMIC_STORE`; that path bitcasts the vector
to a scalar integer and issues an integer `atomic_store_N`, picked
up by the pre-existing scalar atomic-store patterns. The two
[4 lines not shown]
[SelectionDAG] Split vector types for atomic store
Vector types that aren't widened are split so that a single ATOMIC_STORE
is issued for the entire vector at once. This enables SelectionDAG to
translate vectors with element types bfloat and half.
[X86] Remove extra MOV after widening atomic store
This change adds patterns to optimize out an extra MOV present after
widening the atomic store. Covers <2 x i8> (SSE4.1+), <2 x i16>,
<4 x i8>, <2 x i32>, <2 x float>, <4 x i16>, <2 x ptr addrspace(270)>.
[SelectionDAG] Widen <2 x T> vector types for atomic store
Two-element vector types must be widened. This change widens them
for atomic stores in SelectionDAG so that it can translate aligned
vectors with more than one element.
[AMDGPU] Add dot product patterns with saturating add (clamp) (#187945)
Add pattern matching for dot product operations combined with saturating
add intrinsics (llvm.uadd.sat / llvm.sadd.sat). This enables the
compiler to generate dot instructions with the clamp modifier instead of
separate dot + saturating add instructions.
Fixes #182095
## Changes
- Added UDot2SatPat and SDot2SatPat TableGen pattern classes that match
uaddsat/saddsat with dot2 computations
- Added performSatAddCombine DAG combiner function to handle
ISD::UADDSAT and ISD::SADDSAT nodes
- Added test file idot2-sat.ll
## Example
[16 lines not shown]
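
For orientation, the unfused sequence that the new patterns replace (a dot product followed by a separate saturating add) can be modelled in plain C++. This is a sketch under stated assumptions, not the AMDGPU lowering: the packing of two unsigned 16-bit lanes per 32-bit operand is assumed for illustration, and `uaddSat`, `udot2`, and `udot2ThenSat` are hypothetical names.

```cpp
#include <cstdint>
#include <limits>

// Unsigned saturating add, following the documented behaviour of
// llvm.uadd.sat: the result clamps to the maximum value on overflow.
inline std::uint32_t uaddSat(std::uint32_t A, std::uint32_t B) {
  std::uint32_t Sum = A + B; // wraps modulo 2^32
  return Sum < A ? std::numeric_limits<std::uint32_t>::max() : Sum;
}

// Assumed shape of a 2-element dot product: each 32-bit operand packs two
// unsigned 16-bit lanes. The lane layout is an assumption of this sketch.
inline std::uint32_t udot2(std::uint32_t A, std::uint32_t B) {
  std::uint32_t ALo = A & 0xFFFFu, AHi = A >> 16;
  std::uint32_t BLo = B & 0xFFFFu, BHi = B >> 16;
  return ALo * BLo + AHi * BHi; // wraps modulo 2^32
}

// The unfused form: dot product first, then a separate saturating add.
inline std::uint32_t udot2ThenSat(std::uint32_t A, std::uint32_t B,
                                  std::uint32_t Acc) {
  return uaddSat(udot2(A, B), Acc);
}
```

With lanes (2, 3) and (4, 5) the dot product is 2*4 + 3*5 = 23, so an accumulator of 10 gives 33, while an accumulator of `0xFFFFFFFF` clamps to `0xFFFFFFFF` instead of wrapping.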
Revert "[clang][NFC] Mark CWG717 as implemented and add a test (#197732)" (#198074)
As reported in #197930, these new tests fail on the
`arm64-apple-darwin-unknown` target. There's not a consensus yet on how
to fix the breakage, so revert it until we can decide.
[mlir][spirv] Remove ConstantLike trait from spirv.ARM.GraphConstant (#198054)
Operations with the `ConstantLike` trait can always be folded into a
concrete attribute value. However, the `spirv.ARM.GraphConstant` op
cannot be folded, because its GraphConstantID is merely a unique
identifier used to map to the actual constants defined in the SPIR-V
module. Therefore, the `ConstantLike` trait should be removed from
`spirv.ARM.GraphConstant`. Fixes #197970.
[llvm-ir2vec] Breaking up llvm-ir2vec lib implementation to clean up MIR deps from ir2vec python bindings (#194414)
The Python bindings only expose IR2Vec functionality. MIR2Vec has no
Python API. However, the single `LLVMEmbUtils` library bundled both
IR2VecTool and MIR2VecTool, causing CodeGen and Target components to be
linked into the nanobind module unnecessarily.
This patch splits the library along that boundary. LLVMIREmbUtils covers
IR2Vec and is linked by both the CLI tool and the Python bindings.
LLVMMIREmbUtils covers MIR2Vec and is linked only by the CLI tool.
Result: Python wheel size reduces from ~14 MB to ~4 MB.
[llvm-ir2vec] Setting up ir2vec python bindings testing for ml-opt bots (#194593)
- ~We are enabling IR2Vec Python binding tests in the LLVM monolithic
Linux CI by adding -D LLVM_IR2VEC_ENABLE_PYTHON_BINDINGS=ON to
monolithic-linux.sh.~
- We're adding testing for ir2vec python bindings with the ml-opt
buildbots. To that end, we need to add pip install requirements, and
other relevant flags to make way for a seamless warning-free llvm build.
The following changes are made here:
- Adding a requirements.txt file that states an explicit nanobind
requirement.
- Adding the option for downstream users to test bindings as part of the
`check llvm` umbrella, by passing the appropriate bindings flag.
- Suppressing warnings from the nanobind headers, in order to ensure a
seamless LLVM CI build.
[clang-tidy] Fix false positives about reinitialization detection in `bugprone-use-after-move` (#197438)
When calling a base class's `operator=` through a derived object, an
implicit cast with `UncheckedDerivedToBase` is generated:
```
void foo() {
  Base b;
  Derived d;
  std::move(d);
  d = b;
}
```
AST for the `d` in `d = b`:
```
|-ImplicitCastExpr <col:3> 'GH62206::Base' lvalue <UncheckedDerivedToBase (Base)>
| `-DeclRefExpr <col:3> 'Derived' lvalue Var 0x1d11a400 'd' 'Derived'
```
This patch considers possible `implicitCastExpr` in the reinit matcher,
[8 lines not shown]
[PHIElimination] Clear stale LiveVariables AliveBlocks for undef PHI sources (#197764)
When PHI Elimination lowers a PHI with an undef source (e.g. from an
`IMPLICIT_DEF`), it skips the LiveVariables kill/AliveBlocks update
because the value is undefined. However, the source register's
AliveBlocks may still mark intermediate blocks as live-through from its
definition to the (now eliminated) PHI use. This causes MachineVerifier
failures in EXPENSIVE_CHECKS builds.
Fix by calling `recomputeForSingleDefVirtReg` on undef source registers
when their last PHI use on a CFG edge is eliminated, which correctly
clears the stale AliveBlocks entries.
Fixes the EXPENSIVE_CHECKS failure introduced by #196895.
[OpenACC] Fix invalid using inside of an openacc directive (#198058)
Bug report #197858 comes up with a reproducer where an invalid `using`
declaration checks the Scope it is in, and asserts if it isn't in a
DeclScope. Since all of the important directives that create scopes end
up causing a new scope anyway, this patch adds 'DeclScope' to the parse
scope for an OpenACC directive. This follows the guidance of the OpenMP
directives.
Fixes: #197858
[clang][bytecode] Fix wrong 'never produces a constant expression' diagnostic with static data members (#197881)
They can be initialized later, similar to extern variables.
[clang-doc][nfc] Silence tidy warning about anonymous namespace
clang-tidy complains that we should prefer static over the anonymous
namespace, despite the API being static in addition to being in the
anonymous namespace. We can silence the diagnostic by simply removing
the namespace declaration.
[clang-doc] Use explicit for single param constructors
Single-parameter constructors without `explicit` trip up some
clang-tidy checks, so add the keyword as needed to satisfy the lints.
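
A minimal sketch of what the keyword changes, using a hypothetical `BufferSize` type that is not taken from clang-doc: once the constructor is `explicit`, a bare integer no longer converts silently, while direct construction keeps working.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical type for illustration only.
struct BufferSize {
  explicit BufferSize(std::size_t N) : Bytes(N) {}
  std::size_t Bytes;
};

// The implicit conversion is rejected at compile time...
static_assert(!std::is_convertible<std::size_t, BufferSize>::value,
              "size_t must not convert implicitly");
// ...but spelling out the constructor call is still allowed.
static_assert(std::is_constructible<BufferSize, std::size_t>::value,
              "direct construction must still work");

inline std::size_t bytesOf(BufferSize S) { return S.Bytes; }
```

A call such as `bytesOf(64)` would fail to compile here, whereas `bytesOf(BufferSize(64))` is accepted.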
[clang-doc] Clean up inconsistent namespace usage in BitcodeWriter
Typically we forgo prefixing things with clang::doc or llvm:: unless
they overlap with something in std::, like `to_underlying()`. We also
avoid exposing non-internal symbols by placing types in the anonymous
namespace, and group more logically the things that don't need to be
in the clang::doc namespace.
[clang-doc] Use const and constexpr arrays in BitcodeWriter
We have three static data structures in the BitcodeWriter implementation
that all use std::vector. Instead, we can make them constant arrays.
These data structures and their types are also not in the anonymous
namespace, so just move these helpers out of the clang::doc namespace
and improve the hygiene since we're changing the code anyway.
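
The general shape of such a change can be sketched as follows. The record IDs, the table contents, and the `recordName` helper are hypothetical and do not reproduce the real BitcodeWriter tables; the point is only that a fixed lookup table can be a `constexpr` array in an anonymous namespace rather than a `std::vector` built at start-up.

```cpp
#include <cstddef>

namespace {
// Hypothetical record IDs, not the real clang-doc enumerators.
enum class RecordId { Version, Name, Path };

struct RecordInfo {
  RecordId Id;
  const char *Name;
};

// A constexpr array is constant-initialized, so it needs no heap
// allocation and no dynamic initialization before main().
constexpr RecordInfo RecordTable[] = {
    {RecordId::Version, "Version"},
    {RecordId::Name, "Name"},
    {RecordId::Path, "Path"},
};

constexpr std::size_t RecordTableSize =
    sizeof(RecordTable) / sizeof(RecordTable[0]);

// Linear lookup written recursively so it is also valid in C++11
// constant expressions. Returns nullptr when the ID is not in the table.
constexpr const char *recordName(RecordId Id, std::size_t I = 0) {
  return I == RecordTableSize        ? nullptr
         : RecordTable[I].Id == Id   ? RecordTable[I].Name
                                     : recordName(Id, I + 1);
}
} // namespace
```

Because everything sits in the anonymous namespace, none of these symbols are visible outside the translation unit.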
[libc] Make cpp::byte alias-safe (#194171)
Change LIBC_NAMESPACE::cpp::byte from an enum-backed type to unsigned
char so libc’s raw-memory utilities and sorting code can legally access
object representations without violating C++ strict-aliasing rules.
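
To illustrate why the underlying type matters: `unsigned char` is one of the types C++ allows to inspect and modify the bytes of any object, whereas a user-defined enumeration does not get that exemption even when its underlying type is `unsigned char`. The helper below is a hypothetical sketch of a raw-memory swap of the kind a sorting routine might use; it is not the libc implementation.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: swap two objects of N bytes each by walking their
// object representations through unsigned char pointers.
inline void swapBytes(void *A, void *B, std::size_t N) {
  unsigned char *PA = static_cast<unsigned char *>(A);
  unsigned char *PB = static_cast<unsigned char *>(B);
  for (std::size_t I = 0; I < N; ++I) {
    unsigned char Tmp = PA[I];
    PA[I] = PB[I];
    PB[I] = Tmp;
  }
}
```

Called as `swapBytes(&X, &Y, sizeof(X))` on two `std::uint32_t` values, this exchanges their contents without ever forming a pointer of an unrelated non-character type.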
[MemoryBuiltins] Capture more information for alloc/free from attributes
We now read the `alloc_align` attribute to provide better alignment
information to users. `alloc-family` should be used as well, as
described in the LangRef. Two new helpers provide argument numbers,
rather than values.
[flang] Recognize effects on non-addressable resources in opt-bufferization.
opt-bufferization has only been handling `fir::DebuggingResource`
explicitly. This patch adds support for other non-addressable
resources, such as `fir::VolatileMemoryResource`. This allows
merging elemental/assign for the `volatile_src_nonvolatile_dst`
example in the updated LIT test.
[flang] Pass-through fir.volatile_cast in FIR AliasAnalysis.
It should be safe to pass through `fir.volatile_cast` for the purpose
of alias analysis. The missing pass-through prevented optimization
of the `nonvolatile_src_volatile_dst` test (see updated LIT test).