[AMDGPU] Add dot product patterns with saturating add (clamp) (#187945)
Add pattern matching for dot product operations combined with saturating
add intrinsics (llvm.uadd.sat / llvm.sadd.sat). This enables the
compiler to generate dot instructions with the clamp modifier instead of
separate dot + saturating add instructions.
Fixes #182095
## Changes
- Added UDot2SatPat and SDot2SatPat TableGen pattern classes that match
uaddsat/saddsat with dot2 computations
- Added performSatAddCombine DAG combiner function to handle
ISD::UADDSAT and ISD::SADDSAT nodes
- Added test file idot2-sat.ll
## Example
[16 lines not shown]
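For readers unfamiliar with the idiom, here is a scalar C++ sketch of the source-level shape described above: a two-element dot product whose result feeds an unsigned saturating add. It models only the `llvm.uadd.sat` side. The helper names are hypothetical, and it assumes the two products are summed with ordinary wrapping 32-bit arithmetic; it makes no claim about the exact DAG shape the patterns match or about the hardware's clamp behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Unsigned saturating add: clamps to UINT32_MAX instead of wrapping,
// matching the documented behavior of llvm.uadd.sat.i32.
inline uint32_t uadd_sat32(uint32_t a, uint32_t b) {
  uint32_t sum = a + b; // wraps modulo 2^32
  return sum < a ? std::numeric_limits<uint32_t>::max() : sum;
}

// Hypothetical helper: dot product of two 16-bit lanes followed by a
// saturating accumulate. Assumption: the lane sum itself wraps.
inline uint32_t dot2_u16_then_uaddsat(uint16_t a0, uint16_t a1, uint16_t b0,
                                      uint16_t b1, uint32_t acc) {
  // Widen before multiplying so the products are computed as unsigned.
  uint32_t p0 = static_cast<uint32_t>(a0) * static_cast<uint32_t>(b0);
  uint32_t p1 = static_cast<uint32_t>(a1) * static_cast<uint32_t>(b1);
  return uadd_sat32(p0 + p1, acc);
}

inline void dot2_usage_example() {
  // 1*3 + 2*4 + 5 fits in 32 bits, so nothing is clamped.
  assert(dot2_u16_then_uaddsat(1, 2, 3, 4, 5) == 16u);
  // Adding to a full accumulator clamps rather than wrapping.
  assert(dot2_u16_then_uaddsat(1, 2, 3, 4, 0xFFFFFFFFu) == 0xFFFFFFFFu);
}
```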
Revert "[clang][NFC] Mark CWG717 as implemented and add a test (#197732)" (#198074)
As reported in #197930, these new tests fail on the
`arm64-apple-darwin-unknown` target. There's not a consensus yet on how
to fix the breakage, so revert it until we can decide.
[mlir][spirv] Remove ConstantLike trait from spirv.ARM.GraphConstant (#198054)
Operations with the `ConstantLike` trait can always be folded into a
concrete attribute value. However, the `spirv.ARM.GraphConstant` op
cannot be folded, because its GraphConstantID is merely a unique
identifier used to map to the actual constants defined in the SPIR-V
module. Therefore, the `ConstantLike` trait should be removed from
`spirv.ARM.GraphConstant`. Fixes #197970.
[llvm-ir2vec] Breaking up llvm-ir2vec lib implementation to clean up MIR deps from ir2vec python bindings (#194414)
The Python bindings only expose IR2Vec functionality. MIR2Vec has no
Python API. However, the single `LLVMEmbUtils` library bundled both
IR2VecTool and MIR2VecTool, causing CodeGen and Target components to be
linked into the nanobind module unnecessarily.
This patch splits the library along that boundary. LLVMIREmbUtils covers
IR2Vec and is linked by both the CLI tool and the Python bindings.
LLVMMIREmbUtils covers MIR2Vec and is linked only by the CLI tool.
Result: Python wheel size reduces from ~14 MB to ~4 MB.
[llvm-ir2vec] Setting up ir2vec python bindings testing for ml-opt bots (#194593)
- ~We are enabling IR2Vec Python binding tests in the LLVM monolithic
Linux CI by adding -D LLVM_IR2VEC_ENABLE_PYTHON_BINDINGS=ON to
monolithic-linux.sh.~
- We're adding testing for ir2vec python bindings with the ml-opt
buildbots. To that end, we need to add pip install requirements, and
other relevant flags to make way for a seamless warning-free llvm build.
This patch makes the following changes:
- Adding a requirements.txt file that declares an explicit nanobind
requirement.
- Adding the option for downstream users to test bindings as part of the
`check llvm` umbrella, by passing the appropriate bindings flag.
- Suppressing warnings from the nanobind headers, in order to ensure a
seamless LLVM CI build.
[clang-tidy] Fix false positives about reinitialization detection in `bugprone-use-after-move` (#197438)
When a base class's `operator=` is called through a derived object, an
implicit cast with `UncheckedDerivedToBase` is generated:
```
void foo() {
  Base b;
  Derived d;
  std::move(d);
  d = b;
}
```
AST for the `d` in `d = b`:
```
|-ImplicitCastExpr <col:3> 'GH62206::Base' lvalue <UncheckedDerivedToBase (Base)>
| `-DeclRefExpr <col:3> 'Derived' lvalue Var 0x1d11a400 'd' 'Derived'
```
This patch considers possible `implicitCastExpr` in the reinit matcher,
[8 lines not shown]
[PHIElimination] Clear stale LiveVariables AliveBlocks for undef PHI sources (#197764)
When PHI Elimination lowers a PHI with an undef source (e.g. from an
`IMPLICIT_DEF`), it skips the LiveVariables kill/AliveBlocks update
because the value is undefined. However, the source register's
AliveBlocks may still mark intermediate blocks as live-through from its
definition to the (now eliminated) PHI use. This causes MachineVerifier
failures in EXPENSIVE_CHECKS builds.
Fix by calling `recomputeForSingleDefVirtReg` on undef source registers
when their last PHI use on a CFG edge is eliminated, which correctly
clears the stale AliveBlocks entries.
Fixes the EXPENSIVE_CHECKS failure introduced by #196895.
[OpenACC] Fix invalid using inside of an openacc directive (#198058)
Bug report #197858 comes up with a reproducer where an invalid `using`
declaration checks the Scope it is in, and asserts if it isn't in a
DeclScope. Since all of the important directives that create scopes end
up causing a new scope anyway, this patch adds 'DeclScope' to the parse
scope for an OpenACC directive. This follows the guidance of the OpenMP
directives.
Fixes: #197858
[clang][bytecode] Fix wrong 'never produces a constant expression' diagnostic with static data members (#197881)
They can be initialized later, similar to extern variables.
[clang-doc][nfc] Silence tidy warning about anonymous namespace
clang-tidy complains that we should prefer static over the anonymous
namespace, despite the API being static in addition to being in the
anonymous namespace. We can silence the diagnostic by simply removing
the namespace declaration.
[clang-doc] Use explicit for single param constructors
This trips up some clang-tidy checks, so add the explicit keyword as
needed to satisfy the lints.
[clang-doc] Clean up inconsistent namespace usage in BitcodeWriter
Typically we forgo prefixing things with clang::doc or llvm:: unless
they overlap with something in std::, like `to_underlying()`. We also
avoid non-internal symbols by placing types in the anonymous namespace,
and group things that don't need to be in the clang::doc namespace more
logically.
[clang-doc] Use const and constexpr arrays in BitcodeWriter
We have three static data structures in the BitcodeWriter implementation
that all use std::vector. Instead, we can make them constant arrays.
These data structures and their types are also not in the anonymous
namespace, so just move these helpers out of the clang::doc namespace
and improve the hygiene since we're changing the code anyway.
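As a rough illustration of the kind of change described, the sketch below replaces a runtime-initialized `std::vector` lookup table with a `constexpr std::array` in an anonymous namespace. The `IdName` type, the table contents, and `lookupName` are hypothetical and are not the BitcodeWriter's actual data structures.

```cpp
#include <array>
#include <cassert>
#include <string_view>

namespace {

// Hypothetical table entry; the real tables have their own entry types.
struct IdName {
  unsigned Id;
  std::string_view Name;
};

// A constant array needs no heap allocation or static constructor,
// unlike a `static const std::vector<IdName>`.
constexpr std::array<IdName, 3> Names = {{
    {1, "version"},
    {2, "namespace"},
    {3, "record"},
}};

// Linear lookup that is usable at compile time.
constexpr std::string_view lookupName(unsigned Id) {
  for (const IdName &Entry : Names)
    if (Entry.Id == Id)
      return Entry.Name;
  return {};
}

// The table can be checked at compile time.
static_assert(lookupName(2) == "namespace");

void lookupUsageExample() {
  assert(lookupName(1) == "version");
  assert(lookupName(99).empty());
}

} // namespace
```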
[libc] Make cpp::byte alias-safe (#194171)
Change LIBC_NAMESPACE::cpp::byte from an enum-backed type to unsigned
char so libc’s raw-memory utilities and sorting code can legally access
object representations without violating C++ strict-aliasing rules.
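A minimal sketch of why the underlying type matters: the C++ aliasing rules allow any object's representation to be read through `unsigned char` (as well as `char` and `std::byte`), but not through an arbitrary enum type. The `example` namespace and `count_nonzero_bytes` helper below are hypothetical and only demonstrate the permitted access pattern; they are not part of LLVM libc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace example { // hypothetical namespace, not LIBC_NAMESPACE

// With this alias, a byte pointer is an unsigned char pointer, which may
// legally alias any object.
using byte = unsigned char;

// Counts the nonzero bytes in the object representation of `value`.
template <typename T> std::size_t count_nonzero_bytes(const T &value) {
  const byte *p = reinterpret_cast<const byte *>(&value);
  std::size_t n = 0;
  for (std::size_t i = 0; i < sizeof(T); ++i)
    if (p[i] != 0)
      ++n;
  return n;
}

inline void byte_usage_example() {
  // Two of the four bytes are nonzero, regardless of endianness.
  std::uint32_t x = 0x01000001u;
  assert(count_nonzero_bytes(x) == 2);
}

} // namespace example
```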
[MemoryBuiltins] Capture more information for alloc/free from attributes
We now read the `alloc_align` attribute to provide better alignment
information to users. `alloc-family` should be used as well, as
described in the LangRef. Two new helpers provide argument numbers,
rather than values.
[flang] Recognize effects on non-addressable resources in opt-bufferization.
opt-bufferization has so far only handled `fir::DebuggingResource`
explicitly. This patch adds support for other non-addressable
resources, such as `fir::VolatileMemoryResource`. This allows
merging elemental/assign for the `volatile_src_nonvolatile_dst`
example in the updated LIT test.
[flang] Pass-through fir.volatile_cast in FIR AliasAnalysis.
It should be safe to pass through `fir.volatile_cast` for the purpose
of alias analysis. The missing pass-through prevented optimization
of the `nonvolatile_src_volatile_dst` test (see updated LIT test).
[libc] Fix install-libc to work with LLVM_LIBC_FULL_BUILD=OFF (#197366)
Initialize variables that are conditionally set to avoid undefined
references in install-libc and install-libc-stripped targets:
- Initialize added_bitcode_targets to empty string (may be undefined
when LIBC_TARGET_OS_IS_GPU=OFF)
- Initialize startup_target to empty string and only set to
"libc-startup" when both LLVM_LIBC_FULL_BUILD=ON and NOT baremetal
(startup directory is only included in full builds)
- Initialize header_install_target to empty string (may be undefined
when LLVM_LIBC_FULL_BUILD=OFF)
[DirectX] Do not emit !dbg on function definitions (#197449)
This was not done in LLVM 3.7. Instead, the !DISubprogram contains a
reference to the function (already emitted).
[libc] Add config option to use memory builtin functions. (#197977)
Add a new configuration option, `LIBC_CONF_USE_MEM_BUILTINS`, available
as a CMake option and a C++ definition, to allow users to use compiler builtins for
memory utility functions (memcpy, memset, memmove, memcmp, and bcmp)
instead of LLVM libc's internal implementations. Main use-cases are:
- to let users bring their own memory function implementations that are
highly optimized for their targets
- to improve portability by providing a fallback for targets for which
LLVM libc does not have memory utility implementations yet
- to be used for libc/shared functions and their tests, as we expect
libc/shared functions to provide their own memory functions.
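The general shape of such a switch can be sketched as below. This is an illustration only: `EXAMPLE_USE_MEM_BUILTINS` is a hypothetical macro standing in for whatever C++ definition the option produces, `example::copy_bytes` is not an LLVM libc function, and the sketch assumes a compiler that provides `__builtin_memcpy` (GCC and Clang do).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macro; the real definition name is not given here.
#ifndef EXAMPLE_USE_MEM_BUILTINS
#define EXAMPLE_USE_MEM_BUILTINS 1
#endif

namespace example {

inline void copy_bytes(void *dst, const void *src, std::size_t n) {
#if EXAMPLE_USE_MEM_BUILTINS
  // Defer to the compiler, which may inline the copy or call whatever
  // memcpy the final link provides.
  __builtin_memcpy(dst, src, n);
#else
  // Stand-in for an internal implementation: a plain byte loop.
  auto *d = static_cast<unsigned char *>(dst);
  auto *s = static_cast<const unsigned char *>(src);
  for (std::size_t i = 0; i < n; ++i)
    d[i] = s[i];
#endif
}

inline void copy_usage_example() {
  const char src[4] = "abc";
  char dst[4] = {};
  copy_bytes(dst, src, sizeof(src));
  assert(dst[0] == 'a' && dst[1] == 'b' && dst[2] == 'c' && dst[3] == '\0');
}

} // namespace example
```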
[lldb] Fix data race in ObjectFile::GetSectionList (#197812)
The early `m_sections_up == nullptr` check was performed outside the
module mutex, so two threads sharing the same Module could both enter
the branch and race on the write in CreateSections. Restructure so the
check and populate both happen under the module mutex; this is a
standard double-checked locking fix.
Found by ThreadSanitizer as part of #197792.
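A simplified sketch of the restructured shape, with the null check and the population both performed while the mutex is held. `LazySections` and its members are hypothetical stand-ins, not the actual `ObjectFile` or `Module` code, and the use of `std::recursive_mutex` is an assumption made for the example.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for an object that lazily builds a section list.
class LazySections {
public:
  // The check and the creation share one critical section, so a second
  // thread cannot observe a null pointer while the first is populating.
  const std::vector<int> &GetSections() {
    std::lock_guard<std::recursive_mutex> guard(m_mutex);
    if (!m_sections_up) {
      m_sections_up = std::make_unique<std::vector<int>>();
      CreateSections();
    }
    return *m_sections_up;
  }

  int GetCreateCount() const { return m_create_count; }

private:
  // Only ever called with m_mutex held.
  void CreateSections() {
    ++m_create_count;
    m_sections_up->push_back(42);
  }

  std::recursive_mutex m_mutex;
  std::unique_ptr<std::vector<int>> m_sections_up;
  int m_create_count = 0;
};

inline void lazy_sections_usage_example() {
  LazySections sections;
  sections.GetSections();
  sections.GetSections();
  // The list is created once no matter how often it is requested.
  assert(sections.GetCreateCount() == 1);
}
```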