[DAG] Fold (X +/- Y) & Y --> ~X & Y when Y is a power of 2 (or zero). (#181677)
Same as InstCombinerImpl::visitAnd
To prevent RISCV from falling back to a mul call in known-never-zero.ll,
I've had to tweak the (sub X, (vscale * C)) -> (add X, (vscale * -C)) fold
so that it does not occur if C is a power of 2 and the target has poor mul support.
Alive2: https://alive2.llvm.org/ce/z/Khvs5H
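As an informal sanity check alongside the Alive2 proof, the identity can be brute-forced over all 8-bit values. This is only an illustrative sketch, not code from the patch: the helper names `isPowerOf2OrZero`, `foldHolds` and `foldHoldsForAll8BitValues` are hypothetical, and the 8-bit width is an assumption chosen to keep the search small.

```cpp
#include <cstdint>

// Hypothetical helper: true if y is zero or has exactly one bit set.
constexpr bool isPowerOf2OrZero(uint8_t y) { return (y & (y - 1)) == 0; }

// Hypothetical helper: checks both (x + y) & y and (x - y) & y against
// ~x & y for one pair of 8-bit values (arithmetic wraps modulo 256).
constexpr bool foldHolds(uint8_t x, uint8_t y) {
  const uint8_t folded = static_cast<uint8_t>(~x & y);
  const uint8_t viaAdd = static_cast<uint8_t>((x + y) & y);
  const uint8_t viaSub = static_cast<uint8_t>((x - y) & y);
  return viaAdd == folded && viaSub == folded;
}

// Exhaustive check: every 8-bit x, every y that is zero or a power of 2.
constexpr bool foldHoldsForAll8BitValues() {
  for (unsigned yi = 0; yi < 256; ++yi) {
    const uint8_t y = static_cast<uint8_t>(yi);
    if (!isPowerOf2OrZero(y))
      continue; // the fold is only claimed for these values of y
    for (unsigned xi = 0; xi < 256; ++xi) {
      if (!foldHolds(static_cast<uint8_t>(xi), y))
        return false;
    }
  }
  return true;
}
```

The restriction on Y matters: with x = 1 and y = 3, (x + y) & y is 0 while ~x & y is 2, so `foldHolds(1, 3)` is false.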
AMDGPU/GlobalISel: Regbanklegalize rules for INTRIN_IMAGE
Regbanklegalize rules for INTRIN_IMAGE loads and stores.
Because of the very large number of different type signatures, the rule
specifies only the lowering function (waterfall lowering of the RsrcIdx
operand if needed), and this function also applies register banks.
[llvm-reduce] Add a pass to replace unconditional branches with returns (#180993)
Unconditional branches could end up in infinite loops in the reduced
code, while the code could have been reduced further.
This patch implements a simple pass that replaces unconditional branches
with returns.
[AArch64] Improve post-inc stores of SIMD/FP values
Add patterns to match post-increment truncating stores from lane 0 of
wide integer vectors (v4i32/v2i64) to narrower types (i8/i16/i32).
This avoids transferring the value through a GPR when storing.
Also remove the pre-legalization early-exit in combineStoreValueFPToInt
as it prevented the optimization from applying in some cases.
[Tablegen] Patch RegUnitIntervals Initialization (#181173)
In a few places, the code generation needed to properly initialize it
when enabled was missing, and the sentinel value was missing as well.
[LoopInterchange] Update UTC version (NFC) (#181988)
This is a follow-up PR to #181804. While working on the stacked PRs, I
encountered some noisy diffs in the CHECK lines that don't change the
meaning of the tests. To avoid such changes and make the review easier,
this patch updates the UTC version. It also renames some BBs to suppress
warnings emitted by UTC.
[AMDGPU] Fix opcode comparison logic for G_INTRINSIC (#156008)
The check `(Opc < TargetOpcode::GENERIC_OP_END)` incorrectly
includes `G_INTRINSIC` (129), which is less than
`GENERIC_OP_END` (313), leading to logically dead code.
This patch reorders the conditionals to first check for `G_INTRINSIC`,
ensuring correct handling of the `amdgcn_fdot2` intrinsic.
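The ordering problem can be shown with a toy classifier. This is a sketch under stated assumptions, not the AMDGPU code: the constants only reuse the numeric values quoted above (129 and 313), and `Handling`, `classifyRangeFirst` and `classifyIntrinsicFirst` are hypothetical names.

```cpp
// Hypothetical stand-ins for the opcode values quoted in the message.
constexpr unsigned kGIntrinsic = 129;   // assumed value of G_INTRINSIC
constexpr unsigned kGenericOpEnd = 313; // assumed value of GENERIC_OP_END

enum class Handling { Generic, Intrinsic, TargetSpecific };

// Problematic order: the range check runs first and already matches 129,
// so the intrinsic branch below can never be taken.
constexpr Handling classifyRangeFirst(unsigned Opc) {
  if (Opc < kGenericOpEnd)
    return Handling::Generic;
  if (Opc == kGIntrinsic)
    return Handling::Intrinsic; // logically dead
  return Handling::TargetSpecific;
}

// Reordered: the specific opcode is tested before the range check.
constexpr Handling classifyIntrinsicFirst(unsigned Opc) {
  if (Opc == kGIntrinsic)
    return Handling::Intrinsic;
  if (Opc < kGenericOpEnd)
    return Handling::Generic;
  return Handling::TargetSpecific;
}
```

With the range check first, opcode 129 is classified as generic; with the specific check first, it reaches the intrinsic branch, and all other opcodes are classified as before.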
[OpenMP] Remove standalone build mode (#149878)
Remove all the CMake code for openmp standalone builds. Standalone
builds have been superseded by the runtimes default build (also
sometimes called the standalone runtimes build). The runtimes default
build can be thought of as a standalone build with the standalone
boilerplate contained in <llvm-project>/runtimes/CMakeLists.txt. There
is no need for each runtime to contain the same boilerplate code again.
Builds still using the standalone build via
```sh
cmake -S <llvm-project>/openmp ...
```
can switch over to the runtimes default build using
```sh
cmake -S <llvm-project>/runtimes -DLLVM_ENABLE_RUNTIMES=openmp ...
```
Options that were valid for the standalone build are also valid for
the runtimes default build, unless handled only in
[8 lines not shown]
[Clang][AArch64] set default mtune for macOS (#179136)
This patch sets a default tune-cpu on macOS targets to `apple-m5`.
The implementation adds a helper in
`clang/lib/Driver/ToolChains/Arch/AArch64.h` called by
`clang/lib/Driver/ToolChains/Clang.cpp`. It doesn't follow a "check then
get" flow because it's very concise, and returns an optional instead. It
adds a missing test file for mtune on Apple macOS targets, including the
new logic.
clang: Add builtin header for amdhsa abi
This is a place to put definitions for various ABI structs.
Currently device libs just hardcodes magic numbers and casts,
which is incomprehensible.
[AArch64] Fold MIN/MAX(Vec[0], Vec[1]) to VECREDUCE_MIN/MAX(Vec)
If we have a lowering for `VECREDUCE_MIN/MAX`, this is generally more
efficient than the scalar expansion.
[AArch64] Prefer SVE2 for fixed-length i64 [S|U][MIN|MAX] reductions (#181161)
With SVE2/SME we can lower the v2i64 min/max reductions to an SVE2
pairwise instruction. The throughput is about the same, but the SVE code
is smaller than the NEON expansion.
REAPPLY [clang-repl] Ensure clang-repl accepts all C keywords supported in all language models (#181335)
https://github.com/llvm/llvm-project/pull/142749 was reverted because
`_Float16` is only supported on the targets listed at
https://clang.llvm.org/docs/LanguageExtensions.html#half-precision-floating-point
and the previous PR did not guard the test to expect a failure on the
other targets.
Hence the CI failed with errors like
```
/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-rhel-test/clang-ppc64le-rhel/build/bin/clang -cc1 -internal-isystem /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-rhel-test/clang-ppc64le-rhel/build/lib/clang/21/include -nostdsysteminc -fsyntax-only -verify -fincremental-extensions -std=c++20 /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-rhel-test/clang-ppc64le-rhel/llvm-project/clang/test/Interpreter/disambiguate-decl-stmt.cpp # RUN: at line 1
/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-rhel-test/clang-ppc64le-rhel/build/bin/clang -cc1 -internal-isystem /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-rhel-test/clang-ppc64le-rhel/build/lib/clang/21/include -nostdsysteminc -fsyntax-only -verify -fincremental-extensions -std=c++20 /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-rhel-test/clang-ppc64le-rhel/llvm-project/clang/test/Interpreter/disambiguate-decl-stmt.cpp
error: 'expected-error' diagnostics seen but not expected:
File /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-rhel-test/clang-ppc64le-rhel/llvm-project/clang/test/Interpreter/disambiguate-decl-stmt.cpp Line 113: _Float16 is not supported on this target
1 error generated.
```
This should now be fixed, as the test expects either an error or no
error, depending on the target, through the `expected-error 0-1` directive.