[X86] combineSETCC - merge inner isScalarInteger() condition. NFC. (#182004)
All folds in the outer if() require this, including combineVectorSizedSetCCEquality.
[flang][OpenMP] Include check for fully unrolled loops into nest check, NFC (#181729)
It's naturally a part of the verification of constructs nested in loop
constructs, so perform that check there instead of having it in a
separate function.
[DAG] Fold (X +/- Y) & Y --> ~X & Y when Y is a power of 2 (or zero). (#181677)
This matches the existing fold in InstCombinerImpl::visitAnd.
To prevent RISCV from falling back to a mul call in known-never-zero.ll, I've
had to tweak the (sub X, (vscale * C)) to (add X, (vscale * -C)) fold so that
it does not occur if C is a power of 2 and the target has poor mul support.
Alive2: https://alive2.llvm.org/ce/z/Khvs5H
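The fold holds because adding or subtracting Y = 1 << k always flips bit k of X. Carries and borrows only move into higher bits, and the mask Y discards those. For Y = 0, both sides are 0. The C++ below is an illustrative sketch, not the DAGCombiner code. It checks the identity exhaustively for 8-bit values. The function names are hypothetical, and the Alive2 proof above remains the authoritative check.

```cpp
#include <cassert>
#include <cstdint>

// True if Y is zero or has exactly one bit set.
bool isPowerOf2OrZero(uint8_t Y) { return (Y & (Y - 1)) == 0; }

// Exhaustively checks (X + Y) & Y == ~X & Y and (X - Y) & Y == ~X & Y for
// every 8-bit X and every power-of-2-or-zero Y. Adding or subtracting the
// single set bit 1 << k flips bit k of X; carries and borrows only reach
// higher bits, which the mask Y throws away.
bool foldHoldsFor8Bits() {
  for (unsigned XI = 0; XI < 256; ++XI) {
    for (unsigned YI = 0; YI < 256; ++YI) {
      uint8_t X = static_cast<uint8_t>(XI);
      uint8_t Y = static_cast<uint8_t>(YI);
      if (!isPowerOf2OrZero(Y))
        continue;
      uint8_t Folded = static_cast<uint8_t>(~X & Y);
      uint8_t FromAdd = static_cast<uint8_t>((X + Y) & Y);
      uint8_t FromSub = static_cast<uint8_t>((X - Y) & Y);
      if (FromAdd != Folded || FromSub != Folded)
        return false;
    }
  }
  return true;
}
```

The restriction to powers of 2 matters. With X = 1 and Y = 3, (1 + 3) & 3 is 0 while ~1 & 3 is 2, so the fold would be wrong for a general mask.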
AMDGPU/GlobalISel: Regbanklegalize rules for INTRIN_IMAGE
Add Regbanklegalize rules for INTRIN_IMAGE loads and stores.
Because of the very large number of different type signatures, the rule
specifies only a lowering function (waterfall lowering of the RsrcIdx
operand if needed), and this function also applies the register banks.
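Waterfall lowering is needed because the image resource descriptor (RsrcIdx) has to live in scalar registers, yet the incoming value may differ between lanes. The following is a scalar C++ simulation of the idea, not the AMDGPU lowering itself. It assumes a 32-bit stand-in for the descriptor, and `waterfall` and its callback are hypothetical names. Each iteration does three things:

1. It takes the value from the first pending lane, which is the role readfirstlane plays.
2. It runs the operation once for all lanes holding that value.
3. It retires those lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simulates a waterfall loop over a possibly divergent per-lane operand.
// RunUniform(Value, Lane) stands in for the operation executed with a
// uniform operand. Returns the number of iterations (one per distinct value).
template <typename Op>
unsigned waterfall(const std::vector<uint32_t> &LaneValues,
                   std::vector<uint32_t> &Results, Op RunUniform) {
  std::vector<bool> Pending(LaneValues.size(), true);
  Results.assign(LaneValues.size(), 0);
  unsigned Iterations = 0;
  for (std::size_t First = 0; First < LaneValues.size(); ++First) {
    if (!Pending[First])
      continue;
    uint32_t Uniform = LaneValues[First]; // plays the role of readfirstlane
    ++Iterations;
    // Execute once for every pending lane that holds the same value.
    for (std::size_t Lane = First; Lane < LaneValues.size(); ++Lane) {
      if (Pending[Lane] && LaneValues[Lane] == Uniform) {
        Results[Lane] = RunUniform(Uniform, Lane);
        Pending[Lane] = false;
      }
    }
  }
  return Iterations;
}
```

With lane values {7, 7, 3, 7, 3}, the loop runs twice, once per distinct value. An already-uniform operand would need only one iteration, which is presumably why the lowering applies the waterfall only "if needed".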
[llvm-reduce] Add a pass to replace unconditional branches with returns (#180993)
Unconditional branches could end up in infinite loops in the reduced
code, even though the code could have been reduced further.
This patch implements a simple pass that replaces unconditional branches
with returns.
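Here is a minimal sketch of the transformation on a hypothetical toy IR. The names `Block`, `TermKind`, and `replaceUncondBrWithRet` are invented for illustration and are not llvm-reduce's API. Turning a `br` into a `ret` removes the back edge of a loop such as `loop: br label %loop`, so the reduced function terminates. A real pass must also supply a return value for non-void functions, which this model ignores. As with any llvm-reduce pass, the change is only kept if the interestingness test still passes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy IR: every block ends in exactly one terminator.
enum class TermKind { UncondBr, CondBr, Ret };

struct Block {
  std::string Name;
  TermKind Term;
  std::vector<std::string> Succs; // names of successor blocks
};

// Replaces every unconditional branch with a return and drops its CFG edge.
// Returns the number of terminators rewritten.
unsigned replaceUncondBrWithRet(std::vector<Block> &Fn) {
  unsigned Changed = 0;
  for (Block &B : Fn) {
    if (B.Term != TermKind::UncondBr)
      continue;
    B.Term = TermKind::Ret;
    B.Succs.clear();
    ++Changed;
  }
  return Changed;
}
```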
[AArch64] Improve post-inc stores of SIMD/FP values
Add patterns to match post-increment truncating stores from lane 0 of
wide integer vectors (v4i32/v2i64) to narrower types (i8/i16/i32).
This avoids transferring the value through a GPR when storing.
Also remove the pre-legalization early-exit in combineStoreValueFPToInt
as it prevented the optimization from applying in some cases.
[Tablegen] Patch RegUnitIntervals Initialization (#181173)
A few places were missing the code generation needed to properly
initialize it when enabled, and it was also missing the sentinel value.
[LoopInterchange] Update UTC version (NFC) (#181988)
This is a follow-up PR to #181804. While working on the stacked PRs, I
encountered some noisy diffs in the CHECK lines that don't change the
meaning of the tests. To avoid such changes and make the review easier,
this patch updates the UTC version. It also renames some BBs to suppress
warnings emitted by UTC.
[AMDGPU] Fix opcode comparison logic for G_INTRINSIC (#156008)
The check `(Opc < TargetOpcode::GENERIC_OP_END)` incorrectly
includes `G_INTRINSIC` (129), which is less than
`GENERIC_OP_END` (313), leading to logically dead code.
This patch reorders the conditionals to check for `G_INTRINSIC` first,
ensuring correct handling of the `amdgcn_fdot2` intrinsic.
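The ordering bug can be reproduced in isolation. The sketch below uses the opcode values quoted above. These numbers are generated and may differ between LLVM revisions. The names `classifyBuggy`, `classifyFixed`, and `Path` are hypothetical, not the AMDGPU code.

```cpp
#include <cassert>

// Opcode values as quoted in the commit message.
constexpr unsigned G_INTRINSIC = 129;
constexpr unsigned GENERIC_OP_END = 313;

enum class Path { Generic, Intrinsic, Other };

// Buggy order: since G_INTRINSIC < GENERIC_OP_END, the range check always
// wins first, so the intrinsic branch below it can never be taken.
Path classifyBuggy(unsigned Opc) {
  if (Opc < GENERIC_OP_END)
    return Path::Generic;
  if (Opc == G_INTRINSIC) // logically dead
    return Path::Intrinsic;
  return Path::Other;
}

// Fixed order: test the specific opcode before the broad range check.
Path classifyFixed(unsigned Opc) {
  if (Opc == G_INTRINSIC)
    return Path::Intrinsic;
  if (Opc < GENERIC_OP_END)
    return Path::Generic;
  return Path::Other;
}
```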
[OpenMP] Remove standalone build mode (#149878)
Remove all the CMake code for openmp standalone builds. Standalone
builds have been superseded by the runtimes default build (also
sometimes called the standalone runtimes build). The runtimes default
build can be thought of as a standalone build with the standalone
boilerplate contained in <llvm-project>/runtimes/CMakeLists.txt. There
is no need for each runtime to contain the same boilerplate code again.
Builds still using the standalone build via
```sh
cmake -S <llvm-project>/openmp ...
```
can switch over to the runtimes default build using
```sh
cmake -S <llvm-project>/runtimes -DLLVM_ENABLE_RUNTIMES=openmp ...
```
Options that were valid for the standalone build are also valid for the
default runtimes build, unless handled only in
[8 lines not shown]
[Clang][AArch64] set default mtune for macOS (#179136)
This patch sets the default tune-cpu for macOS targets to `apple-m5`.
The implementation adds a helper in
`clang/lib/Driver/ToolChains/Arch/AArch64.h` that is called from
`clang/lib/Driver/ToolChains/Clang.cpp`. It doesn't follow a "check then
get" flow because it is very concise; instead, it returns an optional. The
patch also adds a missing test file for mtune on Apple macOS targets,
covering the new logic.
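The following sketches the optional-returning helper style described above. The names `TargetInfo`, `getDefaultTuneCPU`, and `chooseTuneCPU` are hypothetical. The rule that an explicit -mtune value overrides the default is an illustrative assumption. The actual helper and its call site in Clang.cpp may differ.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the driver's view of the target triple.
struct TargetInfo {
  bool IsMacOS;
  bool IsAArch64;
};

// Returns the default tune CPU, or std::nullopt if the target has none.
// A single optional-returning call replaces a separate "has default?"
// check followed by a "get default" call.
std::optional<std::string> getDefaultTuneCPU(const TargetInfo &T) {
  if (T.IsAArch64 && T.IsMacOS)
    return std::string("apple-m5");
  return std::nullopt;
}

// Assumed caller logic: an explicit -mtune value takes precedence.
std::optional<std::string>
chooseTuneCPU(const TargetInfo &T, const std::optional<std::string> &MTune) {
  if (MTune)
    return MTune;
  return getDefaultTuneCPU(T);
}
```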
clang: Add builtin header for amdhsa abi
This is a place to put definitions for various ABI structs.
Currently, device libs just hardcode magic numbers and casts, which
makes the code incomprehensible.
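The sketch below shows why named struct definitions beat magic offsets. `ExampleAbiBlock` and its fields are hypothetical and do not describe any real amdhsa structure. The offset 6 in the first reader only means something once a struct definition names it, and a static_assert can pin that layout. The sketch uses memcpy instead of a raw pointer cast to keep the "magic" version well-defined C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical ABI struct for illustration only; NOT the real amdhsa layout.
struct ExampleAbiBlock {
  uint32_t FieldA; // offset 0
  uint16_t FieldB; // offset 4
  uint16_t FieldC; // offset 6
  uint64_t FieldD; // offset 8
};

// Pin the layout so a change to the struct breaks the build, not the reader.
static_assert(offsetof(ExampleAbiBlock, FieldC) == 6, "unexpected layout");

// "Before": a hardcoded byte offset whose meaning is not written down anywhere.
uint16_t readFieldCMagic(const void *P) {
  uint16_t V;
  std::memcpy(&V, static_cast<const char *>(P) + 6, sizeof(V));
  return V;
}

// "After": the same read expressed through the named field.
uint16_t readFieldCNamed(const ExampleAbiBlock *P) { return P->FieldC; }
```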