[llvm][UnifyLoopExits] Avoid optimization if no exit block is found (#165343)
If there is no exit block, we should not try to unify the loops.
Instead, we should just return.
Fixes #165252
[OpenCL] Add clang internal extension __cl_clang_function_scope_local_variables (#176726)
The OpenCL spec requires that variables in the local address space be
declared only at kernel function scope.
Add a Clang internal extension __cl_clang_function_scope_local_variables
to lift the restriction.
To expose static local allocations at kernel scope, targets can either
force-inline non-kernel functions that declare local memory or pass a
kernel-allocated local buffer to those functions via an implicit argument.
Motivation: support local memory allocation in libclc's implementation
of work-group collective built-ins, see example at:
https://github.com/intel/llvm/blob/41455e305117/libclc/libspirv/lib/amdgcn-amdhsa/group/collectives_helpers.ll
https://github.com/intel/llvm/blob/41455e305117/libclc/libspirv/lib/amdgcn-amdhsa/group/collectives.cl#L182
Right now this is a Clang-only OpenCL extension intended for compiling
OpenCL libraries with Clang. It could be proposed as a standard OpenCL
extension in the future.
[libclc] replace float remquo with amd ocml implementation (#177131)
The current implementation has two issues:
* It unconditionally soft-flushes denormals.
* It can't pass the OpenCL CTS test "test_bruteforce remquo" on Intel GPUs.
This PR upstreams the remquo implementation from
https://github.com/ROCm/llvm-project/tree/amd-staging/amd/device-libs/ocml/src/remainderF_base.h
It supports denormals and passes the OpenCL CTS test. The number of LLVM IR
instructions in function _Z6remquoffPU3AS5i increased from 96 to 680.
---------
Co-authored-by: Copilot <175728472+Copilot at users.noreply.github.com>
TargetLowering: Allow FMINNUM/FMAXNUM to lower to FMINIMUM/FMAXIMUM even without `nsz` (#177828)
This restriction was originally added in
https://reviews.llvm.org/D143256, with the given justification:
> Currently, in TargetLowering, if the target does not support fminnum,
we lower to fminimum if neither operand could be a NaN. But this isn't
quite correct because fminnum and fminimum treat +/-0 differently; so,
we need to prove that one of the operands isn't a zero.
As far as I can tell, this was never correct. Before
https://github.com/llvm/llvm-project/pull/172012, `minnum` and `maxnum`
were nondeterministic with regards to signed zero, so it's always been
perfectly legal to lower them to operations that order signed zeroes.
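The refinement argument above can be checked with a small model. This is only a sketch under stated assumptions: inputs are NaN-free, the older `minnum` is modelled as free to return either zero when both operands are zero, and `minimum` orders -0.0 below +0.0. The helper names are hypothetical and are not LLVM APIs.

```python
import math

def is_neg_zero(x):
    """True only for -0.0."""
    return x == 0.0 and math.copysign(1.0, x) < 0

def same_value(x, y):
    """Equality that also distinguishes -0.0 from +0.0."""
    return x == y and math.copysign(1.0, x) == math.copysign(1.0, y)

def minimum(a, b):
    """Model of fminimum for NaN-free inputs: -0.0 is ordered below +0.0."""
    if a == 0.0 and b == 0.0:
        return a if is_neg_zero(a) else b
    return a if a < b else b

def minnum_allowed(a, b):
    """Model (assumption) of the older minnum for NaN-free inputs:
    the list of results the operation was permitted to produce."""
    if a == 0.0 and b == 0.0:
        return [a, b]  # either zero was acceptable
    return [a if a < b else b]

def lowering_is_refinement(a, b):
    """The lowering is legal if minimum picks one of minnum's allowed results."""
    result = minimum(a, b)
    return any(same_value(result, r) for r in minnum_allowed(a, b))
```

Under this model, `lowering_is_refinement` holds for every pair of NaN-free inputs, including mixed-sign zeros, which matches the claim that the `nsz` requirement was never needed.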
[LV] Add additional tests for early-exit loops loads not known deref.
Add additional test coverage for loops with loads that are not known to
be dereferenceable.
[InstCombine] Don't convert a compare+select into a minnum/maxnum intrinsic that can't be lowered back to a compare+select (#177821)
This is a step on the yak-shaving expedition to properly implement the
new `minnum`/`maxnum` signed-zero semantics.
`InstCombineSelect` will convert a `fcmp`+`select` sequence to a
`minnum`/`maxnum` intrinsic. It doesn't require the `fcmp` to have any
particular fast-math flags, just that the `select` has `nnan` and `nsz`
(or is being used in a context where the result doesn't care about
signed zero).
It's not correct to propagate the `nnan` flag from the `fcmp`
instruction for poison-propagation reasons. Patches like
https://github.com/llvm/llvm-project/pull/117977 and
https://github.com/llvm/llvm-project/pull/141010 have *generously* made
it so that if `fcmp` doesn't have fast-math flags, we can still perform
the transformation by simply dropping the flags on the generated
intrinsic.
[25 lines not shown]
[Polly] Reject scalable vector types (#177871)
Polly currently does not consider types without fixed length, which can
be encountered if an input source uses e.g. ARM SVE builtins. Such
programs have already been optimized manually. Non-fixed type lengths
also add to the difficulty of dependency analysis. Skip such types
entirely for now.
Fixes: #177859
[mlir][DialectUtils] Fix 0 step handling in `constantTripCount` (#177329)
A step size of "zero" does not indicate "zero iterations". It may
indicate an infinite number of iterations.
This commit makes some transformations more conservative. We used to
fold away some loops with step size 0 and that's now no longer the case.
Related discussion:
https://discourse.llvm.org/t/infinite-loops-and-dead-code/89530
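As an illustration of the zero-step case, here is a minimal sketch of a conservative trip-count helper for a loop of the form `for (i = lb; i < ub; i += step)`. The function `constant_trip_count` is hypothetical and is not MLIR's `constantTripCount`; the handling of empty ranges and negative steps is an assumption made for this example.

```python
from typing import Optional

def constant_trip_count(lb: int, ub: int, step: int) -> Optional[int]:
    """Return the trip count if it is known, or None if it cannot be determined."""
    if lb >= ub:
        # The loop condition fails immediately, whatever the step is.
        return 0
    if step <= 0:
        # A zero step never advances the induction variable, so the loop may
        # run forever. Negative steps are also treated as unknown here
        # (an assumption of this sketch).
        return None
    # Ceiling division of the range length by the step.
    return (ub - lb + step - 1) // step
```

Returning `None` rather than `0` for a zero step is what keeps a caller from folding away a loop that might never terminate.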
[SLP]Support for tree throttling in SLP graphs with gathered loads
Gathered loads form a DAG instead of a tree in the SLP vectorizer. When
doing the throttling analysis for such graphs, we need to consider partially
matched gathered-load DAG nodes, as well as extract and/or gather
operations and their costs.
The patch adds this analysis and allows cutting off expensive
sub-graphs with gathered loads.
Reviewers: hiraditya, RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/177855
[clang] Don't assert on perfect overload match with _Atomic (#176619)
An assertion incorrectly treated a difference in _Atomic qualification as
a difference in type when verifying a perfect match in overload
resolution in C++.
Fixes #170433
[VectorCombine] Fold vector.reduce.OP(F(X)) == 0 -> OP(X) == 0 (#173069)
This commit introduces a pattern to do the following fold:
vector.reduce.OP f(X_i) == 0 -> vector.reduce.OP X_i == 0
In order to decide on this fold, we use the following properties:
1. OP X_i == 0 <=> \forall i \in [1, N] X_i == 0
1'. OP X_i == 0 <=> \exists j \in [1, N] X_j == 0
2. f(x) == 0 <=> x == 0
From 1 and 2 (or 1' and 2), we can infer that
OP f(X_i) == 0 <=> OP X_i == 0.
For some of the OP's and f's, we need to have domain constraints on X to
ensure properties 1 (or 1') and 2.
[52 lines not shown]
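The equivalence can be checked by brute force on a small domain. The sketch below is illustrative only: it picks 4-bit lanes, an `or` reduction for property 1, an unsigned-min reduction for property 1', and bit reversal as an `f` satisfying property 2. These choices and all helper names are assumptions for the example, not the set of operations the patch handles.

```python
from functools import reduce
from itertools import product

BITS = 4                    # lane width, kept small for exhaustive checking
VALUES = range(1 << BITS)   # every unsigned 4-bit value

def bitreverse(x):
    """Reverse the bits of a 4-bit value; the result is 0 only when x is 0."""
    return int(format(x, f"0{BITS}b")[::-1], 2)

def reduce_or(xs):
    """OR reduction: zero exactly when every lane is zero (property 1)."""
    return reduce(lambda a, b: a | b, xs)

def reduce_umin(xs):
    """Unsigned-min reduction: zero exactly when some lane is zero (property 1')."""
    return min(xs)

def fold_holds(op, f, lanes=3):
    """Exhaustively check that OP f(X_i) == 0 <=> OP X_i == 0."""
    return all(
        (op([f(x) for x in xs]) == 0) == (op(list(xs)) == 0)
        for xs in product(VALUES, repeat=lanes)
    )
```

With an `f` that violates property 2, such as `lambda x: x & 1`, `fold_holds(reduce_or, ...)` returns `False`, which shows why the property is required.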
[clang][test] Fix builtin-rotate.c test __int128 test failure on ARM32 (#177732)
- Run the INT128 prefix checks on 64-bit targets since __int128 is not
supported on ARM32
Fixes https://lab.llvm.org/buildbot/#/builders/154/builds/26813
InstCombine: Improve single-use fneg(fabs(x)) SimplifyDemandedFPClass handling
Match the multi-use case's logic for understanding no-nan/no-inf context.
Also only apply the nsz handling in the single use case. alive2 seems to treat
nsz as nondeterministic for each use.
[MLIR] Fix GCC's `-Wreturn-type` warnings (#177654)
This patch fixes `-Wreturn-type` warnings that occur when MLIR is built
with GCC (detected with version 11.5).
Warnings found:
```
build/llvm-llvmorg-21.1.8/mlir/lib/CAPI/Transforms/Rewrite.cpp: In function ‘MlirGreedyRewriteStrictness mlirGreedyRewriteDriverConfigGetStrictness(MlirGreedyRewriteDriverConfig)’:
build/llvm-llvmorg-21.1.8/mlir/lib/CAPI/Transforms/Rewrite.cpp:399:1: warning: control reaches end of non-void function [-Wreturn-type]
399 | }
| ^
build/llvm-llvmorg-21.1.8/mlir/lib/CAPI/Transforms/Rewrite.cpp: In function ‘MlirGreedySimplifyRegionLevel mlirGreedyRewriteDriverConfigGetRegionSimplificationLevel(MlirGreedyRewriteDriverConfig)’:
build/llvm-llvmorg-21.1.8/mlir/lib/CAPI/Transforms/Rewrite.cpp:414:1: warning: control reaches end of non-void function [-Wreturn-type]
414 | }
| ^
build/llvm-llvmorg-21.1.8/mlir/lib/Dialect/GPU/IR/GPUDialect.cpp: In member function ‘mlir::Speculation::Speculatability mlir::gpu::SubgroupBroadcastOp::getSpeculatability()’:
build/llvm-llvmorg-21.1.8/mlir/lib/Dialect/GPU/IR/GPUDialect.cpp:2522:1: warning: control reaches end of non-void function [-Wreturn-type]
2522 | }
[20 lines not shown]
[MoveAutoInit] Fix for miscompilation for #150120 (#173961)
Fixes the miscompilation discussed in PR #164882 as part of
generalizing the optimization for issue #150120.
Without this commit, MoveAutoInit moves the store instruction to a
different branch that does not dominate the user dominator node. This
results in UB at runtime. The example in the test case is specifically
for an irreducible loop, in which not all predecessors dominate the
user dominator head.
To fix this problem, we've introduced a new check that verifies the
predecessor of the user dominator node does in fact dominate the user
dominator node before deciding that it is the node where the instruction
will be moved to.
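To see why a predecessor need not dominate its successor in an irreducible loop, consider the following sketch. It is not the pass's code: the CFG, the node names, and the helpers `dominators` and `safe_predecessors` are hypothetical, and the dominator computation is the textbook iterative dataflow algorithm.

```python
def dominators(cfg, entry):
    """Iterative dataflow: dom(n) = {n} | intersection of dom(p) over preds p.
    Assumes every node other than the entry has at least one predecessor."""
    nodes = set(cfg)
    preds = {n: [p for p in cfg if n in cfg[p]] for n in nodes}
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

def safe_predecessors(cfg, entry, node):
    """Predecessors of `node` that dominate it, i.e. candidates that pass the check."""
    dom = dominators(cfg, entry)
    return sorted(p for p in cfg if node in cfg[p] and p in dom[node])

# Irreducible loop {a, b}: it can be entered at either a or b.
CFG = {
    "entry": ["a", "b"],
    "a": ["b", "exit"],
    "b": ["a"],
    "exit": [],
}
```

Here `b` has predecessors `entry` and `a`, but only `entry` dominates `b`, so a store moved into `a` would not be guaranteed to execute before a use in `b`.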
[MLIR][Python] Add a DSL for defining dialects in Python bindings (#169045)
Python bindings for the IRDL dialect were introduced in #158488. They
are currently usable for constructing IR and dynamically loading modules
that contain `irdl.dialect` into MLIR. However, there are still several
pain points when working with them:
* The IRDL IR-building interface is not very intuitive and tends to be
quite verbose.
* We do not yet have the corresponding `OpView` classes for IRDL-defined
operations.
To address these issues, I propose creating a wrapper (effectively a
small “DSL”) on top of the existing IRDL Python bindings. This wrapper
aims to simplify IR construction and automatically generate the
corresponding `OpView` types. A simple example is shown below.
Currently, using the IRDL bindings looks like this:
[72 lines not shown]