Reland of #108413: [Offload] Introduce offload-tblgen and initial new API implementation (#118503)
This is another attempt to reland the changes from #108413
The previous two attempts introduced regressions and were reverted. This
PR has been more thoroughly tested with various configurations so
shouldn't cause any problems this time. If anyone is aware of any likely
remaining problems then please let me know.
The changes are identical other than the fixes contained in the last 5
commits.
---
### New API
Previous discussions at the LLVM/Offload meeting have brought up the
need for a new API for exposing the functionality of the plugins. This
[31 lines not shown]
[ASTMatcher] Fix redundant macro expansion checks in getExpansionLocOfMacro (#117143)
A performance issue was descibed in #114521
**Root Cause**: The function getExpansionLocOfMacro is responsible for
finding the expansion location of macros. When dealing with macro
parameters, it recursively calls itself to check the expansion of macro
arguments. This recursive logic redundantly checks previous macro
expansions, leading to significant performance degradation when macros
are heavily nested.
Solution
**Modification**: Track already processed macros during recursion.
Implementation Details:
Introduced a data structure to record processed macros.
Before each recursive call, check if the macro has already been
processed to avoid redundant calculations.
**Testing**:
1. refer to #114521
Finder->addMatcher(expr(isExpandedFromMacro("NULL")).bind("E"), this);
[4 lines not shown]
[SystemZ] Add realistic cost estimates for vector reduction intrinsics (#118319)
This PR adds more realistic cost estimates for these reduction
intrinsics
- `llvm.vector.reduce.umax`
- `llvm.vector.reduce.umin`
- `llvm.vector.reduce.smax`
- `llvm.vector.reduce.smin`
- `llvm.vector.reduce.fadd`
- `llvm.vector.reduce.fmul`
- `llvm.vector.reduce.fmax`
- `llvm.vector.reduce.fmin`
- `llvm.vector.reduce.fmaximum`
- `llvm.vector.reduce.fminimum`
- `llvm.vector.reduce.mul
`
The pre-existing cost estimates for `llvm.vector.reduce.add` are moved
to `getArithmeticReductionCosts` to reduce complexity in
[10 lines not shown]
[AMDGPU] Refine AMDGPUCodeGenPrepareImpl class. NFC. (#118461)
Use references instead of pointers for most state, initialize it all in
the constructor, and common up some of the initialization between the
legacy and new pass manager paths.
[flang] Update CommandTest for AIX (NFC) (#118403)
With the change in commit e335563, the behavior for `ECLGeneralErrorCommandErrorSync`
on AIX is the same as on Linux.
[SPIR-V] Fix emission of debug and annotation instructions and add SPV_EXT_optnone SPIR-V extension (#118402)
This PR fixes:
* emission of OpNames (added newly inserted internal intrinsics and
basic blocks)
* emission of function attributes (SRet is added)
* implementation of SPV_INTEL_optnone so that it emits OptNoneINTEL
Function Control flag, and add implementation of the SPV_EXT_optnone
SPIR-V extension.
[mlir][llvm] Align linkage enum order with LLVM (NFC) (#118484)
This change doesn't introduce any functional differences but aligns the
implementation more closely with LLVM's representation. Previously, the
code generated a lookup table to map MLIR enums to LLVM enums due to the
lack of one-to-one correspondence. With this refactoring, the generated
code now casts directly from one enum to another.
[SPIR-V] Fix generation of invalid SPIR-V in cases of of bitcasts between pointers and multiple null pointers used in the input LLVM IR (#118298)
This PR resolved the following issues:
(1) There are rare but possible cases when there are bitcasts between
pointers intertwined in a sophisticated way with loads, stores, function
calls and other instructions that are part of type deduction. In this
case we must account for inserted bitcasts between pointers rather than
just ignore them.
(2) Null pointers have the same constant representation but different
types. Type info from Intrinsic::spv_track_constant() refers to the
opaque (untyped) pointer, so that each MF/v-reg pair would fall into the
same Const record in Duplicate Tracker and would be represented by a
single OpConstantNull instruction, unless we use precise pointee type
info. We must be able to distinguish one constant (null) pointer from
another to avoid generating invalid code with inconsistent types of
operands.
[ConstraintElim] Use nusw flag for GEP decomposition
Check for nusw instead of inbounds when decomposing GEPs.
In this particular case, we can also look through multiple nusw
flags, because we will ultimately be working in the unsigned
constraint system.
[VPlan] Introduce VPScalarPHIRecipe, use for can & EVL IV codegen (NFC). (#114305)
Introduce a general recipe to generate a scalar phi. Lower
VPCanonicalIVPHIRecipe and VPEVLBasedIVRecipe to VPScalarIVPHIrecipe
before plan execution, avoiding the need for duplicated ::execute
implementations. There are other cases that could benefit, including
in-loop reduction phis and pointer induction phis.
Builds on a similar idea as
https://github.com/llvm/llvm-project/pull/82270.
PR: https://github.com/llvm/llvm-project/pull/114305
[lldb] Fix "exact match" debug_names type queries (#118465)
.. in the global namespace
The problem was the interaction of #116989 with an optimization in
GetTypesWithQuery. The optimization was only correct for non-exact
matches, but that didn't matter before this PR due to the "second layer
of defense". After that was removed, the query started returning more
types than it should.
Reapply "[clang] Fix name lookup for dependent bases" (#118003)
Unlike the previous version
(https://github.com/llvm/llvm-project/pull/114978), this patch also
removes an unnecessary assert that causes Clang to crash when compiling
such tests. (clang/lib/AST/DeclCXX.cpp)
https://lab.llvm.org/buildbot/#/builders/52/builds/4021
```c++
template <class T>
class X {
public:
X() = default;
virtual ~X() = default;
virtual int foo(int x, int y, T &entry) = 0;
void bar() {
[19 lines not shown]
IR: introduce struct with CmpInst::Predicate and samesign (#116867)
Introduce llvm::CmpPredicate, an abstraction over a floating-point
predicate, and a pack of an integer predicate with samesign information,
in order to ease extending large portions of the codebase that take a
CmpInst::Predicate to respect the samesign flag.
We have chosen to demonstrate the utility of this new abstraction by
migrating parts of ValueTracking, InstructionSimplify, and InstCombine
from CmpInst::Predicate to llvm::CmpPredicate. There should be no
functional changes, as we don't perform any extra optimizations with
samesign in this patch, or use CmpPredicate::getMatching.
The design approach taken by this patch allows for unaudited callers of
APIs that take a llvm::CmpPredicate to silently drop the samesign
information; it does not pose a correctness issue, and allows us to
migrate the codebase piece-wise.
[InstCombine] Support gep nuw in icmp folds (#118472)
Unsigned icmp of gep nuw folds to unsigned icmp of offsets. Unsigned
icmp of gep nusw nuw folds to unsigned samesign icmp of offsets.
Proofs: https://alive2.llvm.org/ce/z/VEwQY8
[flang] Treat pre-processed input as fixed (#117563)
Fixes an issue introduced by
9fb2db1e1f42 [flang] Retain spaces when preprocessing fixed-form source
Where flang -fc1 fails to parse preprocessor output because it now
remains in fixed form.
[SPIR-V] Fixup storage class for global private (#118318)
Re-land of #116636
Adds a new address spaces: hlsl_private. Variables with such address
space will be emitted with a Private storage class.
This is useful for variables global to a SPIR-V module, since up to now,
they were still emitted with a Function storage class, which is wrong.
---------
Signed-off-by: Nathan Gauër <brioche at google.com>
[LoopVectorize] Add tests for dereferenceable loads in more loops (#118470)
* Adds tests for strided accesses.
* Adds tests for reverse loops.
As part of this I've moved one of the negative tests from
load-deref-pred-align.ll into a new file
(load-deref-pred-neg-off.ll) because the pointer type had a
size of 16 bits and I realised it's probably not sensible for
allocas that are >16 bits in size!