[X86] Handle X86ISD::EXPAND/COMPRESS nodes as target shuffles (#171119)
Allows for shuffle simplification
Required a minor fix to the overly reduced compress-undef-float-passthrough.ll regression test
[VPlan] Use nuw when computing {VF,VScale}xUF (#170710)
These quantities should never unsigned-wrap. This matches the behavior
if only VFxUF is used (and not VF): when computing both VF and VFxUF,
nuw should hold for each step separately.
[ADT] Make use of subsetOf and anyCommon methods of BitVector (NFC)
Replace the code along these lines
BitVector Tmp = LHS;
Tmp &= RHS;
return Tmp.any();
and
BitVector Tmp = LHS;
Tmp.reset(RHS);
return Tmp.none();
with `LHS.anyCommon(RHS)` and `LHS.subsetOf(RHS)`, correspondingly, which
do not require creating temporary BitVector and can return early.
[BOLT][BTI] Add needed BTIs in LongJmp or refuse to optimize binary
This patch adds BTI landing pads to ShortJmp/LongJmp targets in the
LongJmp pass when optimizing BTI binaries.
BOLT does not have the ability to add BTI to all types of functions.
This patch aims to insert the landing pad where possible, and emit an
error where it currently is not.
BOLT cannot insert BTIs into several function "types", including:
- ignored functions,
- PLT functions,
- other functions without a CFG.
Additional context:
In #161206, BOLT gained the ability to decode the .note.gnu.property
section, and warn about lack of BTI support for BOLT. However, this
warning is misleading: the emitted binary may not need extra BTI landing
[3 lines not shown]
[ADT] Add `llvm::reverse_conditionally()` iterator (#171040)
This patch adds a simple iterator range that allows conditionally
iterating a collection in reverse. It works with any collection
supported by `llvm::reverse(Collection)`.
```
void foo(bool Reverse, std::vector<int>& C) {
for (int I : reverse_conditionally(C, Reverse)) {
// ...
}
}
```
[AMDGPU][NFC] Update a comment about FLAT v/s LDSDMA
The change in #170263 does not do justice to common knowledge in the backend.
Fix the comment to reflect the relation between FLAT encoding, flat pointer
access, and LDSDMA operations.
clang: Use generic builtins in cuda complex builtins header (#171106)
There's no reason to use the ocml or nv prefixed functions and
maintain this list of alias macros. I left these macros in for
NVPTX in the scalbn and logb case, since those have a special
case hack in the AMDGPU codegen and probably do not work on ptx.
[AMDGPU][NPM] Port AMDGPUArgumentUsageInfo to NPM (#170886)
Port AMDGPUArgumentUsageInfo analysis to the NPM to fix suboptimal code
generation when NPM is enabled by default.
Previously, DAG.getPass() returns nullptr when using NPM, causing the
argument usage info to be unavailable during ISel. This resulted in
fallback to FixedABIFunctionInfo which assumes all implicit arguments
are needed, generating unnecessary register setup code for entry
functions.
Fixes LLVM::CodeGen/AMDGPU/cc-entry.ll
Changes:
- Split AMDGPUArgumentUsageInfo into a data class and NPM analysis
wrapper
- Update SIISelLowering to use DAG.getMFAM() for NPM path
- Add RequireAnalysisPass in addPreISel() to ensure analysis
availability
This follows the same pattern used for PhysicalRegisterUsageInfo.
[mlir][py] partially use mlir_type_subclass for IRTypes.cpp
Port the bindings for non-shaped builtin types in IRTypes.cpp to use the
`mlir_type_subclass` mechanism used by non-builtin types. This is part of a
longer-term cleanup to only support one subclassing mechanism. Eventually, the
`PyConcreteType` mechanism will be removed.
This required a surgery in the type casters and the `mlir_type_subclass` logic
to avoid circular imports of the `_mlir.ir` module that would otherwise when
using `mlir_type_subclass` to define classes in the `_mlir.ir` module.
Tests are updated to use the `.get_static_typeid()` function instead of the
`.static_typeid` property that was specific to builtin types due to the
`PyConcreteType` mechanism. The change should be NFC otherwise.
[mlir][py] avoid crashing on None contexts in custom `get`s
Following a series of refactorings, MLIR Python bindings would crash if a
dialect object requiring a context defined using mlir_attribute/type_subclass
was constructed outside of the `ir.Context` context manager. The type caster
for `MlirContext` would try using `ir.Context.current` when the default `None`
value was provided to the `get`, which would also just return `None`. The
caster would then attempt to obtain the MLIR capsule for that `None`, fail,
but access it anyway without checking, leading to a C++ assertion failure or
segfault.
Guard against this case in nanobind adaptors. Also emit a warning to the user
to clarify expectations, as the default message confusingly says that `None` is
accepted as context and then fails with a type error. Using Python C API is
currently recommended by nanobind in this case since the surrounding function
must be marked `noexcept`.
The corresponding test is in the PDL dialect since it is where I first observed
the behavior. Core types are not using the `mlir_type_subclass` mechanism and
are immune to the problem, so cannot be used for checking.
[VPlan] Use BlockFrequencyInfo in getPredBlockCostDivisor (#158690)
In 531.deepsjeng_r from SPEC CPU 2017 there's a loop that we
unprofitably loop vectorize on RISC-V.
The loop looks something like:
```c
for (int i = 0; i < n; i++) {
if (x0[i] == a)
if (x1[i] == b)
if (x2[i] == c)
// do stuff...
}
```
Because it's so deeply nested the actual inner level of the loop rarely
gets executed. However we still deem it profitable to vectorize, which
due to the if-conversion means we now always execute the body.
[19 lines not shown]
[OpenACC][CIR] Implement routine 'bind'-with-a-string lowering (#170916)
The 'bind' clause emits an attribute on the RoutineOp that states which
function it should call on the device side. When provided in
double-quotes, the function on the device side should be the exact name
given. This patch emits the IR to do that.
As a part of that, we add a helper function to the OpenACC dialect to do
so, as well as a version that adds the ID version (though we don't
exercise th at yet).
The 'bind' with an ID should do the MANGLED name, but it isn't quite
clear what that name SHOULD be yet. Since the signature of a function is
included in its mangling, and we're not providing said signature, we
have to come up with something. This is left as an exercise for a future
patch.