[mlir][spirv] Mark several SPIR-V TOSA Ext Inst ops as NoMemoryEffects (#191814)
These ops were initially marked Pure, which was wrong: they can overflow
or underflow the accumulator, resulting in undefined behavior.
Signed-off-by: Davide Grohmann <davide.grohmann at arm.com>
[LFI][AArch64] Add AArch64 LFI rewrites for system instructions (#186896)
This builds on the MCLFIRewriter infrastructure to add the
AArch64-specific LFI rewriter, which rewrites AArch64 instructions for
LFI sandboxing during the assembler step.
The initial rewriter handles system instructions (system calls and
thread pointer accesses) and rejects modifications to reserved registers.
[LifetimeSafety] Track origins through std::function (#191123)
1. Recognizes `std::function` and `std::move_only_function` as types
that can carry origins from a wrapped lambda's captures, propagating
origins through both construction and assignment.
2. Adds a kill-only mechanism (i.e., a new `KillOriginFact`) to clear
old loans when the RHS has no origins.
Fixes #186009
[CoroEarly][IR] Clarify semantic of llvm.coro.end (#191752)
We introduced a workaround for the following pattern in #139243:
``` LLVM
define void @fn() presplitcoroutine {
%__promise = alloca ptr, align 8
...
coro.ret:
call void @llvm.coro.end(ptr null, i1 false, token none)
store ptr null, ptr %__promise, align 8
ret void
}
```
where DSE considers `__promise` dead after the return and eliminates the
store, leading to a miscompilation.
However, after #151067 the problematic pattern is gone. It currently
looks like:
[17 lines not shown]
[NFC][SPIR-V] Fix cbuffer.ll test to pass spirv-val validation (#191940)
Mark the `main()` function as a compute shader entry point with a
numthreads attribute so that the test produces valid SPIR-V.
Related to https://github.com/llvm/llvm-project/issues/190736
[AArch64] Add new dot insts. to cost model (#189642)
This patch builds on #184659 and #184649 and adds cost modelling for the
new dot instruction variants that those patches generate code for.
[Driver][HIP] Do not default to `hidden` visibility for AMDGCNSPIRV (#191820)
SPIR-V cannot encode `hidden` visibility for now, which leads to quirky
errors. For now we deal with this at run time, as part of JIT. This will
be revisited once SPIR-V learns about `hidden`.
[mlir][tosa] Optimize non-narrowing float casts (#191439)
Extend the existing NonNarrowingCastsOptimization to also cover casts
between the floating-point types f32, f16, bf16, f8E4M3FN and f8E5M2.
Avoid introducing direct casts between f8 types since those are not
allowed in TOSA.
Also expand the set of cases that are considered non-narrowing by only
checking whether the cast we're trying to remove is non-narrowing. For
example, i16 -> i32 -> i8 would have been rejected before, but it is now
safely converted to a single i16 -> i8 tosa.cast, since the behaviour
should be identical for the entire input space.
Finally, disallow the optimization when the cast that we would remove
involves integer types of different signedness.
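The i16 -> i32 -> i8 case can be checked with a small scalar model. This
is only a sketch: it assumes that a widening integer cast sign-extends
and a narrowing one truncates, modelled here with ordinary C++
conversions. The helper names `castViaI32`, `castDirect` and
`agreeOnRange` are made up for illustration and are not part of the
patch.

```cpp
#include <cstdint>

// Two-step route: widen i16 to i32 (sign-extend), then narrow to i8
// by keeping only the low 8 bits.
constexpr std::int8_t castViaI32(std::int16_t v) {
  std::int32_t wide = static_cast<std::int32_t>(v);
  return static_cast<std::int8_t>(static_cast<std::uint8_t>(wide));
}

// Single-step route: narrow i16 directly to i8 (low 8 bits).
constexpr std::int8_t castDirect(std::int16_t v) {
  return static_cast<std::int8_t>(static_cast<std::uint8_t>(v));
}

// Returns true when both routes agree for every input in [lo, hi].
constexpr bool agreeOnRange(int lo, int hi) {
  for (int i = lo; i <= hi; ++i) {
    std::int16_t v = static_cast<std::int16_t>(i);
    if (castViaI32(v) != castDirect(v))
      return false;
  }
  return true;
}
```

Because the removed cast (i16 -> i32) loses no information, the low 8
bits that survive the final narrowing are the same on both routes.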
Signed-off-by: Ian Tayler Lessa <ian.taylerlessa at arm.com>
[AArch64] Hint regalloc to choose distinct predicate for MATCH/CMP (#190139)
For some cores it is preferable to choose a destination predicate
register that does not match the governing predicate.
The hint is conservative in that it tries not to pick a callee-save
register if it's not already used/allocated for other purposes, as that
would introduce new spills/fills. Note that this might be preferable if
the instruction is executed in a loop, but it might also be less
preferable for small functions that have an SVE interface (p4-p15 are
caller-preserved).
It is enabled for all cores by default, but it can be disabled by adding
the `disable-distinct-dst-reg-cmp-match` feature. This feature can also
be added to specific cores if this behaviour is undesirable.
[libc++] LWG4511: Inconsistency between the deduction guide of `std::mdspan` taking `(data_handle_type, mapping_type, accessor_type)` and the corresponding constructor (#191950)
No functional change; this only removes a redundant const qualifier.
Fixes: #189860
[X86] Fix VPMOVPattern folding for extended registers (#191760)
Fixes a problem where tryCompressVPMOVPattern incorrectly folds
instructions that use extended registers into VEX. Adds relevant MIR
tests.
AI Statement: I used AI to write the tests.
Fixes #191304
[AVX-512] Fix for disjoint-or-fold (VGF2P8AFFINEQB) (#190896)
Fixes #190502
Adds the helper combineOrWithGF2P8AFFINEQB and wires it into combineOrXorWithSETCC:
Fold: (GF2P8AFFINEQB(X, Y, Imm) or_disjoint SplatVal) -> GF2P8AFFINEQB(X, Y, Imm ^ SplatVal)
When the OR is disjoint (the operands share no set bits), the splat constant can be folded directly into the GF2P8AFFINEQB immediate via XOR.
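A scalar sketch of why the fold holds: the instruction XORs the
immediate into each result byte, and an OR with no overlapping bits is
the same as an XOR. The model below covers one byte lane only. The names
`parity8`, `affineByte` and `kLowNibbleMatrix` are made up for
illustration, and the matrix is chosen so that the high nibble of the
result is always zero before the immediate is applied.

```cpp
#include <cstdint>

// Parity (XOR of all bits) of a byte.
constexpr unsigned parity8(std::uint8_t b) {
  unsigned p = 0;
  for (int i = 0; i < 8; ++i)
    p ^= (b >> i) & 1u;
  return p;
}

// One byte lane of GF2P8AFFINEQB: bit i of the result is
// parity(matrix.byte[7 - i] & x) XOR imm.bit[i].
constexpr std::uint8_t affineByte(std::uint8_t x, std::uint64_t matrix,
                                  std::uint8_t imm) {
  std::uint8_t out = 0;
  for (int i = 0; i < 8; ++i) {
    std::uint8_t row = static_cast<std::uint8_t>(matrix >> (8 * (7 - i)));
    unsigned bit = parity8(row & x) ^ ((imm >> i) & 1u);
    out = static_cast<std::uint8_t>(out | (bit << i));
  }
  return out;
}

// Illustrative matrix: copies the low nibble of x, zeroes the high nibble.
constexpr std::uint64_t kLowNibbleMatrix = 0x0102040800000000ULL;
```

With Imm = 0x05 and SplatVal = 0xA0, the unfolded result
`affineByte(x, kLowNibbleMatrix, 0x05) | 0xA0` and the folded result
`affineByte(x, kLowNibbleMatrix, 0xA5)` are the same byte for every x,
because the splat only touches bits that the affine result leaves clear.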
[mlir] transform dialect; add pre/post-condition type
Add a transform dialect type denoting additional invariants on payload
IR usable for pre/post-conditions of a transformation. The invariants
are defined as a list of attributes in the type parameter, where the
attribute implements the interface for invariant checking. This allows
clients to factor out, make explicit, and deduplicate precondition
verification logic.
This required adding support for Transform dialect extensions to inject
attributes into the dialect, similarly to how they already do this for
operations and types.
Co-authored-by: Tim Gymnich <tim at gymni.ch>
Co-authored-by: Martin Lücke <martin.luecke at amd.com>
Assisted-by: Claude Opus 4.3 / Cursor
[DAG] Add funnel-shift matchers to SDPatternMatch (Fixes #185880) (#186593)
Add new SelectionDAG pattern matchers for funnel shifts:
- m_FShL and m_FShR as ternary wrappers for ISD::FSHL/ISD::FSHR
- m_FShLLike and m_FShRLike to match:
-- direct FSHL/FSHR nodes
-- ROTL/ROTR equivalents (binding both X and Y to the same rotate operand)
-- OR(SHL(X, C), SRL(Y, BW - C)) forms (including commuted OR)
Also add unit tests covering positive and negative cases for:
- direct funnel-shift matching
- rotate equivalence matching
- OR-based funnel-shift-like patterns
Fixes #185880
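The equivalences that the "like" matchers rely on can be checked on
plain integers. This is a sketch for 32-bit values only. It does not use
the SDPatternMatch API, and the function names `fshl`, `orForm` and
`rotl` below are local to the example.

```cpp
#include <cstdint>

constexpr unsigned kBW = 32; // bit width used throughout this sketch

// Reference funnel shift left: concatenate x:y, shift left by c mod 32,
// and keep the high 32 bits.
constexpr std::uint32_t fshl(std::uint32_t x, std::uint32_t y, unsigned c) {
  std::uint64_t concat = (static_cast<std::uint64_t>(x) << 32) | y;
  return static_cast<std::uint32_t>((concat << (c % kBW)) >> 32);
}

// The OR-based form OR(SHL(X, C), SRL(Y, BW - C)); only valid for 0 < c < 32.
constexpr std::uint32_t orForm(std::uint32_t x, std::uint32_t y, unsigned c) {
  return (x << c) | (y >> (kBW - c));
}

// Rotate left, which equals fshl with both data operands set to x.
constexpr std::uint32_t rotl(std::uint32_t x, unsigned c) {
  c %= kBW;
  return c == 0 ? x : (x << c) | (x >> (kBW - c));
}
```

For example, with X = 0x12345678, Y = 0x9ABCDEF0 and C = 8, both
`fshl(X, Y, 8)` and `orForm(X, Y, 8)` give 0x3456789A, and
`fshl(X, X, 8)` matches `rotl(X, 8)`.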
[clang][bytecode] Use `CopyArray` for primitive ArrayInitLoops (#191956)
This reduces the bytecode output for the copy constructor of a struct
such as:
```c++
struct Buffer {
struct {
char D[N];
} V;
Buffer() = default;
};
```
from
```
Buffer<5>::(unnamed struct)::(unnamed struct at array.cpp:873:3) 0x7d38d2de3f80
frame size: 104
arg size: 96
[62 lines not shown]
[LoongArch] Select V{ADD,SUB}I for operations with negative splat immediates
Currently, vector add/sub with a negative splat immediate is lowered as a
vector splat followed by a register-register add, e.g.:
```
vrepli.b $vr1, -1
vadd.b $vr0, $vr0, $vr1
```
This misses the opportunity to use the more efficient V{ADD,SUB}I instruction
with a positive immediate.
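The arithmetic behind the immediate form is that adding a negative splat
is the same as subtracting its magnitude, lane by lane, with wrapping.
The sketch below models a single 8-bit lane. The helper names are
hypothetical (`negatedFitsUimm5` is not the patch's
`selectVSplatImmNeg`), and the range check reflects the 5-bit unsigned
immediate field.

```cpp
#include <cstdint>

// Hypothetical check: true when `splat` is negative and its negation
// fits an unsigned 5-bit immediate, i.e. lies in 1..31.
constexpr bool negatedFitsUimm5(int splat) {
  return splat < 0 && -splat <= 31;
}

// One 8-bit lane of a register-register add with a splat value (wrapping).
constexpr std::uint8_t addLane(std::uint8_t v, int splat) {
  return static_cast<std::uint8_t>(v + splat);
}

// One 8-bit lane of a subtract-immediate with an unsigned immediate (wrapping).
constexpr std::uint8_t subImmLane(std::uint8_t v, unsigned imm) {
  return static_cast<std::uint8_t>(v - imm);
}
```

For the `-1` splat shown above, `addLane(v, -1)` and `subImmLane(v, 1)`
produce the same lane value, including the wrap from 0 to 255.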
This patch introduces `selectVSplatImmNeg` to detect negative splat
immediates whose negated value fits in a 5-bit unsigned immediate. New
patterns `(Pat{Vr,Vr}Nimm5)` are added to match:
```
add v, splat(-imm) --> vsubi v, v, imm
[7 lines not shown]
[LifetimeSafety] Detect use-after-scope through fields in member calls (#191731)
Add `UseFact`s for field origins when calling instance methods.
Fixes #182945
---------
Co-authored-by: Utkarsh Saxena <usx at google.com>
[RISCV] Add Zvzip intrinsics (#186342)
In the RVV Clang builtins generator, a new prototype descriptor
`d` was added to represent vectors with `2 x LMUL`.
The `.ll` tests were generated by an LLM and I have reviewed them.
The `.c` tests were generated by
https://github.com/riscv-non-isa/riscv-rvv-intrinsic-doc/pull/431.
[mlirbc] Add AffineMap serialization support
Add binary bytecode encoding for AffineMapAttr, replacing the textual fallback.
AffineMap is encoded as numDims, numSymbols, numResults, followed by the result
expressions, where each expression (an AffineExpr) is encoded as a recursive tree
with a VarInt kind tag followed by kind-specific data.
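The layout described above can be sketched as follows. This is an
illustration under stated assumptions, not the encoding the patch
implements: the kind tag values are invented, the varint is a plain
LEB128-style stand-in rather than MLIR's own VarInt format, and only
non-negative constants are handled. `Expr`, `Writer`, `writeExpr` and
`encodeSample` are names local to the example.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical kind tags; the real numbering is not given in the text.
enum Kind : std::uint64_t { kDim = 0, kSymbol = 1, kConstant = 2, kAdd = 3, kMul = 4 };

// Tree node: leaves carry `value`, binary nodes carry two children.
struct Expr {
  Kind kind;
  std::uint64_t value;
  const Expr *lhs;
  const Expr *rhs;
};

// Fixed-size byte sink so the sketch can be evaluated at compile time.
struct Writer {
  std::uint8_t buf[64] = {};
  std::size_t size = 0;

  // LEB128-style varint: 7 payload bits per byte, high bit set when more follow.
  constexpr void varint(std::uint64_t v) {
    do {
      std::uint8_t byte = static_cast<std::uint8_t>(v & 0x7F);
      v >>= 7;
      if (v != 0)
        byte = static_cast<std::uint8_t>(byte | 0x80);
      buf[size++] = byte;
    } while (v != 0);
  }
};

// Kind tag first, then kind-specific data: children for binary nodes,
// a single value for leaves.
constexpr void writeExpr(Writer &w, const Expr &e) {
  w.varint(e.kind);
  if (e.kind == kAdd || e.kind == kMul) {
    writeExpr(w, *e.lhs);
    writeExpr(w, *e.rhs);
  } else {
    w.varint(e.value);
  }
}

// Sample map (d0)[s0] -> (d0 + s0 * 200).
constexpr Expr kD0{kDim, 0, nullptr, nullptr};
constexpr Expr kS0{kSymbol, 0, nullptr, nullptr};
constexpr Expr kC200{kConstant, 200, nullptr, nullptr};
constexpr Expr kMulExpr{kMul, 0, &kS0, &kC200};
constexpr Expr kAddExpr{kAdd, 0, &kD0, &kMulExpr};

constexpr Writer encodeSample() {
  Writer w{};
  w.varint(1); // numDims
  w.varint(1); // numSymbols
  w.varint(1); // numResults
  writeExpr(w, kAddExpr);
  return w;
}
```

Under these assumptions the sample map takes 12 bytes: three header
bytes, then the expression tree in pre-order, with the constant 200
taking two varint bytes.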