[mlir][vector] Add missing tests (nfc) (#186990)
Currently, `ConvertVectorToLLVM` rejects strided memrefs when lowering
`vector.gather` and `vector.scatter`. This PR adds tests to document
that behavior.
Supporting strided memrefs in the lowering is left as future work.
However, it is still unclear whether gather/scatter on strided memrefs
should be supported at all (see the Discourse discussion [1]).
This PR also adds tests for `vector.load` and `vector.store` in
`invalid.mlir` to document that these ops do not support strided
memrefs.
[1] https://discourse.llvm.org/t/rfc-semantics-of-vector-gather-indices-with-strided-memrefs
[AArch64][llvm] Redefine some instructions as aliases of `SYS`
Some instructions are not currently defined as aliases of `SYS`
when they should be, so they disassemble to `SYS` rather than back
to the native instruction.
Fix these cases and add an additional test case.
Note that I've left `GCSPUSHM` as is because of its `mayStore`, `GCSSS1`
and `GCSSS2` because they're used in AArch64ISelDAGToDAG.cpp, and
`GCSPOPM` because it has an intrinsic pattern in AArch64InstrInfo.td.
They will still disassemble correctly, as they use `InstAlias`.
[mlir][bufferization] Fix crash with copy-before-write + bufferize-function-boundaries (#186446)
When `copy-before-write=1` is combined with
`bufferize-function-boundaries=1`, `bufferizeOp` creates a plain
`AnalysisState` (not `OneShotAnalysisState`) and passes it to
`insertTensorCopies`. Walking `CallOp`s during conflict resolution
called `getCalledFunction(callOp, state)`, which unconditionally cast
the `AnalysisState` to `OneShotAnalysisState` via `static_cast`, causing
UB and a stack overflow crash.
Fix by guarding the cast with `isa<OneShotAnalysisState>()` so that when
the state is a plain `AnalysisState`, the function falls through to
building a fresh `SymbolTableCollection` — the same safe fallback
already present.
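The guarded downcast described above can be sketched in isolation as follows. This is a toy model, not the MLIR code: the `Kind` tag, `cachedLookup`, `lookupCallee`, and the returned strings are hypothetical stand-ins, and only the shape of the check (test the dynamic kind before `static_cast`, otherwise take the fallback path) mirrors the fix.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for the base analysis state; the Kind tag is hypothetical.
struct AnalysisState {
  enum class Kind { Plain, OneShot };
  explicit AnalysisState(Kind kind = Kind::Plain) : kind(kind) {}
  virtual ~AnalysisState() = default;
  Kind getKind() const { return kind; }

private:
  Kind kind;
};

// Toy stand-in for the derived state that carries extra cached data.
struct OneShotAnalysisState : AnalysisState {
  OneShotAnalysisState() : AnalysisState(Kind::OneShot) {}
  // LLVM-style RTTI hook: reports whether `state` really is this class.
  static bool classof(const AnalysisState *state) {
    return state->getKind() == Kind::OneShot;
  }
  std::string cachedLookup() const { return "cached symbol tables"; }
};

// Hypothetical lookup: downcast only after checking the dynamic kind.
std::string lookupCallee(const AnalysisState &state) {
  if (OneShotAnalysisState::classof(&state)) {
    const auto &oneShot = static_cast<const OneShotAnalysisState &>(state);
    return oneShot.cachedLookup();
  }
  // Plain state: fall back to building the lookup structure from scratch.
  return "fresh symbol table";
}

// Usage: both kinds of state are handled without an invalid cast.
inline void demo() {
  AnalysisState plain;
  OneShotAnalysisState oneShot;
  assert(lookupCallee(plain) == "fresh symbol table");
  assert(lookupCallee(oneShot) == "cached symbol tables");
}
```

Without the `classof` check, the `static_cast` on a plain `AnalysisState` would be undefined behaviour, which is the failure mode this commit removes.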
Fixes https://github.com/llvm/llvm-project/issues/163052
Assisted-by: Claude Code
[NFC][NVPTX] Fix tcgen05.mma PTX instruction encoding (#186602)
`.ashift` should come before `.collector::a::*` according to the PTX ISA.
ptxas accepts both orderings, but the spec-correct order is now used.
[mlir] Fix crash in diagnostic verifier for unmatched @unknown expectations (#186148)
When an expected-* directive uses the @unknown location specifier, the
associated ExpectedDiag record has an invalid (null) SMLoc as its
fileLoc. If the expected diagnostic is never produced, emitError() is
called to report the unmatched expectation, but it unconditionally
constructs an SMRange from fileLoc, triggering a null-pointer
dereference (UBSan) and an assertion failure in SMRange's constructor,
which requires both endpoints to have equal validity.
Fix by guarding the SMRange construction with a fileLoc.isValid() check.
When fileLoc is invalid, call PrintMessage without a source range.
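A minimal sketch of that guard, assuming a simplified location type: `Loc`, `Range`, `makeRange`, and `report` are hypothetical names standing in for `SMLoc`, `SMRange`, and the diagnostic-printing path, and the returned strings only indicate which branch was taken.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for a source location: null means "no location".
struct Loc {
  const char *ptr = nullptr;
  bool isValid() const { return ptr != nullptr; }
};

using Range = std::pair<Loc, Loc>;

// Build a range of `length` characters starting at `loc`, but only when
// `loc` is valid, so no pointer arithmetic is ever done on a null pointer.
std::optional<Range> makeRange(Loc loc, std::size_t length) {
  if (!loc.isValid())
    return std::nullopt;
  return Range{loc, Loc{loc.ptr + length}};
}

// Report a message, attaching a source range only when one can be built.
std::string report(Loc loc, const std::string &message) {
  if (makeRange(loc, 1).has_value())
    return message + " (with source range)";
  return message + " (no source range)";
}

// Usage: a valid and an invalid location take different branches.
inline void demoReport() {
  const char *buffer = "expected-error";
  assert(report(Loc{buffer}, "unmatched") == "unmatched (with source range)");
  assert(report(Loc{}, "unmatched") == "unmatched (no source range)");
}
```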
Fixes #163343
Assisted-by: Claude Code
Merge branch 'arm-fp-flt' into arm-fp-faddsub
The merged changes on main include a fix for the previous denormal
handling bug in the old Thumb1 addsf3. So one of my reasons to replace
it completely is gone. Therefore I'm reinstating it, and putting the
new one alongside it as a different time/space tradeoff.
dnsmasq: remove a too-strict validation
When adding ranges from setaddr.php we do not use the model, so
one can input ::1000 etc., and the Dnsmasq migration then fails
because no constructor is set. We still validate the :: prefix
for constructor use, but now take a :: prefix verbatim, which
does not interfere with the Dnsmasq service start.
Discussed-with: @monviech
(cherry picked from commit a5773fe71f5703426761b9965bd49e468366de27)
[WebAssembly] combine `bitmask` with `setcc <X>, 0, setlt` (#179065)
The Rust `simd_bitmask` intrinsic is UB when the lanes of its input are
not either `0` or `!0`, presumably so that the implementation can be
more efficient, since it may then look at any bit. To get the "mask of
MSB" behavior of WebAssembly's `bitmask`, we would like to simply first
compare with a zero vector.
```llvm
define i32 @example(<16 x i8> noundef %v) {
entry:
  %1 = icmp slt <16 x i8> %v, zeroinitializer
  %2 = bitcast <16 x i1> %1 to i16
  %3 = zext i16 %2 to i32
  ret i32 %3
}
```
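As a scalar model of why the compare is redundant for wasm, the sketch below computes the mask both ways for 16 lanes of `i8`. The function names are made up for illustration, and lane 0 is assumed to map to bit 0 of the result (little-endian lane order).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V16 = std::array<std::int8_t, 16>;

// Model of a "mask of MSB" bitmask: bit i is the top bit of lane i.
std::uint32_t bitmaskMsb(const V16 &v) {
  std::uint32_t mask = 0;
  for (std::size_t i = 0; i < v.size(); ++i) {
    std::uint32_t msb = static_cast<std::uint8_t>(v[i]) >> 7;
    mask |= msb << i;
  }
  return mask;
}

// Model of the IR pattern: signed compare against zero, then pack the
// per-lane booleans into an integer and zero-extend.
std::uint32_t compareThenMask(const V16 &v) {
  std::uint32_t mask = 0;
  for (std::size_t i = 0; i < v.size(); ++i)
    mask |= (v[i] < 0 ? 1u : 0u) << i;
  return mask;
}

// Usage: the two computations agree, including on lanes that are
// neither 0 nor all-ones.
inline void demoBitmask() {
  V16 v{};
  v[0] = -1;   // all bits set
  v[3] = -128; // only the MSB set
  v[5] = 127;  // every bit except the MSB set
  assert(bitmaskMsb(v) == 0x9u);
  assert(compareThenMask(v) == bitmaskMsb(v));
}
```

A lane is negative as a signed integer exactly when its most significant bit is set, so the two functions agree on every input.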
On x86_64, this additional comparison optimizes away, but for wasm it
[22 lines not shown]
Stop trying to crt_supersede one Arm .S file with another
Turns out that doesn't work: both versions of the assembly language
comparison were included in the output library, and the linker would
make an arbitrary choice of which to pull in to the link. Instead,
just put the old files onto the SOURCES list in an else clause.
[AArch64][GlobalISel] Fix uqadd/sub with scalar operands
Previously, NEON uqadd/uqsub would not lower when given s32/s64
operands, because GlobalISel wrongly tried to put the operands on
general-purpose register banks. Changing this in RegBankSelection
allows the intrinsics to lower just like their signed versions.
[mlir][gpu] Add SymbolUserOpInterface to launch_func op (#173277)
`gpu.launch_func` is an operation that references symbols.
Currently, its symbol validation logic is implemented within
GPUDialect::verifyOperationAttribute. To improve the clarity and
structure of the validation logic, this PR makes LaunchFuncOp implement
the SymbolUserOpInterface. In addition, implementing this interface
allows the operation to benefit from various symbol-usage analysis
passes.
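The idea of moving symbol validation onto the op itself can be sketched with a toy interface. Nothing here is MLIR API: `SymbolTable`, `SymbolUser`, `LaunchFunc`, and the `module::kernel` naming are hypothetical, and the real interface method takes a `SymbolTableCollection` and returns a `LogicalResult` rather than a `bool`.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Hypothetical symbol table: the set of fully qualified symbol names.
using SymbolTable = std::set<std::string>;

// Toy analogue of a symbol-user interface: each op checks its own
// symbol references against the table it is handed.
struct SymbolUser {
  virtual ~SymbolUser() = default;
  virtual bool verifySymbolUses(const SymbolTable &symbols) const = 0;
};

// Toy analogue of a launch op that refers to a kernel inside a module.
struct LaunchFunc : SymbolUser {
  LaunchFunc(std::string module, std::string kernel)
      : module(std::move(module)), kernel(std::move(kernel)) {}

  bool verifySymbolUses(const SymbolTable &symbols) const override {
    // The reference is valid only if the named kernel exists.
    return symbols.count(module + "::" + kernel) != 0;
  }

private:
  std::string module;
  std::string kernel;
};

// Usage: generic code can verify any symbol user through the interface.
inline void demoSymbolUser() {
  SymbolTable symbols{"kernels::add", "kernels::mul"};
  LaunchFunc good("kernels", "add");
  LaunchFunc bad("kernels", "missing");
  const SymbolUser &user = good;
  assert(user.verifySymbolUses(symbols));
  assert(!bad.verifySymbolUses(symbols));
}
```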