[PatternMatch][NFC] Add `m_IToFP` and `m_FPToI` (#188040)
Added two IR pattern matchers, `m_IToFP` and `m_FPToI`, which are
essentially shortcuts for `m_CombineOr(..., ...)`.
> if there isn't already one, PatternMatch should have an m_ItoFP which
covers both
_Originally posted by @arsenm in
https://github.com/llvm/llvm-project/pull/185826#discussion_r2967473936_
/cc @arsenm
[NFC][analyzer] Clean up and document `ExplodedNodeSet` (#187742)
`ExplodedNodeSet` is a simple and useful utility type in the analyzer,
but its insertion methods were a bit confusing, so this commit clarifies
them (and adds doc-comments for this class).
Previously this class had `void Add(ExplodedNode*)` for inserting single
nodes and `void insert(const ExplodedNodeSet &)` for inserting all nodes
from another set; but `ExplodedNode*` is implicitly convertible to
`ExplodedNodeSet`, so it was also possible to insert single nodes with
`insert`. There was also a subtle difference between `Set.Add(Node)` and
`Set.insert(Node)`: `Add` accepted and silently ignored null pointers and sink
nodes (which is often useful), while the constructor
`ExplodedNodeSet(ExplodedNode*)` enforced the same invariant in a less
helpful way, with an assertion.
This commit eliminates the name `Add` (because `insert` is more
customary for set types) while standardizing on its "null or sink nodes
are silently ignored" behavior, which is very useful in practice.
[10 lines not shown]
[AMDGPU][GlobalISel] Add RegBankLegalize rules for permlane16_swap/permlane32_swap (#187810)
Add register bank legalize rules for the amdgcn_permlane16_swap and
amdgcn_permlane32_swap intrinsics. Both results and both source register
operands map to VGPR since these are VALU lane swap operations.
Enable -new-reg-bank-select in the permlane16.swap and permlane32.swap
tests.
Alias table comment cleanup.
Comments are already cleaned up before being written to the table
file, but it's still best to ensure they're encoded before display.
[mlir][x86] Hardware extension namespaces (#184392)
Adds hardware extension C++ namespaces to X86 dialect op definitions to
match their IR mnemonic extensions.
All X86 dialect ops are updated to follow the scheme first introduced
with the AMX ops, i.e., `x86::{ext}::{op_name}`.
Nested namespaces improve source-code readability by explicitly
indicating which hardware extension each operation requires, and they
align the naming scheme between code and IR.
[mlir][linalg] Fix crash in linalg.reduce verifier when inputs != inits count (#186278)
Add an early check in `ReduceOp::verify()` that compares the operand
count from the ODS accessor with `getNumDpsInputs()`. A mismatch means
the `SameVariadicOperandSize` invariant is violated and the verifier
emits a clear diagnostic instead of crashing.
Fixes #93973
Assisted-by: Claude Code
[PowerPC] Skip tryBitfieldInsert when an operand is a constant (#187663)
When either operand of an OR node is a constant, bail out of
`tryBitfieldInsert` and let `ORI`/`ORIS`/`ADDI`/`ADDIS` tablegen
patterns handle it.
These patterns produce a single instruction without the tied-register
constraint that RLWIMI requires, avoiding unnecessary LI + RLWIMI
sequences.
Prep work to help with regressions identified on
https://github.com/llvm/llvm-project/pull/186461
17939 ena driver does not attach on AWS Nitro v5/v6 instances
17968 ena missing DMA sync for Tx descriptor
Reviewed by: Robert Mustacchi <rm at fingolfin.org>
Reviewed by: Ryan Zezeski <ryan at zinascii.com>
Approved by: Patrick Mooney <pmooney at pfmooney.com>
[AsmPrinter] Fix some issues with instruction size verification
If the instruction is part of a bundle, then emitInstruction() will
emit the entire bundle. As such, we should be summing up the sizes
of all instructions in the bundle.
Additionally, do not run the verification if an error has already
occurred. In that case, there may be a size mismatch as a result
of the error.
These came up when trying to enable the verification on additional
targets.
[MLIR] Move SFINAE from return type to template argument in OpImplementation.h (NFC) (#188039)
Migrate all template methods that used `std::enable_if_t<cond,
ReturnType>` return-type SFINAE to the `typename =
std::enable_if_t<cond>` default template parameter style. This makes the
actual return type visible at a glance and the constraint readable as
part of the template signature.
Complementary overload pairs that require the `* = nullptr` non-type
parameter trick to remain structurally distinct
(parseCustomAttributeWithFallback, parseCustomTypeWithFallback) are left
unchanged.
Addressing post-merge comment on #186192
Assisted-by: Claude Code
[AArch64][llvm] Redefine some insns as an alias of `SYS`
Some instructions are not currently defined as an alias of `SYS`
when they should be, so they don't disassemble back into the
native instruction, but instead disassemble into `SYS`.
Fix these cases and add additional testcases.
Note that I've left `GCSPUSHM` alone because of its `mayStore` flag,
`GCSSS1` and `GCSSS2` because they're used in AArch64ISelDAGToDAG.cpp,
and `GCSPOPM` because it has an intrinsic pattern in AArch64InstrInfo.td.
They will still disassemble correctly, though, as they use `InstAlias`.
AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3
Add codegen for v_dual_dot2acc_f32_f16/bf16 on targets that only have the
VOP3 version of the instruction.
Since there is no VOP2 version, introduce a temporary MIR DOT2ACC pseudo
that is selected when there are no src_modifiers. This DOT2ACC pseudo
has src2 tied to dst (like the VOP2 version); post-RA pseudo expansion
restores the pseudo to the VOP3 version of the instruction.
CreateVOPD will recognize such a VOP3 pseudo and generate v_dual_dot2acc.
[LoongArch] Mark VREPLGR2VR/XVREPLGR2VR as rematerializable to reduce register pressure (#187431)
The VREPLGR2VR and XVREPLGR2VR instruction families replicate a scalar
general-purpose register value into all elements of a vector register.
These instructions are side-effect free and relatively cheap, with their
result depending only on the input register.
Mark them as isReMaterializable to allow the register allocator to
recompute the value when profitable instead of spilling and reloading it
from memory.
This can help reduce register pressure and avoid unnecessary memory
traffic in vectorized code.
[LoongArch] Mark VPICK_ZEXT_ELT as zero-extending in computeKnownBits (#187177)
Teach computeKnownBitsForTargetNode that VPICK_ZEXT_ELT produces a
zero-extended result.
VPICK_ZEXT_ELT extracts a narrower element (e.g. i16) and returns it in
a larger integer type (e.g. i64) with the upper bits guaranteed to be
zero. However, without KnownBits information, LLVM treats the upper bits
as unknown, which inhibits optimizations.
By marking all bits above the source element width as known zero, this
enables DAG combine and other optimizations to eliminate redundant
operations such as AND masks and SIGN_EXTEND_INREG.
For example, this allows patterns like:
(sign_extend_inreg (VPICK_ZEXT_ELT ...), i32)
to be simplified when the sign bit is known to be zero.
[PowerPC] Fix typo in getInstSizeVerifyMode() hook
The logic here was inverted from what it was intended to be (which
shows up when forcing tests to emit object files).