[mlir][GPU] Fix crash in WarpExecuteOnLane0Op::verify with wrong terminator (#183930)
WarpExecuteOnLane0Op::verify() called getTerminator() which performed an
unconditional cast<gpu::YieldOp> on the block's last operation. When the
op body was written with a different terminator (e.g. affine.yield), the
cast asserted immediately instead of emitting a verifier diagnostic.
Fix by using dyn_cast in verify() before calling getTerminator(), and
emitting a proper error message when the terminator is not gpu.yield.
Add a regression test to invalid.mlir.
Fixes #181450
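The check-before-cast pattern behind this fix can be sketched without MLIR headers. Everything below is a hypothetical model, not the actual patch: `Op`, `GpuYieldOp`, `AffineYieldOp`, and `verifyTerminator` are stand-ins, the error text is invented, and `dynamic_cast` plays the role of `dyn_cast`.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-ins for MLIR operations; these are not MLIR classes.
struct Op {
  virtual ~Op() = default;
};
struct GpuYieldOp : Op {};
struct AffineYieldOp : Op {};

// Unchecked accessor in the style of getTerminator(): only valid once the
// caller knows the last op really is a GpuYieldOp.
const GpuYieldOp &getTerminator(const Op &last) {
  const auto *yield = dynamic_cast<const GpuYieldOp *>(&last);
  assert(yield && "terminator must be gpu.yield");
  return *yield;
}

// Verifier: checks the type first and reports a diagnostic (modeled as a
// string) instead of asserting.
std::optional<std::string> verifyTerminator(const Op &last) {
  if (!dynamic_cast<const GpuYieldOp *>(&last))
    return "expected body to be terminated with gpu.yield"; // invented wording
  (void)getTerminator(last); // safe: the type was checked above
  return std::nullopt;       // verification succeeded
}
```

In this model, `verifyTerminator(AffineYieldOp{})` returns a message, whereas calling `getTerminator` directly on the same op would trip the assertion.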
[AMDGPU] Make uniform-work-group-size a valueless attribute
The "uniform-work-group-size" function attribute previously took a
string value of "true" or "false". Since presence alone can convey
the "true" semantics and absence can convey "false", the value is
unnecessary.
This patch converts it to a valueless string attribute: presence
indicates true, absence indicates false. For backward compatibility,
auto-upgrade logic is added in both UpgradeAttributes (bitcode) and
UpgradeFunctionAttributes: if the old value is "true", the attribute
is kept without a value; if "false", the attribute is removed.
All setters (Clang CodeGen, OMPIRBuilder, AMDGPUAttributor, ROCDL
translation) and readers (AMDGPUAttributor, AMDGPULowerKernelAttributes,
AMDGPUHSAMetadataStreamer) are updated accordingly. The attribute is
also documented in the AMDGPU LLVM IR Attributes table where it was
previously missing.
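The upgrade rule described above can be illustrated with a small standalone model. This is only a sketch under assumptions: `AttrMap`, `upgradeUniformWorkGroupSize`, and `hasUniformWorkGroupSize` are hypothetical names, and a valueless attribute is modeled as an entry with an empty string value rather than with the LLVM attribute classes.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model of a function's string attributes: name -> value.
// A valueless attribute is represented by an empty value.
using AttrMap = std::map<std::string, std::string>;

// Applies the upgrade rule from the commit message: an old "true" value
// becomes a valueless attribute, an old "false" value removes it.
AttrMap upgradeUniformWorkGroupSize(AttrMap attrs) {
  auto it = attrs.find("uniform-work-group-size");
  if (it == attrs.end())
    return attrs; // nothing to upgrade
  if (it->second == "true")
    it->second.clear();
  else if (it->second == "false")
    attrs.erase(it);
  return attrs;
}

// After the upgrade, readers only test for presence.
bool hasUniformWorkGroupSize(const AttrMap &attrs) {
  return attrs.count("uniform-work-group-size") != 0;
}
```

With this representation a reader never has to parse a value, so a malformed string can no longer change the meaning of the attribute.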
[lldb][Process/FreeBSDKernelCore] Add ppc64le support (#180669)
This is the LLDB version of
https://cgit.freebsd.org/ports/tree/devel/gdb/files/kgdb/ppcfbsd-kern.c.
This enables selecting ppc64le and reading registers from the PCB
structure for both core dump and live kernel debugging. FPU registers
aren't supported yet due to a PCB structure issue, but this change still
achieves feature parity with KGDB. Trapframe unwinding support will be
implemented in the future. Tests using a core dump from ppc64le will be
added once other kernel debugging improvements are done.
---------
Signed-off-by: Minsoo Choo <minsoochoo0122 at proton.me>
[ARM] Lower strictfp vector fp16 rounding operations similar to default mode (#183700)
Previously, the strictfp rounding nodes were lowered by unrolling to
scalar operations, which hurts performance. This issue was partially
fixed in #180480; this change continues that work and implements
optimized lowering for v4f16 and v8f16.
[AMDGPU] Enable shift64 hazard recognition for gfx9 (#183839)
Enable shift64 hazard recognition for gfx9 cores.
---------
Signed-off-by: John Lu <John.Lu at amd.com>
Lower strictfp vector rounding operations similar to default mode
Previously, the strictfp rounding nodes were lowered by unrolling to
scalar operations, which hurts performance. This issue was partially
fixed in #180480; this change continues that work and implements
optimized lowering for v4f16 and v8f16.
[mlir][IR] Generalize `DenseElementsAttr` to custom element types (#179122)
`DenseElementsAttr` supports only a hard-coded list of element types:
`int`, `index`, `float`, `complex`. This commit generalizes the
`DenseElementsAttr` infrastructure: it now supports arbitrary element
types, as long as they implement the new `DenseElementTypeInterface`.
The `DenseElementTypeInterface` has the following helper functions:
- `getDenseElementBitSize`: Query the size of an element in bits. (When
storing an element in memory, each element is padded to a full byte,
with an exception for `i1`. This is an existing limitation of
`DenseElementsAttr`.)
- `convertToAttribute`: Attribute factory / deserializer. Converts bytes
into an MLIR attribute. The attribute provides the assembly format /
printer for a single element.
- `convertFromAttribute`: Serializer. Converts an MLIR attribute into
bytes.
Note: `convertToAttribute` / `convertFromAttribute` are mainly for
[23 lines not shown]
Revert "[mlir][IR] Generalize `DenseElementsAttr` to custom element types" (#183917)
Reverts llvm/llvm-project#183891
Reverting a second time. The build bot failure seems to be
non-deterministic.
[CMake] Use keyword signature in two additional callsites (#183889)
Fix-forward for https://github.com/llvm/llvm-project/pull/183541.
Two callsites to target_link_libraries were not migrated to the
keyword signature.
Signed-off-by: Itay Bookstein <itay.bookstein at nextsilicon.com>
[mlir][VectorToLLVM] Fix crash in VectorInsertOpConversion with dynamic index (#183783)
VectorInsertOpConversion crashes with an assertion failure when
inserting a sub-vector at a dynamic position into a multi-dimensional
vector. The pattern calls getAsIntegers() on the position, which asserts
that all fold results are compile-time constant attributes.
The existing guard (checking llvm::IsaPred<Attribute>) only covered the
case where a scalar is inserted into the innermost dimension (the
extractvalue path). The guard was missing for the insertvalue path when
inserting a sub-vector at a dynamic position into a nested aggregate.
Fix: add the same guard before the llvm.insertvalue creation to return
failure() gracefully when any position index is dynamic, matching the
behavior of VectorExtractOpConversion.
Fixes #177829
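The guard described above amounts to "only extract integers when every position is a constant". A minimal standalone sketch follows. It assumes a hypothetical `FoldResult` type that holds either a constant index or the name of a runtime value, and `getStaticPosition` is an illustrative helper, not an MLIR function.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical model of a fold result: either a compile-time constant index
// or the name of a runtime (dynamic) SSA value.
using FoldResult = std::variant<int64_t, std::string>;

// Returns the constant indices, or std::nullopt if any index is dynamic.
// A conversion pattern would return failure() in the nullopt case instead
// of asserting while extracting integers.
std::optional<std::vector<int64_t>>
getStaticPosition(const std::vector<FoldResult> &position) {
  std::vector<int64_t> indices;
  for (const FoldResult &p : position) {
    const int64_t *constant = std::get_if<int64_t>(&p);
    if (!constant)
      return std::nullopt; // dynamic index: bail out gracefully
    indices.push_back(*constant);
  }
  return indices;
}
```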
[mlir][IR] Generalize `DenseElementsAttr` to custom element types (#183891)
`DenseElementsAttr` supports only a hard-coded list of element types:
`int`, `index`, `float`, `complex`. This commit generalizes the
`DenseElementsAttr` infrastructure: it now supports arbitrary element
types, as long as they implement the new `DenseElementTypeInterface`.
The `DenseElementTypeInterface` has the following helper functions:
- `getDenseElementBitSize`: Query the size of an element in bits. (When
storing an element in memory, each element is padded to a full byte,
with an exception for `i1`. This is an existing limitation of
`DenseElementsAttr`.)
- `convertToAttribute`: Attribute factory / deserializer. Converts bytes
into an MLIR attribute. The attribute provides the assembly format /
printer for a single element.
- `convertFromAttribute`: Serializer. Converts an MLIR attribute into
bytes.
Note: `convertToAttribute` / `convertFromAttribute` are mainly for
[26 lines not shown]
[VPlan] Materialize UF after unrolling (NFCI).
Move materialization of the symbolic UF directly to unrollByUF. At this
point, unrolling materializes the decision and it is natural to also
materialize the symbolic UF here.
[AArch64][PAC] Emit `!dbg` locations in `*_vfpthunk_` functions (#179688)
The usage of pointers to member functions with Pointer Authentication
requires generation of `*_vfpthunk_` functions. These thunk functions
can be later inlined and optimized by replacing the indirect call
instruction with a direct one and then inlining that function call.
In the absence of `!dbg` metadata attached to the original call instruction,
such inlining ultimately results in the assertion "!dbg attachment points
at wrong subprogram for function" in assertions-enabled builds. Manually
running `opt` with the `-verify-each` option on the LLVM IR produced by the
frontend reveals the actual issue: "inlinable function call in a function
with debug info must have a !dbg location", reported after the indirect
call instruction is replaced with a direct one.
This commit fixes the issue by attaching artificial `!dbg` locations to
the original call instruction (as well as most other instructions in
the `*_vfpthunk_` function), the same way it is done for other
compiler-generated helper functions.
[mlir][affine] Fix crash in linearize_index fold when multi-index is ub.poison (#183816)
`AffineLinearizeIndexOp::fold` guarded the constant-folding path with
`llvm::is_contained(adaptor.getMultiIndex(), nullptr)`, which only
catches operands that have not been evaluated at all. When an operand
folds to `ub.PoisonAttr`, the attribute is non-null so the guard passed,
and the subsequent `cast<IntegerAttr>(indexAttr)` call crashed with an
assertion failure.
Fix by replacing the null-only check with one that requires every
multi-index attribute to be a concrete `IntegerAttr`, returning
`nullptr` for any other attribute (including null and PoisonAttr).
Fixes #178204
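The difference between a null-only check and an "every operand is an integer" check can be sketched in plain C++. The model below is an assumption-laden illustration: `Attr`, `Poison`, and `foldLinearizeIndex` are hypothetical, `std::monostate` stands for a null attribute, and the arithmetic assumes one basis entry per index with the outermost bound unused.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <variant>
#include <vector>

struct Poison {}; // hypothetical stand-in for a poison attribute

// monostate = operand not folded (null), int64_t = constant integer.
using Attr = std::variant<std::monostate, int64_t, Poison>;

// Folds only when every multi-index operand is a concrete integer. Null and
// poison operands both fall through to "no fold" instead of being cast.
std::optional<int64_t> foldLinearizeIndex(const std::vector<Attr> &multiIndex,
                                          const std::vector<int64_t> &basis) {
  assert(multiIndex.size() == basis.size() && "one basis entry per index");
  int64_t result = 0;
  for (std::size_t i = 0; i < multiIndex.size(); ++i) {
    const int64_t *index = std::get_if<int64_t>(&multiIndex[i]);
    if (!index)
      return std::nullopt; // null or poison: do not fold
    result = result * basis[i] + *index;
  }
  return result;
}
```

For indices (1, 2, 3) and basis (4, 5, 6) this computes (1 * 5 + 2) * 6 + 3 = 45, while replacing any index with `Poison{}` yields no fold.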
[mlir] Fix crash in testNoSkipErasureCallbacks on empty blocks (#183757)
The `noSkipBlockErasure` callback in `testNoSkipErasureCallbacks` called
`block->front().getParentRegion()` to get the parent region of a block.
This dereferences the ilist sentinel node when the block has no
operations, triggering an assertion failure.
Use `block->getParent()` instead, which directly returns the region
containing the block without requiring any operations to be present.
Fixes #183511
[mlir][test-ir-visitors] Fix noSkipBlockErasure crash with block args used across blocks (#183828)
The noSkipBlockErasure callback in TestVisitors.cpp dropped uses of op
results within the same region before erasing a block, but did not drop
uses of the block's own arguments (e.g. function entry block arguments).
When the block was subsequently erased, its block arguments were
destroyed while their use-lists were still non-empty, triggering the
assertion in IRObjectWithUseList::~IRObjectWithUseList().
Fix this by also iterating over the block's arguments and dropping any
uses that belong to the same parent region. This mirrors the existing
logic for op result uses and makes the block-erasure walk handle IRs
where function arguments are consumed by ops in sibling blocks.
Also replace `block->front().getParentRegion()` with
`block->getParent()` for robustness (avoids UB when the block has no
ops).
Add a regression test based on the reproducer from
[2 lines not shown]
[mlir][tensor] Fix crash in expand_shape fold with dynamic result type (#183785)
`foldReshapeOp` (in `ReshapeOpsUtils.h`) and `FoldReshapeWithConstant`
(in `TensorOps.cpp`) both tried to create a new `DenseElementsAttr`
constant when folding a reshape op whose operand is a constant. Neither
checked that the result type was statically shaped before doing so, but
`DenseElementsAttr::reshape()` and
`DenseElementsAttr::getFromRawBuffer()` both assert `hasStaticShape()`.
Guard both fold paths with a `hasStaticShape()` check so they return
early when the result type contains a dynamic dimension.
Fixes #177845
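The early-return guard can be illustrated with a standalone model. This is a sketch, not the MLIR code: `ConstantTensor`, `kDynamic`, and `foldReshape` are hypothetical, and the free function `hasStaticShape` only mimics the name of the MLIR query.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical sentinel marking a dynamic dimension.
constexpr int64_t kDynamic = std::numeric_limits<int64_t>::min();

// Hypothetical constant tensor: a shape plus flat element data.
struct ConstantTensor {
  std::vector<int64_t> shape;
  std::vector<int> data;
};

// True when no dimension is dynamic.
bool hasStaticShape(const std::vector<int64_t> &shape) {
  return std::none_of(shape.begin(), shape.end(),
                      [](int64_t dim) { return dim == kDynamic; });
}

// Folds a reshape of a constant only when the result shape is fully static;
// otherwise returns std::nullopt so the caller leaves the op unfolded.
std::optional<ConstantTensor> foldReshape(const ConstantTensor &source,
                                          const std::vector<int64_t> &resultShape) {
  if (!hasStaticShape(resultShape))
    return std::nullopt; // guard: a dynamic result type cannot hold a constant
  return ConstantTensor{resultShape, source.data};
}
```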