[MLIR][XeGPU] Add distribution patterns for vector step, shape_cast & broadcast from sg-to-wi (#185960)
This PR adds distribution patterns for vector.step, vector.shape_cast &
vector.broadcast in the new sg-to-wi pass
[clang-format] Fix breaking enum braces when combined with export (#189128)
This fixes #186684.
Also fix (not) breaking variables declared on the same line as the
closing brace.
And adapt whitesmith to that changes.
[MLIR][Python] Add more field specifiers to Python-defined operations (#188064)
This PR adds two new field specifiers (`operand` and `attribute`) and
extends the existing one (`result`):
- `default_factory` parameter is added for `result` and `attribute` to
specify default value via a lambda/function
- `kw_only` parameter is added for all these three specifiers, to make a
field a keyword-only parameter (without giving a default value).
```python
def result(
*,
infer_type: bool = False,
default_factory: Optional[Callable[[], Any]] = None,
kw_only: bool = False,
) -> Any: ...
def operand(
[43 lines not shown]
[MLIR] Fix crash in test-bytecode-roundtrip when test dialect is absent (#189163)
When invoking `-test-bytecode-roundtrip=test-dialect-version=X.Y` on a
module that contains no test dialect operations, the reader type
callback in `runTest0` called
`reader.getDialectVersion<test::TestDialect>()` and then immediately
asserted that it succeeded. However, if the test dialect was never
referenced in the bytecode (because no test dialect types appear in the
module), the dialect's version information is not stored in the
bytecode, so `getDialectVersion` legitimately returns failure.
When the test dialect version is unavailable in the bytecode being read,
the module contains no test dialect types, so no "funky"-group overrides
are needed and the callback can safely skip by returning `success()`.
A regression test is added with a module that has no test dialect ops,
exercising the `test-dialect-version=2.0` path that previously crashed.
Fixes #128321
[2 lines not shown]
[MLIR][Vector] Fix crash in foldDenseElementsAttrDestInsertOp on poison index (#188508)
When a dynamic index of -1 (the kPoisonIndex sentinel) was folded into
the static position of a vector.insert op,
foldDenseElementsAttrDestInsertOp would proceed to call
calculateInsertPosition, which returned -1. The subsequent iterator
arithmetic (allValues.begin() + (-1)) was undefined behaviour, causing
an assertion in DenseElementsAttr::get.
Fix by bailing out early in foldDenseElementsAttrDestInsertOp when any
static position equals kPoisonIndex, consistent with how
InsertChainFullyInitialized already guards this case.
Fixes #188404
Assisted-by: Claude Code
[mlir][reducer] Add eraseRedundantBlocksInRegion and getSuccessorForwardOperands API to BranchOpInterface (#187864)
To simplify the output of the reduction-tree pass, this PR introduces
the eraseRedundantBlocksInRegion. For regions containing multiple
execution paths, this functionality selects the shortest 'interesting'
path. Additionally, this PR adds the getSuccessorForwardOperands API to
BranchOpInterface. This allows us to extract the ForwardOperands for a
specific path chosen from multiple alternatives, enabling the creation
of a cf.br operation for the redirected jump.
[clang][bytecode] Skip rvalue subobject adjustments for global temporaries (#189044)
We only did this for local variables but were were missing it for
globals.
[Driver][HIP] Fix bundled -S emitting bitcode instead of assembly for device (#189140)
[Driver][HIP] Fix bundled -S emitting bitcode instead of assembly for
device
PR #188262 added support for bundling HIP -S output under the new
offload driver, but the device backend still entered the
bitcode-emitting path in ConstructPhaseAction. The condition at the
Backend phase checked for the new offload driver and directed device
code to emit TY_LLVM_BC, without excluding the -S case. This caused
the device section in the bundled .s to contain LLVM bitcode instead
of textual AMDGPU assembly.
This broke the HIP UT CheckCodeObjAttr test which greps
copyKernel.s for "uniform_work_group_size" — a string that only
appears in textual assembly, not in bitcode.
Fix by excluding -S (without -emit-llvm) from the new-driver
bitcode path, so the device backend falls through to emit TY_PP_Asm
[3 lines not shown]
[XeVM] Fix the cache-control metadata string generation. (#187591)
Previously, it generated extra `single` quote marks around the outer
braces (i.e., `'{'` `6442:\220,1\22` `'}'`). SPIR-V backend does not
expect that. It expects `{6442:\220,1\22}`.
AMDGPU: Make VarIndex WeakTrackingVH in AMDGPUPromoteAlloca (#188921)
The test used to look all good, but actually not. The WeakVH just make
itself null after the pointed value being replaced. So a zero value was
used because VarIndex become null. The test checks looks all good.
Actually only the WeakTrackingVH have the ability to be updated to new
value.
Change the test slightly to make that using zero index is wrong.
[Clang] remove redundant uses of dyn_cast (NFC) (#189106)
This removes dyn_cast invocations where the argument is already of the
target type (including through subtyping). This was created by adding a
static assert in dyn_cast and letting an LLM iterate until the code base
compiled. I then went through each example and cleaned it up. This does
not commit the static assert in dyn_cast, because it would prevent a lot
of uses in templated code. To prevent backsliding we should instead add
an LLVM aware version of
https://clang.llvm.org/extra/clang-tidy/checks/readability/redundant-casting.html
(or expand the existing one).