[MLIR][XeGPU] Fix layout inference issues blocking MXFP_GEMM test (#196243)
This branch fixes layout inference issues in XeGPU passes that were
blocking MXFP (microscaled floating point) GEMM workloads:
- Fix bitcast layout adjustment to use result shape instead of source
shape. The setupBitCastResultLayout function was incorrectly bounding
the layout adjustment loop against the source shape. Added tests.
- Fix blocking pass to drop inst_data from anchor operations. Operations
whose shape already matches inst_data don't get unrolled, so their
layout attributes retained stale inst_data that broke downstream passes.
Now inst_data is unconditionally stripped from all op attributes after
blocking.
- Propagate layout to both results of vector.deinterleave. The layout
recovery pass was only setting the layout on result 0, leaving result 1
without a layout.
Test plan
[9 lines not shown]
Merge tag 'selinux-pr-20260507' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux fixes from Paul Moore:
- Allow for multiple opens of /sys/fs/selinux/policy
Prevent a single process from blocking others from reading the
SELinux policy loaded in the kernel. This does have the side effect
of potentially allowing userspace to trigger additional kernel memory
allocations as part of the open/read operation, but this is mitigated
by requiring the SELinux security/read_policy permission.
- Reduce the critical sections where the SELinux policy mutex is held
This includes the patch to the policy loader code where we move the
permission checks and an allocation outside the mutex as well as the
patch to checkreqprot which drops the code/lock entirely.
While the checkreqprot code had effectively been dropped in an
[24 lines not shown]
[NFC][AMDGPU] Use a worklist and remember results in AMDGPUAttributor
This was a recursive function with a cache map that was never populated.
Now it uses a worklist, and the map is actually filled and consulted.
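The pattern described above can be sketched as follows. This is a minimal
illustration, not the AMDGPUAttributor code: the Graph type, the
reachesProperty function, and the property being queried are all hypothetical
names chosen for the example. It only shows a recursive walk replaced by an
explicit worklist whose results are stored in a cache that later queries read.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the structure being walked: node I lists the
// nodes it refers to, and some nodes carry the property being queried.
struct Graph {
  std::vector<std::vector<int>> Edges;
  std::vector<bool> HasProperty;
};

// Returns true if Start, or anything reachable from it, has the property.
// Cache persists across calls so repeated queries are answered directly.
bool reachesProperty(const Graph &G, int Start,
                     std::unordered_map<int, bool> &Cache) {
  auto Known = Cache.find(Start);
  if (Known != Cache.end())
    return Known->second;

  std::vector<int> Worklist{Start};
  std::vector<int> Visited;
  std::vector<bool> Seen(G.Edges.size(), false);
  Seen[static_cast<std::size_t>(Start)] = true;
  bool Found = false;

  while (!Worklist.empty()) {
    int N = Worklist.back();
    Worklist.pop_back();
    Visited.push_back(N);

    // A cached answer for N replaces walking its successors again.
    auto It = Cache.find(N);
    if (It != Cache.end()) {
      if (It->second) {
        Found = true;
        break;
      }
      continue;
    }
    if (G.HasProperty[static_cast<std::size_t>(N)]) {
      Found = true;
      break;
    }
    for (int Succ : G.Edges[static_cast<std::size_t>(N)]) {
      if (!Seen[static_cast<std::size_t>(Succ)]) {
        Seen[static_cast<std::size_t>(Succ)] = true;
        Worklist.push_back(Succ);
      }
    }
  }

  if (Found) {
    // Only Start is known to reach the property; other visited nodes may not.
    Cache[Start] = true;
  } else {
    // The walk was exhaustive, so no visited node reaches the property.
    for (int N : Visited)
      Cache[N] = false;
  }
  return Found;
}
```

The explicit worklist also handles cycles without unbounded recursion, since
each node is pushed at most once per query.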
Co-authored-by: Johannes Doerfert <johannes at jdoerfert.de>
[CodeGen][RISCV] Inline stack probes immediately after `allocateStack` in `eliminateCallFramePseudoInstr` (#195456)
This PR adds a call to `inlineStackProbe` immediately after
`allocateStack` in `eliminateCallFramePseudoInstr`. This allows code
generation for stack probe pseudoinstructions in non-entry BBs.
Fixes #195454.
[SLP]Bail out on non-schedulable expanded binop with stale operand deps
In tryScheduleBundle's DoesNotRequireScheduling path, an expanded binop
(shl X, 1 modeled as add X, X) doubles the dependency count of the
duplicated operand. If the operand has a single IR use, yet its
ScheduleData already has Dependencies populated by an earlier calculation
that did not see the expanded duplicate use, the double decrement exceeds
calculateDependencies' single increment and UnscheduledDeps goes negative.
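The arithmetic behind the failure can be illustrated with a simplified model.
This is a sketch under stated assumptions, not the SLPVectorizer
implementation: OperandDeps, decrementsFor, and staysNonNegative are
hypothetical names, and the real ScheduleData carries far more state.

```cpp
#include <cassert>

// Hypothetical, simplified view of the counters tracked for one operand.
struct OperandDeps {
  int Dependencies;    // increments recorded by the earlier calculation
  int UnscheduledDeps; // remaining dependencies not yet scheduled
};

// An expanded binop (shl X, 1 modeled as add X, X) uses X twice, so it
// decrements X's counter twice; an ordinary single use decrements it once.
int decrementsFor(bool IsExpandedBinop) { return IsExpandedBinop ? 2 : 1; }

// Returns true if applying the decrements keeps the counter non-negative.
// A false result corresponds to the case where bailing out is the safe choice.
bool staysNonNegative(const OperandDeps &D, int Decrements) {
  return D.UnscheduledDeps - Decrements >= 0;
}
```

With stale counters of one dependency and one unscheduled dependency, a single
use leaves the counter at zero, while the expanded duplicate use would drive
it to -1.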
Fixes #196281.
Reviewers:
Pull Request: https://github.com/llvm/llvm-project/pull/196449
[Clang][HLSL] Fix -Wunused-variable (#196445)
LookupSucceeded is only used in an assertion. Mark it [[maybe_unused]]
so we do not get -Wunused-variable in non-assertion builds.
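A minimal sketch of the same pattern outside Clang follows. The lookup and
lookupChecked functions are hypothetical and exist only for this example.
When NDEBUG is defined, assert expands to nothing, so the variable would
otherwise be set but never read.

```cpp
#include <cassert>

// Hypothetical lookup that reports success through its return value.
bool lookup(int Key, int &Out) {
  if (Key < 0)
    return false;
  Out = Key * 2;
  return true;
}

int lookupChecked(int Key) {
  int Value = 0;
  // Only read inside assert(); the attribute silences the unused-variable
  // warning in builds where assertions are compiled out.
  [[maybe_unused]] bool LookupSucceeded = lookup(Key, Value);
  assert(LookupSucceeded && "lookup is expected to succeed");
  return Value;
}
```

The attribute requires C++17 and keeps the call to lookup in both build
configurations, unlike wrapping the whole statement in an NDEBUG guard.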
ZTS: redundancy_draid_spare1
Preserve the 'zpool status' output used to calculate the number of
checksum errors so it can be logged on failure. Several instances have
been observed in the CI where cksum was set to a non-zero value, yet a
subsequent run of 'zpool status' on failure showed no checksum errors.
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Closes #18500
[AMDGPU][True16] relax d16-write-vgpr32 condition (#194477)
Patch https://github.com/llvm/llvm-project/pull/157795 works around a D16
load HW issue.
We found that the condition of this workaround could be relaxed for
instructions from the same order groups. Downstream testing looks ok.
[DebugInfo] Remove old decls when converting DI (#194964)
We were trying to remove declarations of old debug intrinsics whenever
printing modules or writing them to file. This is no longer necessary as
we use the new-style debug values exclusively now, other than when a
target pass specifically converts back to the old style. If a target
pass does that, removing the intrinsics is not right as the intrinsics'
users will still linger.
This change should be NFC except for the experimental DirectX target
where we do exactly that.
Fixes #194884