[MLIR][XeGPU] XeGPU DpasMx Op Definition adds Layout Support (#194117)
This PR extends the DpasMx operation to support MXFP (microscaling
floating point) matrix multiply with separate scale factor layouts.
1. Op Definition
- Added layout_a_scale and layout_b_scale attributes to DpasMx op
- Removed AllElementTypesMatch<["a", "b"]> trait to allow different types
  for A/B with scales
2. Layout Infrastructure
- setupDpasMxLayout(): Creates anchor layouts for all 5 operands (A, B,
  C/D, scale_a, scale_b)
- Derives scale layouts from parent matrix layouts by dividing innermost
  dimension
- Supports all layout kinds: Subgroup, InstData, Lane
- Fix a bug in getupDpasSubgroupLayouts(): sg_data of A/B matrix should
  keep the full K dimension.
3. Layout Propagation
[7 lines not shown]
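A minimal sketch of the scale-layout derivation described in section 2, assuming a layout can be modelled as a plain list of per-dimension sizes. The helper name `deriveScaleDims`, the `blockSize` parameter (how many matrix elements share one scale), and the clamp to 1 are all assumptions made for illustration; none of them come from the patch.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper, not the XeGPU implementation: derive the per-dimension
// sizes of a scale operand from its parent matrix by dividing the innermost
// dimension. blockSize is an assumed parameter.
std::vector<int64_t> deriveScaleDims(std::vector<int64_t> parentDims,
                                     int64_t blockSize) {
    if (parentDims.empty() || blockSize <= 0)
        return parentDims;
    int64_t &inner = parentDims.back();
    // Assumption: a dimension smaller than one block still needs one scale.
    inner = inner >= blockSize ? inner / blockSize : 1;
    return parentDims;
}
```

With these assumptions, a parent of sizes {8, 64} and a block size of 32 gives scale sizes {8, 2}; the outer dimension is left untouched.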
Make cyclic xarray allocation start looking for a free id at the
position specified by the next argument and stop after wrapping back to
that position. Previously, the search started at the beginning of the
allocation range and stopped at the end, ignoring the next argument.
Currently xarray cyclic id allocations are only used by the GuC code in
inteldrm. In 6.18.25 drm, the amdgpu PASID allocation changes from
using cyclic idr to cyclic xarray.
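The cyclic search order described above can be sketched as follows. This is a toy model, not the xarray API: `CyclicIdAllocator`, its fields, and the use of `std::set` are assumptions chosen only to show a search that starts at `next`, wraps around, and stops on returning to the start position.

```cpp
#include <cstdint>
#include <set>

// Toy cyclic id allocator (hypothetical; does not mirror the xarray code).
struct CyclicIdAllocator {
    std::set<uint32_t> used;  // ids currently allocated
    uint32_t first, last;     // inclusive allocation range
    uint32_t next;            // position where the next search starts

    CyclicIdAllocator(uint32_t lo, uint32_t hi)
        : first(lo), last(hi), next(lo) {}

    // Returns the allocated id, or -1 if the whole range is in use.
    long long alloc() {
        uint32_t start = (next < first || next > last) ? first : next;
        uint32_t id = start;
        do {
            if (!used.count(id)) {
                used.insert(id);
                next = (id == last) ? first : id + 1;
                return id;
            }
            id = (id == last) ? first : id + 1;  // advance, wrapping at the end
        } while (id != start);                   // stop once back at the start
        return -1;
    }

    void release(uint32_t id) { used.erase(id); }
};
```

In this model a freed low id is not reused immediately: the search continues from `next` and only comes back to it after wrapping.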
[CIR] Implement PredefinedExpr in aggregate emitter and add consteval aggregate test (#194484)
Handle PredefinedExpr by delegating to emitAggLoadOfLValue, removing the
NYI fallback. Also add a test for ConstantExpr aggregate emission
(consteval functions returning structs), which was already implemented
but lacked test coverage.
This unblocks ~206 libcxx tests that currently fail on aggregate
ConstantExpr and PredefinedExpr.
Note on LLVM IR divergence (will be addressed in follow-up PRs): For
consteval functions returning aggregates, CIR currently emits a global
constant + cir.copy that lowers to llvm.memcpy from the global, while
OGCG decomposes the constant into per-field stores. The added CIR / LLVM
/ OGCG CHECK lines in consteval-aggregate.cpp document this difference.
Convergence will come from a follow-up that decomposes the consteval
aggregate stores into per-field stores in LoweringPrepare (and related
GEP-index handling for padded structs).
[RISCV] Improve getInterleavedMemoryOpCost for interleave groups with tail gaps. (#192074)
For interleaved access groups where gaps are only at the tail (i.e.
members are contiguous starting from index 0 but do not fill the entire
factor), the interleaved memory access pass can lower them to
vlsseg/vssseg intrinsics with NF equal to the number of group members
rather than the factor after #151612 and #154647.
Previously these groups fell through to the generic fixed-vector shuffle
cost model. This patch adds a dedicated cost path that checks legality
and estimates appropriate cost for them.
TODO: Support scalable vector type.
Fix #151497
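As an illustration of the "gaps only at the tail" condition, the check can be sketched on a plain list of member indices. The function name and signature are hypothetical and do not correspond to the LLVM cost-model code. The sketch only classifies the group and reports the member count that would serve as NF; it does not model legality or cost.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: if the present member indices are exactly 0..n-1 with
// n < factor (gaps only at the tail), return n; otherwise return 0.
unsigned tailGapSegmentCount(std::vector<unsigned> indices, unsigned factor) {
    std::sort(indices.begin(), indices.end());
    // An empty group or a full group is not a tail-gap group.
    if (indices.empty() || indices.size() >= factor)
        return 0;
    for (std::size_t i = 0; i < indices.size(); ++i)
        if (indices[i] != i)  // a gap before the tail, or a duplicate
            return 0;
    return static_cast<unsigned>(indices.size());
}
```

For a factor of 4, members {0, 1, 2} qualify with a count of 3, while {0, 2} does not, because the gap at index 1 is not at the tail.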
[CIR] Lower constant NTTP objects (#194496)
Like my previous patch, this just stores an NTTP object as a global
(using the same code, with 1 level of indirection stripped off), and
initializes it as a const. This patch also fleshes out the
CIRGenExprConstant.cpp area, leaving just 2 'NYI's in the area, 1 of
which is the MSGuidAttr again.
[clang][modules-driver] Fix failing import-std regression test (#194502)
See
https://github.com/llvm/llvm-project/pull/194475#issuecomment-4331347690.
This constrains the test to not run on aarch64, where it fails on
`clang-aarch64-quick` and `llvm-clang-aarch64-darwin` builders.
The failing builders don't show any output, and the test will be
re-enabled for aarch64 in a later follow-up.
Co-authored-by: Naveen Seth Hanig <naveen.hanig at oulook.com>
[DataLayout] Add null pointer value infrastructure
Add support for specifying the null pointer bit representation per address space
in DataLayout via new pointer spec flags:
- 'z': null pointer is all-zeros
- 'o': null pointer is all-ones
When neither flag is present, the address space inherits the default set by the
new 'N<null-value>' top-level specifier ('Nz' or 'No'). If that is also absent,
the null pointer value is zero.
No target DataLayout strings are updated in this change. This is pure
infrastructure for a future ConstantPointerNull semantic change to support
targets with non-zero null pointers (e.g. AMDGPU).
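The fallback order described above (per-address-space flag, then the top-level default, then zero) can be sketched as a small lookup. The types and names below are hypothetical and are not the DataLayout API; string parsing of the specifiers is deliberately left out.

```cpp
#include <map>

// Hypothetical model of the resolution order; not the llvm::DataLayout API.
enum class NullValue { Unspecified, Zero, AllOnes };

struct NullPointerSpec {
    // Set by the top-level 'Nz' / 'No' specifier, if present.
    NullValue topLevelDefault = NullValue::Unspecified;
    // Set by a 'z' / 'o' flag on an individual pointer spec.
    std::map<unsigned, NullValue> perAddressSpace;

    NullValue resolve(unsigned addrSpace) const {
        auto it = perAddressSpace.find(addrSpace);
        if (it != perAddressSpace.end() && it->second != NullValue::Unspecified)
            return it->second;           // explicit per-address-space flag
        if (topLevelDefault != NullValue::Unspecified)
            return topLevelDefault;      // inherited top-level default
        return NullValue::Zero;          // nothing specified: null is zero
    }
};
```

Keeping an explicit `Unspecified` state separates "not mentioned" from "explicitly zero", which is what lets an address space override an all-ones default back to zero.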
Reland "[clang][modules-driver] Add support for C++ named modules and import std" (2nd attempt) (#194475)
This reverts #193857 and relands #193312.
This adds basic support for explicit C++ named module builds, managed
natively by the Clang driver, including support for use of the Standard
library modules. This follows #187606, which adds the same for Clang
modules.
Current limitations:
- Standard library modules are still compiled to object files instead of
using the provided shared library. (This will be addressed in a
follow-up soon.)
- Caching is not supported yet (but likely to be added during the
upcoming GSoC cycle).
- Importing C++ standard library modules into Clang modules is not
supported (and not expected in the near term).
RFC:
https://discourse.llvm.org/t/rfc-modules-support-simple-c-20-modules-use-from-the-clang-driver-without-a-build-system
Merge tag 'cgroup-for-7.1-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
- Fix UAF race in psi pressure_write() against cgroup file release by
extending cgroup_mutex coverage and ordering of->priv access after
cgroup_kn_lock_live()
- Fix integer overflow in rdmacg_try_charge() when usage equals INT_MAX
by performing the increment in s64
- Fix asymmetric DL bandwidth accounting on cpuset attach rollback by
recording the CPU used by dl_bw_alloc() so cancel_attach() returns
the reservation to the same root domain
- Fix nr_dying_subsys_* race that briefly showed 0 in cgroup.stat after
rmdir by incrementing from kill_css() instead of offline_css()
- Typo fix in cgroup-v2 documentation
[7 lines not shown]
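The rdmacg_try_charge() item in the list above is an instance of a common pattern: incrementing a signed int that may already be INT_MAX. A minimal sketch of the widened increment, using a hypothetical `tryCharge` helper rather than the kernel code:

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper, not the kernel function: bump usage by one if the
// result stays within max, doing the arithmetic in 64 bits.
bool tryCharge(int &usage, int max) {
    // Widen first: usage + 1 computed in int would overflow at INT_MAX.
    int64_t newUsage = static_cast<int64_t>(usage) + 1;
    if (newUsage > max)
        return false;  // charge rejected, usage unchanged
    usage = static_cast<int>(newUsage);
    return true;
}
```

Because the comparison happens on the 64-bit value, a usage of INT_MAX is rejected cleanly instead of wrapping to a negative number.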
[CIR] Handle DeclRefExpr's to NTTP Objects (#194482)
NTTP objects are represented as globals so that you can refer to
them/address of them/etc, but most access to them should result in
constant expressions. This patch implements the creation of these
globals, and allows compilation to continue.
This should fix up the last DeclRefExpr LValue that appears other than
MSGuids and named global registers, both of which are specific to
individual attributes.
Merge tag 'fs_for_v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull isofs and udf fixes from Jan Kara:
"Several isofs and udf fixes"
* tag 'fs_for_v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
docs: isofs: replace dead ECMA-119 FTP link
udf: reject descriptors with oversized CRC length
isofs: use QSTR_LEN() in isofs_cmp
isofs: validate block number from NFS file handle in isofs_export_iget
isofs: validate Rock Ridge CE continuation extent against volume size
[InlineSpiller] Fix live-range update in hoisting within bb (#193880)
The InlineSpiller tries to shorten the live-ranges used when storing a
value that is defined by a sibling register by performing the following
transformation:
```
a = copy b
store a
```
=>
```
store b
```
That is, it eliminates the copy and stores the original value at the copy
location.
As far as `b`'s live-range is concerned, this transformation is neutral
as long as the store is inserted in place of the copy being removed.
[37 lines not shown]
Merge tag 'fsnotify_for_v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify fixes from Jan Kara:
"Three fixes for fsnotify / fanotify"
* tag 'fsnotify_for_v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fsnotify: fix inode reference leak in fsnotify_recalc_mask()
fanotify: Fix spelling mistake "enforecement" -> "enforcement"
fanotify: fix false positive on permission events
ports-mgmt/poudriere-dsh2dsh: Update 3.4.99.20260415 => 3.4.99.20260426
Upstream changes:
- options: Improve performance by loading ports_env.
- bulk: Revert not refetching on checksum failure (for distinfo-expected rerolled distfile cases).
- testport: do not check the parent directory of a port does not have Mk.
- Fix documented default for `CHECK_CHANGED_OPTIONS`.
- sh: Add simple command redirect vfork support from Jilles.
PR: 294829
Sponsored by: UNIS Labs
Merge tag 'for-7.1-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- space reservation fixes:
- correctly undo 'may_use' accounting for remap tree
- avoid double decrement of 'may_use' when submitting async io
- actually enable the shutdown ioctl callback (not just the superblock
ops)
- raid stripe tree fixes when deleting extents
- add missing error handling
- fix various incorrect values set
- fix transaction state when removing a directory, possibly leading to
EIO during log replay
- additional b-tree node key checks during metadata readahead
[19 lines not shown]
Merge tag 'for-7.1/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fix from Mikulas Patocka:
- fix metadata corruption in dm-thin
* tag 'for-7.1/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm-thin: fix metadata refcount underflow