[HLSL][Matrix] Make matrix truncation respect default matrix memory layout (#184280)
Fixes #183127 and #184371
This PR makes the matrix truncation cast implementation use the new
matrix flattened index helper functions introduced by #182904 so that it
reads elements from the source matrix using the default matrix memory
layout instead of always assuming column-major order.
This PR also fixes a bug where matrix truncation truncated the wrong
elements.
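As a rough illustration of why the layout matters for truncation, here is a
minimal sketch that keeps the top-left block of a flattened matrix. The
`MatrixLayout`, `flattenedIndex`, and `truncateMatrix` names are hypothetical
and are not the helpers added by #182904.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout tag; not the names used in Clang.
enum class MatrixLayout { RowMajor, ColumnMajor };

// Flattened index of element (row, col) in a rows x cols matrix.
std::size_t flattenedIndex(MatrixLayout layout, std::size_t row,
                           std::size_t col, std::size_t rows,
                           std::size_t cols) {
  return layout == MatrixLayout::RowMajor ? row * cols + col
                                          : col * rows + row;
}

// Keep the top-left dstRows x dstCols block of a srcRows x srcCols matrix.
// Source and destination are assumed to share the same layout.
std::vector<float> truncateMatrix(const std::vector<float> &src,
                                  MatrixLayout layout, std::size_t srcRows,
                                  std::size_t srcCols, std::size_t dstRows,
                                  std::size_t dstCols) {
  assert(dstRows <= srcRows && dstCols <= srcCols);
  std::vector<float> dst(dstRows * dstCols);
  for (std::size_t r = 0; r < dstRows; ++r)
    for (std::size_t c = 0; c < dstCols; ++c)
      dst[flattenedIndex(layout, r, c, dstRows, dstCols)] =
          src[flattenedIndex(layout, r, c, srcRows, srcCols)];
  return dst;
}
```

Indexing the source as column-major when it is actually stored row-major
would pick different elements, which is the kind of mismatch a
layout-aware index helper avoids.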
Assisted-by: claude-opus-4.6
NAS-140179 / 27.0.0-BETA.1 / Introduce typed event source (#18396)
## Context
Introduce `TypedEventSource`, which gives the `run` body access to the
pydantic model itself, so that arguments can be statically type-checked
properly.
17900 svc/configd: format issue while printing pid_t
Reviewed by: Gordon Ross <Gordon.W.Ross at gmail.com>
Approved by: Dan McDonald <danmcd at edgecast.io>
[CIR] Fix operator-precedence bugs in assert conditions
Due to && binding tighter than ||, asserts of the form
assert(A || B && "msg") always pass when A is true. Add
parentheses so the string message is properly attached:
assert((A || B) && "msg").
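The grouping can be made explicit in a small sketch. The function names are
hypothetical; the first one spells out how the unparenthesized condition is
parsed, the second is the form the commit switches to. Because a string
literal converts to true, both evaluate to the same value, so the change
affects only which operand the message is attached to.

```cpp
#include <cassert>

// How `A || B && "msg"` is parsed: && binds tighter than ||, so the
// message is attached to B only.
bool unparenthesized(bool a, bool b) { return a || (b && "msg"); }

// The corrected shape: the message is attached to the whole condition.
bool parenthesized(bool a, bool b) { return (a || b) && "msg"; }
```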
[AMDGPU][SIInsertWaitcnts][NFC] Simplify logic in GFX12Plus::applyPreexistingWaitcnts (#184925)
The loop is collecting the first instruction of each waitcnt kind and is
erasing the rest, with the exception of DEPCTR which needs more checks.
The existing code was factoring out the instruction deletion and the
setting of the collected instruction variables. But the special handling
for DEPCTR and the in-loop deletion of `S_WAITCNT_lds_direct` was just
complicating the logic.
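The "keep the first of each kind, erase the rest" shape described above can
be sketched roughly as follows. All types and names here are hypothetical
stand-ins rather than the SIInsertWaitcnts code, and the DEPCTR handling is
reduced to simply skipping those instructions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical stand-ins for the real waitcnt instruction kinds.
enum class WaitKind : std::size_t { LoadCnt, StoreCnt, DsCnt, DepCtr, NumKinds };

struct WaitInstr {
  WaitKind kind;
  int id; // illustrative identifier only
};

using FirstOfKind =
    std::array<const WaitInstr *, static_cast<std::size_t>(WaitKind::NumKinds)>;

// Keep the first instruction of each kind and erase later duplicates.
// DepCtr is left untouched here; the real pass applies extra checks to it.
FirstOfKind keepFirstOfEachKind(std::list<WaitInstr> &instrs) {
  FirstOfKind first{}; // value-initialized, so every slot starts as nullptr
  for (auto it = instrs.begin(); it != instrs.end();) {
    if (it->kind == WaitKind::DepCtr) {
      ++it;
      continue;
    }
    const WaitInstr *&slot = first[static_cast<std::size_t>(it->kind)];
    if (slot == nullptr) {
      slot = &*it; // std::list keeps element addresses stable across erases
      ++it;
    } else {
      it = instrs.erase(it); // duplicate of an already collected kind
    }
  }
  return first;
}
```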
[libcxx] Add `__split_buffer::__swap_layouts` (#180102)
This commit simplifies the cumbersome process of swapping the respective
layout members for `__split_buffer` and `vector`.
[flang][cuda] Add hasManagedOrUnifedSymbols attribute to cuf.data_transfer op (#185106)
Add an attribute to signal the presence of managed or unified symbols in
the data transfer. In some cases, the presence of such symbols requires
inserting synchronization. Adding the attribute to the op during lowering
makes such data transfers easier to recognize.
[CIR] Fix convertSideEffectForCall header/definition signature mismatch
Add missing bool &noReturn parameter to the declaration in
LowerToLLVM.h to match the definition in LowerToLLVM.cpp.
[CIR] Change CmpOp assembly format to use bare keyword style
Update the assembly format of cir.cmp from the parenthesized style
cir.cmp(gt, %a, %b) : !s32i, !cir.bool
to the bare keyword style used by other CIR ops like cir.cast:
cir.cmp gt %a, %b : !s32i
The result type (!cir.bool) is now automatically inferred as it is
always cir::BoolType.
NAS-140176 / 27.0.0-BETA.1 / Fix GenericCRUDService query overload (#18394)
## Context
Fix the query overload of GenericCRUDService: when neither count nor get
is set, we return a list of entries.
Merge tag 'pci-v7.0-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fixes from Bjorn Helgaas:
- Initialize msi_addr_mask for OF-created PCI devices to fix sparc and
powerpc probe regressions (Nilay Shroff)
- Orphan the Altera PCIe controller driver (Dave Hansen)
* tag 'pci-v7.0-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
MAINTAINERS: Orphan Altera PCIe controller driver
sparc/PCI: Initialize msi_addr_mask for OF-created PCI devices
powerpc/pci: Initialize msi_addr_mask for OF-created PCI devices
sys: Don't pass RF_ALLOCATED to bus_alloc_resource*
This is a nop as eventually these flags are passed to rman_reserve_resource
which unconditionally sets RF_ALLOCATED in the new flags for a region.
However, it's really a layering violation to use RF_ALLOCATED in relation
to struct resource objects outside of subr_rman.c as subr_rman.c uses
this flag to manage its internal tracking of allocated vs free regions.
In addition, don't document this as a valid flag in the manual. I
think the intention here was that if a caller didn't want to pass
RF_ACTIVE or RF_SHAREABLE, they could pass RF_ALLOCATED instead of 0,
but given the layering violation, I think it's best to just pass 0
instead in that case.
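A toy model of why passing the flag is a no-op. The constant values and the
`reserveFlags` function are assumptions made for illustration; this is not
the rman(9) implementation.

```cpp
#include <cassert>

// Illustrative flag values; treat them as assumptions, not the rman(9) ABI.
constexpr unsigned kRfAllocated = 0x0001;
constexpr unsigned kRfActive = 0x0002;
constexpr unsigned kRfShareable = 0x0004;

// Toy model of the reservation step: the resource manager owns the
// "allocated" bit and sets it unconditionally, whatever the caller passed.
constexpr unsigned reserveFlags(unsigned callerFlags) {
  return callerFlags | kRfAllocated;
}
```

Under this model, a caller passing 0 and a caller passing the allocated bit
end up with the same flags on the reserved region.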
NB: The bhnd bus uses RF_ALLOCATED (along with RF_ACTIVE) in a
separate API to manage resource regions that are not struct resource
objects (but a separate wrapper object). It would perhaps be cleaner
if the chipc_retain_region and chipc_release_region functions used
their own flag constants instead of reusing the rman(9) flags.
[3 lines not shown]
[HLSL][DirectX] Implement HLSL `mul` function and DXIL lowering of `llvm.matrix.multiply` (#184882)
Fixes #99138
- Defines a `__builtin_hlsl_mul` clang builtin in `Builtins.td`.
- Links the `__builtin_hlsl_mul` clang builtin with
`hlsl_alias_intrinsics.h` under the name `mul` for matrix cases
- Implement scalar and vector elementwise multiplication cases of the
`mul` function in `hlsl_intrinsics.h` and `hlsl_intrinsic_helpers.h`
- Adds sema for `__builtin_hlsl_mul` to `CheckBuiltinFunctionCall` in
`SemaHLSL.cpp`
- Adds codegen for `__builtin_hlsl_mul` to `EmitHLSLBuiltinExpr` in
`CGHLSLBuiltins.cpp`
- Vector-vector cases lower to `dot` (except double vectors, which
expand to scalar multiply-adds).
- Matrix-matrix, matrix-vector, and vector-matrix multiplication lower
to the `llvm.matrix.multiply` intrinsic
- Adds codegen tests to `clang/test/CodeGenHLSL/builtins/mul.hlsl`
- Adds sema tests to `clang/test/SemaHLSL/BuiltIns/mul-errors.hlsl`
[13 lines not shown]
[mlir][ODS] Fix notorious double-space bug in op printers (#184253)
When an op's assembly format prints an attribute via
`printStrippedAttrOrType`, two independent space-emission mechanisms
would fire: the op format generator emits a space before each argument,
and the attribute's generated `print` method also emits a leading space
(`shouldEmitSpace` initialized to true). This caused double spaces like
`gpu.shuffle  xor`.
The usual workaround for this was to add double backticks to consume the
leading space.
Fixed by removing the leading space from generated attr/type `print()`
methods and compensating in the print dispatcher by conditionally adding
a space between the mnemonic and `print` call when the format starts
with a name or keyword rather than punctuation.
Also remove some workarounds for the double-spacing in op formats and
fix tests that now don't have leading spaces.
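A simplified string-based sketch of the two space-emission mechanisms and of
the fix. Every function below is a hypothetical stand-in for generated ODS
code, and the "starts with a name or keyword" test is approximated by
checking the first character.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Old behavior (simplified): the attribute printer always led with a space.
std::string printAttrOld(const std::string &body) { return " " + body; }

// New behavior (simplified): the attribute printer emits no leading space.
std::string printAttrNew(const std::string &body) { return body; }

// Op format generator (simplified): emits a space before each argument.
std::string printOp(const std::string &opName, const std::string &printedAttr) {
  return opName + " " + printedAttr;
}

// Dispatcher (simplified): add a space after the mnemonic only when the body
// starts with a name or keyword character rather than punctuation like '<'.
std::string printWithMnemonic(const std::string &mnemonic,
                              const std::string &body) {
  bool startsWithWord =
      !body.empty() &&
      (std::isalpha(static_cast<unsigned char>(body[0])) || body[0] == '_');
  return mnemonic + (startsWithWord ? " " : "") + printAttrNew(body);
}
```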
Assisted-by: claude
Merge tag 'drm-fixes-2026-03-07' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Weekly fixes pull.
There is one mm fix in here for a HMM livelock triggered by the xe
driver tests. Otherwise it's a pretty wide range of fixes across the
board, ttm UAF regression fix, amdgpu fixes, nouveau doesn't crash my
laptop anymore fix, and a fair bit of misc.
Seems about right for rc3.
mm:
- mm: Fix a hmm_range_fault() livelock / starvation problem
pagemap:
- Revert "drm/pagemap: Disable device-to-device migration"
ttm:
[72 lines not shown]