AMDGPU/GlobalISel: Regbanklegalize rules for G_UNMERGE_VALUES
Move G_UNMERGE_VALUES handling to AMDGPURegBankLegalizeRules.cpp.
Fix sgpr S16 unmerge by lowering using shift and using S32.
Previously sgpr S16 unmerge was selected using _lo16 and _hi16 subreg
indexes which are exclusive to vgpr register classes.
For remaing cases we do trivial mapping, assigns same reg bank
to all operands, vgpr or sgpr.
vchiq: fix build with clang 21
When compiling vchiq with clang 21, the following -Werror warning is
produced:
sys/contrib/vchiq/interface/vchiq_arm/vchiq_arm.c:728:27: error: default initialization of an object of type 'VCHIQ_QUEUE_MESSAGE32_T' with const member leaves the object uninitialized [-Werror,-Wdefault-const-init-field-unsafe]
728 | VCHIQ_QUEUE_MESSAGE32_T args32;
| ^
sys/contrib/vchiq/interface/vchiq_arm/vchiq_ioctl.h:151:40: note: member 'elements' declared 'const' here
151 | const /*VCHIQ_ELEMENT_T * */ uint32_t elements;
| ^
While the warning is formally correct, the 'args32' object is
immediately initialized after its declaration. Therefore, suppress the
warning.
MFC after: 3 days
Add an overload of `expandMemSetAsLoop` that takes an optional TTI pointer
This avoids breaking the API for out-of-tree tools like the
SPIRV-LLVM-Translator.
[llvm][utils][release] Remove mention of sub-project source archives (#176348)
These are no longer provided as of llvm 22:
https://discourse.llvm.org/t/llvm-22-1-0-rc1-released/89479
> Please note: since the last release the subproject tarballs have been
> removed and are no longer provided. See RFC: Do "something" with the
> subproject tarballs in the release page for more details.
There are now only llvm-project and llvm-test-suite archives.
[AMDGPU] si-peephole-sdwa: Handle V_PACK_B32_F16_e64 (WIP)
Change si-peephole-sdwa to eliminate V_PACK_B32_F16_e64 instructions
by changing the second operand to write to the upper word of the
destination directly.
[AMDGPU] Enable ISD::{FSIN,FCOS} custom lowering to work on v2f16
Currently ISD::FSIN and ISD::FCOS of type MVT::v2f16 are legalized by
first expanding and then using a custom lowering on the resulting f16
instructions. This ordering prevents using packed math variants of the
instructions introduced by the legalization (e.g. the multiplication),
if available, and makes it difficult to eliminate the packing of the
results by using SDWA form; previous attempts to deal with the latter
situation in the si-peephole-sdwa pass were unwieldly since it was
necessary to reconstruct the association between the source and target
vectors.
Change the legalization action for ISD::FSIN and ISD::FCOS of type
MTF::v2f16 to Custom and change the custom intrinsic lowering to deal
with the v2f16 for the intrinsics introduced in this way.
[AMDGPU] Fix typo in `LowerVGPREncoding` to allow it to hoist past `waitcnt` instructions (#176355)
Fixes a typo which prevented `set_vgpr_msb` to be hoisted past `waitcnt`
instructions.
[LowerMemIntrinsics] Optimize memset lowering
This patch changes the memset lowering to match the optimized memcpy lowering.
The memset lowering now queries TTI.getMemcpyLoopLoweringType for a preferred
memory access type. If that type is larger than a byte, the memset is lowered
into two loops: a main loop that stores a sufficiently wide vector splat of the
SetValue with the preferred memory access type and a residual loop that covers
the remaining bytes individually. If the memset size is statically known, the
residual loop is replaced by a sequence of stores.
This improves memset performance on gfx1030 (AMDGPU) in microbenchmarks by
around 7-20x.
I'm planning similar treatment for memset.pattern as a follow-up PR.
For SWDEV-543208.
GlobalISel: Use LibcallLoweringInfo analysis in legalizer (#170328)
This is mostly boilerplate to move various freestanding utility
functions into LegalizerHelper. LibcallLoweringInfo is currently
optional, mostly because threading it through assorted other
uses of LegalizerHelper is more difficult.
I had a lot of trouble getting this to work in the legacy pass
manager with setRequiresCodeGenSCCOrder, and am not happy with the
result. A sub-pass manager is introduced and this is invalidated,
so we're re-computing this unnecessarily.
[mlir][Python] remove stray nb::cast (#176299)
In https://github.com/llvm/llvm-project/pull/155114 we removed
`liveOperations` but forgot this line which was being used to invalidate
operations under a transform root, which currently isn't being used for
anything. So remove.
FYI this led to a subtle double free bug after
https://github.com/llvm/llvm-project/pull/175405:
```python
@test_in_context
def check_builtin():
module = builtin_d.ModuleOp()
with module.context, ir.Location.unknown():
transform_module = builtin_d.Module.create()
transform_module.operation.attributes["transform.with_named_sequence"] = (
ir.UnitAttr.get()
)
[34 lines not shown]
graphics/sdl2_gpu: disable DOCS option due to OOM condition in graphviz
For some reason graphviz now needs more than 20 GB to process one of
the figures in the documentation of this project.
Disable docs to avoid OOM conditions.
MFH: 2026Q1
(cherry picked from commit 1f4db82dbd52c18499fb8f7d222415324877dff3)
cad/spice: fix build on FreeBSD 15.0
The timezone symbol now follows POSIX on FreeBSD 15.0.
Remove spice's own declaration, which seems to have gone unused in any
case.
See also: https://reviews.freebsd.org/D44281
Approved by: portmgr (build fix blanket)
MFH: 2026Q1
(cherry picked from commit 46b4e04137ac569c1bbcd3965c087e785c70bc55)