[MLIR][XeGPU] Extend op definitions to support 3D+: load_nd, store_nd, prefetch_nd (#199811)
**Summary**
Extend xegpu.load_nd, xegpu.store_nd, and xegpu.prefetch_nd operations
to support 3D and higher-dimensional tensor descriptors with batch
dimensions, enabling batched memory operations for workloads like [4, 8,
16] tensor loads/stores.
**Changes**
- Verifiers: Removed rank > 2 checks in LoadNdOp::verify() and
StoreNdOp::verify() to allow 3D+ tensor descriptors
- Documentation: Added documentation explaining that:
  - Tensor descriptors can be 1D, 2D, 3D, or higher-dimensional
  - Batch dimensions (leading dimensions) are unrolled to unit
    dimensions during lowering
  - Operations execute at 2D granularity at the subgroup level to match
    2D block IO hardware
  It also adds examples of 3D operations.
- Tests: Added unit tests for 3D operations (load_nd_3d, store_nd_3d,
prefetch_nd_3d)
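To make the batch-dimension unrolling concrete, here is a rough standalone C++ model of how a [4, 8, 16] shape decomposes into four 2D tiles. It is only a sketch: `Tile2D` and `unrollBatchDims` are hypothetical names, it assumes the last two dimensions form the 2D block, and it is not the MLIR lowering itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one 2D block operation after unrolling: the batch
// indices select the slice, rows x cols is the 2D block shape.
struct Tile2D {
  std::vector<int64_t> batchIndex; // one index per leading (batch) dimension
  int64_t rows;
  int64_t cols;
};

// Enumerate every 2D tile of an N-D shape (N >= 2) by walking the leading
// batch dimensions in row-major order. Assumption: the last two dimensions
// form the 2D block.
std::vector<Tile2D> unrollBatchDims(const std::vector<int64_t> &shape) {
  assert(shape.size() >= 2 && "need at least a 2D shape");
  const std::size_t numBatchDims = shape.size() - 2;
  const int64_t rows = shape[numBatchDims];
  const int64_t cols = shape[numBatchDims + 1];

  // Total number of unit-batch slices.
  int64_t numTiles = 1;
  for (std::size_t i = 0; i < numBatchDims; ++i)
    numTiles *= shape[i];

  std::vector<Tile2D> tiles;
  for (int64_t linear = 0; linear < numTiles; ++linear) {
    // Delinearize the tile number into per-dimension batch indices.
    std::vector<int64_t> index(numBatchDims);
    int64_t rest = linear;
    for (std::size_t i = numBatchDims; i-- > 0;) {
      index[i] = rest % shape[i];
      rest /= shape[i];
    }
    tiles.push_back({index, rows, cols});
  }
  return tiles;
}
```

Calling `unrollBatchDims({4, 8, 16})` yields four tiles of 8 x 16, one per batch index, which mirrors the "unit batch dimension" view described in the documentation bullet.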
[2 lines not shown]
pci: bcm2838: cleanup on attach failure to fix devmatch panic
Specifically on the RPi CM4, we currently don't set the controller up
right and it never moves into the ready state (we don't observe the link
active bit). Failure to clean up here actually results in a panic not
long after, due to a use-after-free in the rman bits. Further down in
pci_host_generic, we have some rmans stashed in the softc that are
initialized and placed onto the rman tailq, then the softc is later
freed without an rman_fini() to pull them off of the tailq properly.
Note that PCIe on this board won't come up at boot without something
plugged in, so it currently can't be booted with an empty slot with the
intent to hotplug a supported card. Some issues with controller startup
have been observed with Broadcom NICs in the wild, but no problems have
been observed with other NICs and a variety of different PCIe cards.
Shout-out to Vince <git at darkain.com> for the extensive debugging and
analysis to arrive at this conclusion.
[2 lines not shown]
pci: pci_host_generic: provide cleanup methods outside of detach
If device_attach() fails, we're expected to actually clean up after
ourselves because device_detach() will not be called. Factor out the
cleanup bits that don't rely on attach having actually succeeded so
that we can clean up properly in bcm2838_pci.
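
The shape of this refactoring can be sketched as a small C++ model. This is not the FreeBSD driver code: `Softc`, `g_registry`, `softcCleanup`, `softcAttach`, and `softcDetach` are hypothetical stand-ins, with `g_registry` playing the role of a global list that still points into the softc if teardown is skipped.

```cpp
#include <cassert>
#include <cerrno>
#include <list>

// Hypothetical stand-in for a global list of registered resource managers.
// Entries point into the softc, so they must be removed before it is freed.
static std::list<const void *> g_registry;

struct Softc {
  int rman = 0;            // stand-in for a resource manager embedded in the softc
  bool registered = false; // whether rman is currently on g_registry
};

// Teardown that does not rely on attach having succeeded, so it is safe to
// call from both the attach failure path and detach.
void softcCleanup(Softc &sc) {
  if (sc.registered) {
    g_registry.remove(&sc.rman);
    sc.registered = false;
  }
}

// Returns 0 on success or an errno value on failure.
int softcAttach(Softc &sc, bool linkComesUp) {
  assert(!sc.registered && "attach called on an already attached softc");
  g_registry.push_back(&sc.rman);
  sc.registered = true;
  if (!linkComesUp) {
    // Detach is not called after a failed attach, so undo our own work here.
    softcCleanup(sc);
    return ENXIO;
  }
  return 0;
}

int softcDetach(Softc &sc) {
  softcCleanup(sc);
  return 0;
}
```

Because the failure path and detach share one cleanup routine, the registry never keeps a pointer into a softc that is about to be freed.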
Reviewed by: andrew, imp
Differential Revision: https://reviews.freebsd.org/D56896
kern: ofw: provide ofw_bus_destroy_iinfo to teardown interrupt-map
This is for symmetry with ofw_bus_setup_iinfo; the next commits will
use it to properly clean up on failure in bcm2838_pci.
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D56895
[LoopInterchange] Check all inner-exit LCSSA PHIs (#200860)
areInnerLoopExitPHIsSupported() returned true as soon as it saw the
reduction LCSSA PHI, skipping the user-check for any later LCSSA PHIs.
If one had a non-PHI user, legality wrongly succeeded and the transform
hit a cast<PHINode> assertion. Use continue so the remaining PHIs are
still validated.
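
The control-flow difference can be sketched with a small standalone C++ model. `ExitPhi` and both functions are hypothetical simplifications of the check described above, not the LLVM source.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an LCSSA PHI at the inner loop exit.
struct ExitPhi {
  bool isReduction;   // the PHI carries the recognized reduction value
  bool hasNonPhiUser; // at least one user is not itself a PHI node
};

// Buggy shape: returns as soon as the reduction PHI is seen, so any PHIs
// after it are never checked.
bool exitPhisSupportedBuggy(const std::vector<ExitPhi> &phis) {
  for (const ExitPhi &phi : phis) {
    if (phi.isReduction)
      return true;
    if (phi.hasNonPhiUser)
      return false;
  }
  return true;
}

// Fixed shape: skip the user check for the reduction PHI only and keep
// validating the remaining PHIs.
bool exitPhisSupportedFixed(const std::vector<ExitPhi> &phis) {
  for (const ExitPhi &phi : phis) {
    if (phi.isReduction)
      continue;
    if (phi.hasNonPhiUser)
      return false;
  }
  return true;
}

// A reduction PHI followed by a PHI with a non-PHI user: the buggy version
// accepts it, the fixed version rejects it.
void demoExitPhiCheck() {
  const std::vector<ExitPhi> phis = {{true, false}, {false, true}};
  assert(exitPhisSupportedBuggy(phis));
  assert(!exitPhisSupportedFixed(phis));
}
```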
Fixes #200811.
add signature malleability and pubkey validity checks to ed25519
verification (SSH doesn't depend on these properties)
Pointed out by Soatok Dreamseeker
Add an explicit-seed variant of the keygen function.
feedback / "looks fine" tb@
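
The commit text does not show the checks themselves. As a hedged illustration, one common form of the malleability check is the RFC 8032 requirement that the S half of the signature is smaller than the group order L. The C++ sketch below shows only that comparison; `scalarIsCanonical` and `signatureHasCanonicalS` are hypothetical names, not functions from the commit.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Scalar = std::array<uint8_t, 32>; // little-endian, as in Ed25519 signatures

// Order of the Ed25519 base point,
// L = 2^252 + 27742317777372353535851937790883648493, encoded little-endian.
static const Scalar kGroupOrder = {
    0xed, 0xd3, 0xf5, 0x5c, 0x1a, 0x63, 0x12, 0x58,
    0xd6, 0x9c, 0xf7, 0xa2, 0xde, 0xf9, 0xde, 0x14,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10};

// Returns true when s < L. S is a public value, so a variable-time compare
// from the most significant byte down is acceptable in this sketch.
bool scalarIsCanonical(const Scalar &s) {
  for (int i = 31; i >= 0; --i) {
    if (s[i] < kGroupOrder[i])
      return true;
    if (s[i] > kGroupOrder[i])
      return false;
  }
  return false; // s == L is not canonical
}

// The S half is the last 32 bytes of the 64-byte signature (R || S).
bool signatureHasCanonicalS(const uint8_t *sig, std::size_t sigLen) {
  assert(sigLen == 64 && "Ed25519 signatures are 64 bytes");
  Scalar s;
  std::copy(sig + 32, sig + 64, s.begin());
  return scalarIsCanonical(s);
}
```

Rejecting S >= L prevents an attacker from turning a valid signature into a second valid encoding by adding L to S.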
[CodeGen][AMDGPU] Remove premature empty subrange elimination (#201263)
This commit removes a call to `removeEmptySubRanges` inside
`SplitEditor::rewriteAssigned`. That call removed empty subranges that
may still be expected at a later stage. The empty subranges are still
eliminated by a later call to `removeEmptySubRanges`.
Fixes https://github.com/llvm/llvm-project/issues/199337.
---------
Signed-off-by: Steffen Holst Larsen <sholstla at amd.com>
[ORC] Replace ExecutorSymbolDef with ExecutorAddr in remote lookup. (#201492)
Update DylibManager and associated interfaces to return ExecutorAddrs
for remote symbols, rather than ExecutorSymbolDefs. No clients were
using the flags component of ExecutorSymbolDef, and this brings the
SimpleExecutorDylibManager implementation in OrcTargetProcess into
closer alignment with the NativeDylibManager implementation in the new
ORC runtime.
Update to version 2.1.1
2026/03/04: Version 2.1.1
Patch release.
Updated external libraries: JPEG 10.0, PNG 1.6.48, TIFF 4.7.1, ZLIB 1.3.2.
Fixed FLIR and RAW parser to work correctly on big-endian systems.
2025/06/22: Version 2.1.0
Maintenance release.
Updated external libraries: PNG 1.6.48.
Improved RAW image handler to handle all data types correctly.
Fixed bug compiling with MSYS2/Clang64.
Fix metadirective loop variant lowering
Preserve the associated DO evaluation when a dynamic metadirective can
select either a loop-associated directive or a standalone fallback, so
the fallback still lowers the original loop body.
Scope temporary loop-IV data-sharing attributes to the selected variant.
Use the selected variant's collapse clause to determine how many loop IVs
to mark, avoiding DSA state leaking between alternatives.
[flang][OpenMP] Support loop-associated metadirective variants (part 3)
Enable metadirective lowering for loop-associated variants such as
`do`, `simd`, `parallel do`, and `do simd`.
When a metadirective resolves to a loop-associated directive, the
sibling DO evaluation is spliced into the metadirective's evaluation
list so existing loop lowering finds it. Loop IV data-sharing
attributes are marked at lowering time since semantic analysis cannot
know which variant will be selected. The DataSharingProcessor is also
extended to handle spliced evaluations.
This patch is part of the feature work for #188820 and stacked on top
of #194424.
Assisted with copilot and GPT-5.4
[flang][OpenMP] Support lowering of metadirective (part 2) (#194424)
Lower non-constant `user={condition(expr)}` selectors in OpenMP
metadirectives to a runtime `fir.if` / `else` selection cascade.
Dynamic user conditions are handled in two separate phases:
- Static applicability uses only selector traits that are known at
compile time.
- Guarded ranking preserves selector specificity and dynamic-condition
scores for the path where the runtime condition evaluates to true.
Lowering first filters candidates using compile-time selector traits,
then orders the remaining candidates with the normal OpenMP variant
ranking rules. If the selected candidate has a non-constant user
condition, that condition is emitted as a `fir.if` guard. When the
condition evaluates to false, the `else` branch continues selection
among the remaining candidates.
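
The filter, rank, and cascade steps above can be modeled roughly in C++. This is a runtime simulation, not the flang lowering: the real implementation emits `fir.if` operations, whereas this sketch evaluates the conditions directly. `Variant`, `selectVariant`, the integer score, and the `"nothing"` fallback are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model of one metadirective variant.
struct Variant {
  std::string directive;
  bool staticallyApplicable;              // compile-time selector traits matched
  int score;                              // higher score ranks first
  std::function<bool()> dynamicCondition; // empty when there is no runtime condition
};

std::string selectVariant(std::vector<Variant> candidates) {
  // Phase 1: filter using only traits known at compile time.
  candidates.erase(
      std::remove_if(candidates.begin(), candidates.end(),
                     [](const Variant &v) { return !v.staticallyApplicable; }),
      candidates.end());

  // Phase 2: order the survivors by descending score (stable for ties).
  std::stable_sort(
      candidates.begin(), candidates.end(),
      [](const Variant &a, const Variant &b) { return a.score > b.score; });

  // Phase 3: walk the ranking. A dynamic condition acts as an if guard;
  // when it is false, selection continues in the "else" branch.
  for (const Variant &v : candidates) {
    if (!v.dynamicCondition || v.dynamicCondition())
      return v.directive;
  }
  return "nothing"; // assumed fallback when no variant is selected
}

void demoSelectVariant() {
  bool useParallel = false;
  const std::vector<Variant> variants = {
      {"parallel do", true, 10, [&] { return useParallel; }},
      {"simd", false, 20, nullptr},
      {"do", true, 1, nullptr},
  };
  assert(selectVariant(variants) == "do");
  useParallel = true;
  assert(selectVariant(variants) == "parallel do");
}
```

In the demo, `simd` is dropped in phase 1 despite its high score, and `parallel do` is chosen only when its runtime condition holds.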
[57 lines not shown]
[RFC][CodeGen] Add generic target feature checks for intrinsics
This PR adds target-independent infrastructure for annotating LLVM intrinsics
with required subtarget feature expressions.
It introduces a TargetFeatures string field to intrinsic TableGen records.
TableGen emits an intrinsic-to-feature mapping table.
Both SelectionDAG and GlobalISel now perform this check before lowering target
intrinsics. This allows targets to opt in by annotating intrinsic definitions
directly, rather than adding custom checks during lowering, legalization, or
instruction selection.
This PR uses one AMDGPU intrinsic as an example.
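
As a rough picture of what an intrinsic-to-feature table lookup could look like, here is a standalone C++ sketch. It simplifies feature expressions to a plain "all of these features" list, and every name in it (`FeatureTable`, `isIntrinsicSupported`, the intrinsic and feature strings) is hypothetical rather than taken from the PR.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical table shape: intrinsic name -> features that must all be
// enabled. The PR describes feature expressions; this sketch models only
// a conjunction.
using FeatureTable = std::unordered_map<std::string, std::vector<std::string>>;

bool isIntrinsicSupported(const FeatureTable &table,
                          const std::string &intrinsic,
                          const std::set<std::string> &enabledFeatures) {
  auto it = table.find(intrinsic);
  if (it == table.end())
    return true; // unannotated intrinsics carry no requirement (opt-in model)
  for (const std::string &feature : it->second) {
    if (enabledFeatures.count(feature) == 0)
      return false;
  }
  return true;
}

void demoFeatureCheck() {
  // Made-up intrinsic and feature names, for illustration only.
  const FeatureTable table = {{"llvm.example.op", {"feat-a", "feat-b"}}};
  assert(isIntrinsicSupported(table, "llvm.example.op", {"feat-a", "feat-b"}));
  assert(!isIntrinsicSupported(table, "llvm.example.op", {"feat-a"}));
  assert(isIntrinsicSupported(table, "llvm.example.other", {}));
}
```

Running the check once before lowering, as the description says both SelectionDAG and GlobalISel now do, is what lets targets drop their own scattered checks.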
amd64/machdep.c: explicitly include sys/uio.h
Some kernel configurations result in struct uio being only
forward-declared.
This is a direct commit to stable/15.
Sponsored by: The FreeBSD Foundation
[MemProf] Change default of memprof-icp-noinline-threshold to 0 (#201474)
This is no longer needed after PR172502 added support to identify
indirect callees from inlined frames.
[OpenMP] Use ext linkage for kernels handles and globals handles keep… (#200964)
… linkage
Host handles are now emitted with external linkage so that they clash
if two kernels with the same name are registered. This can already
happen today and silently corrupt the program, but it will happen more
easily once we allow users to name their kernels.
In the same patch we make global variable handles retain the linkage of
the global variable, forcing clashes for external ones while continuing
to support weak use cases.
---------
Co-authored-by: Shilei Tian <i at tianshilei.me>
[clang-tidy][readability] Ignore macros in function-size check (#199549)
This patch adds an IgnoreMacros option to the readability-function-size
check.
Fixes https://github.com/llvm/llvm-project/issues/112835
sysutils/rubygem-choria-mcorpc-support: Fix with latest Ruby
While here remove explicit dependency on sysutils/choria.
This gem is now a run-time dependency of OpenBolt, but choria itself is
not required for using OpenBolt. So do not install choria as a
dependency of this gem automatically to avoid downloading large
dependencies when unneeded.
With hat: puppet