[mlir][amdgpu] Update TDM ops to use the new barrier type, improve docs (#180572)
Now that we have an AMDGPU dialect type for the in-LDS barriers that the
tensor data mover can automatically visit, update the definition of the
tensor descriptor operations to use said types and document the behavior
of the barrier.
[mlir][AMDGPU] Change width of LDS barrier count (#180554)
Whoops, turns out I was off by 1 on how many bits are in the counts and
phases in these new LDS barriers. This commit fixes this.
Co-authored-by: Claude Opus 4.5 <noreply at anthropic.com>
[MLIR][XeGPU] Fix insert_strided_slice op in subgroup distribution (#180604)
This PR modifies the subgroup distribution pass to sink an
insert_strided_slice operation only if it becomes the last op before the
yield. This avoids sinking the same insert_strided_slice multiple times,
which could cause problems in the worst case.
[AArch64] Inline asm v0-v31 are scalar when having less than 64-bit capacity (#169930)
If 32-bit (or narrower) "v0" registers coming from inline asm are treated
as vector registers, codegen might produce incorrect vector<->scalar
conversions. This causes type-mismatch assertion failures later during
compilation. The fix treats v0-v31 AArch64 registers of 32 bits or less
as scalar, along with the 64-bit ones.
Fixes #153442
[WinEH] Check object unwinding in seh block only when -eha is used (#180108)
After this [PR](https://github.com/llvm/llvm-project/pull/172287), build
errors may occur even when `/EHa` is not enabled (for example, with `/EHsc`).
While MSVC performs similar checks whenever exceptions are enabled, LLVM
is more prone to generating invalid code in which SEH fails to function
correctly when async exceptions (`/EHa`) are not used. Valid SEH code
generation is typically only ensured when `/EHa` is enabled. Therefore,
this patch performs the check only when `/EHa` is used.
On Windows, if LLVM uses `/EHsc` or `/EHs` rather than `/EHa`, the code
within the `__except` block may behave unexpectedly, unlike with MSVC.
This is consistent with previous LLVM versions.
[DataLayout] Add a specifier for element-aligned vectors
This adds the "ve" specifier to Data Layout, which says that vectors are
element-aligned by default for a target.
Note that we also remove the default vector specs for 64 and 128 bit
vectors - these match the natural alignment of those vectors, so they
didn't actually have any functional effect.
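A rough illustration of the difference between natural and element
alignment for vectors. This is a minimal sketch, not LLVM code: the
function names are hypothetical, and it assumes an element's alignment
equals its size rounded up to a power of two, which real targets may
override.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (n must be positive)."""
    return 1 << (n - 1).bit_length()


def vector_align_bytes(num_elems: int, elem_bits: int, element_aligned: bool) -> int:
    """Illustrative default alignment of a fixed-width vector, in bytes.

    Hypothetical helper: assumes the element's own alignment is its size
    rounded up to a power of two.
    """
    elem_bytes = (elem_bits + 7) // 8
    if element_aligned:
        # Element-aligned ("ve"): the vector aligns like one of its elements.
        return next_power_of_two(elem_bytes)
    # Natural alignment: total vector size rounded up to a power of two.
    total_bytes = (num_elems * elem_bits + 7) // 8
    return next_power_of_two(total_bytes)


# <4 x i32> is 128 bits wide.
print(vector_align_bytes(4, 32, element_aligned=False))
print(vector_align_bytes(4, 32, element_aligned=True))
```

Under these assumptions a 64-bit or 128-bit vector is naturally aligned
to 8 or 16 bytes, which is why explicit specs for those widths would add
nothing.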
[offload][flang-rt] Fix NVPTX runtime build (#180530)
During the check for availability of `strerror_r`, the host include file is used. This doesn't matter for AMDGPU since it actually performs the link step during `check_cxx_symbol_exists`. But for NVPTX, due to `-c`, it doesn't link and then incorrectly assumes that the symbol exists.
For now, removing `io-error.cpp` from the list of GPU sources is the most sensible option since it's unused.
Skip cache insertion if we don't have a name
Seen in a QE AD domain that is not fully stable or predictable, but
not in other domains in our org or at customer sites: it's possible
that the AD response to a query resolving a SID to a name returns an
empty string. Because of intervening caching layers in the NSS plugin
and winbindd, we don't have an effective way to force AD to keep
trying until it gives us something sane. In this case, we'll just
ignore the entry for cache insertion purposes. The user or
group won't appear in dropdowns, but functional impact will be
limited since admins can still type in the name (hopefully) and
recover at a future point.
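The behavior described above amounts to a guard in front of the cache
insert. A minimal sketch, assuming a simple in-memory SID-to-name map;
the class and method names are hypothetical and not taken from the
actual code:

```python
from typing import Dict, Optional


class SidNameCache:
    """Hypothetical SID -> name cache that refuses nameless entries."""

    def __init__(self) -> None:
        self._names: Dict[str, str] = {}

    def insert(self, sid: str, name: Optional[str]) -> bool:
        """Cache the mapping; return False if it was skipped."""
        if name is None or not name.strip():
            # The directory returned no usable name, so skip the insert
            # and let a later lookup try again.
            return False
        self._names[sid] = name
        return True

    def lookup(self, sid: str) -> Optional[str]:
        return self._names.get(sid)


cache = SidNameCache()
cache.insert("S-1-5-21-1000", "")        # skipped: empty name
cache.insert("S-1-5-21-1001", "alice")   # cached
```

Skipping the insert rather than caching an empty string means a later
successful resolution can still populate the entry.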
14733 loader.efi: faults could try to print out call trace
Reviewed by: Garrett D'Amore <garrett at damore.org>
Reviewed by: Robert Mustacchi <rm+illumos at fingolfin.org>
Approved by: Dan McDonald <danmcd at edgecast.io>
[lld][Hexagon] Fix R_HEX_TPREL_11_X relocation on duplex instructions (#179860)
findMaskR11() was missing handling for duplex instructions. This caused
incorrect encoding when R_HEX_TPREL_11_X relocations were applied to
duplex instructions with large TLS offsets.
For duplex instructions, the immediate bits are located at positions
20-25 (mask 0x03f00000), not in the standard positions used for
non-duplex instructions.
This fix adds the isDuplex() check to findMaskR11() to return the
correct mask for duplex instruction encodings.
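To make the bit positions concrete, here is a minimal sketch of how a
value is scattered into the bits selected by a relocation mask. The
helper name is hypothetical and this is an illustration of the idea, not
the lld implementation.

```python
DUPLEX_R11_MASK = 0x03F00000  # bits 20-25, as stated in the commit message


def apply_mask(mask: int, data: int) -> int:
    """Deposit the low bits of `data` into the set bit positions of `mask`.

    The lowest set bit of the mask receives bit 0 of data, the next set
    bit receives bit 1, and so on.
    """
    result = 0
    data_bit = 0
    for bit in range(32):
        if (mask >> bit) & 1:
            result |= ((data >> data_bit) & 1) << bit
            data_bit += 1
    return result


# The mask is six consecutive ones starting at bit 20.
assert DUPLEX_R11_MASK == ((1 << 6) - 1) << 20

print(hex(apply_mask(DUPLEX_R11_MASK, 0b000001)))
print(hex(apply_mask(DUPLEX_R11_MASK, 0b111111)))
```

With the wrong mask the same value bits would land in different
instruction bits, which matches the incorrect encoding described above.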
devel/py-Js2Py: Apply upstream patch for Python 3.12 support
Co-authored-by: Michael Osipov <michaelo at FreeBSD.org>
PR: 289085
MFH: 2026Q1
(cherry picked from commit 37e1f72f44e412445a2e97bc85e159b218390243)
Updated net/xfr to 0.6.0
v0.6.0
What's New
Congestion Control Selection (--congestion)
Choose your TCP congestion control algorithm per-test:
xfr <host> --congestion bbr    # Compare BBR vs default CUBIC
xfr <host> --congestion reno   # Classic Reno
Works on both client and server sockets. Invalid algorithms are caught early with a helpful error listing what's available on your kernel.
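For readers unfamiliar with the mechanism, per-socket congestion control
on Linux is selected with the TCP_CONGESTION socket option. The sketch
below is an illustration under that assumption, not xfr's code; the
function names are hypothetical, and the /proc path is the usual Linux
location for the list of available algorithms.

```python
import socket

AVAILABLE_PATH = "/proc/sys/net/ipv4/tcp_available_congestion_control"


def read_available_algorithms(path: str = AVAILABLE_PATH) -> list:
    """Return the algorithms the running Linux kernel offers."""
    with open(path) as f:
        return f.read().split()


def check_algorithm(name: str, available: list) -> str:
    """Fail early, listing the alternatives, if `name` is not offered."""
    if name not in available:
        raise ValueError(
            f"unknown congestion control {name!r}; "
            f"available: {', '.join(available)}"
        )
    return name


def set_congestion(sock: socket.socket, name: str) -> None:
    """Select the algorithm on one TCP socket (Linux only)."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, name.encode())
```

Validating the name before opening any connection is what allows a
clear error message instead of a failed setsockopt call mid-test.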
Live TCP_INFO Polling
RTT and cwnd are now reported every interval during tests, not just in the final result. This enables:
- Real-time TCP metrics in the TUI
- Per-interval rtt_us and cwnd in --json-stream and --csv output
[61 lines not shown]
sys/netinet6: switch net.inet6.ip6.use_stableaddr to on by default
This change switches to using the RFC 7217 algorithm as the default to
generate SLAAC addresses for IPv6 interfaces configured with
accept_rtadv.
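As background, RFC 7217 derives the interface identifier from a
pseudorandom function over the prefix, the interface, an optional
network ID, a DAD counter and a secret key. The sketch below illustrates
that idea using SHA-256 as the function; it is not the FreeBSD
implementation, the function name is hypothetical, and the check against
reserved identifiers that the RFC requires is omitted.

```python
import hashlib
import ipaddress


def stable_slaac_address(prefix: str, net_iface: str, network_id: bytes,
                         dad_counter: int, secret_key: bytes) -> ipaddress.IPv6Address:
    """Illustrative RID = F(Prefix, Net_Iface, Network_ID, DAD_Counter, secret_key)."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("this sketch only handles /64 prefixes")

    h = hashlib.sha256()
    h.update(net.network_address.packed[:8])   # 64-bit prefix
    h.update(net_iface.encode())               # interface name as Net_Iface
    h.update(network_id)                       # may be empty
    h.update(dad_counter.to_bytes(4, "big"))   # bumped after a DAD conflict
    h.update(secret_key)
    rid = h.digest()

    # Use the trailing 64 bits of the digest as the interface identifier.
    iid = int.from_bytes(rid[-8:], "big")
    return ipaddress.IPv6Address(int(net.network_address) | iid)


addr = stable_slaac_address("2001:db8:1::/64", "em0", b"", 0, b"example-secret")
print(addr)
```

The same inputs always produce the same address, while a different
prefix yields an unrelated identifier, so the address is stable within a
network but not trackable across networks.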
Reviewed by: pouria, glebius, zlei
Approved by: zlei
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D55138