[AMDGPU] Introduce asyncmark/wait intrinsics
Asynchronous operations are memory transfers (usually between the global memory
and LDS) that are completed independently at an unspecified scope. A thread that
requests one or more asynchronous transfers can use async marks to track their
completion. The thread waits for each mark to be completed, which indicates that
requests initiated in program order before this mark have also completed.
For now, we implement asyncmark/wait operations on pre-GFX12 architectures that
support "LDS DMA" operations. Future work will extend support to GFX12Plus
architectures that support "true" async operations.
Co-authored-by: Ryan Mitchell <ryan.mitchell at amd.com>
Fixes: SWDEV-521121
[AMDGPU] Asynchronous loads from global/buffer to LDS on pre-GFX12
The existing "LDS DMA" builtins/intrinsics copy data from a global/buffer pointer
to LDS. These are now augmented with ".async" versions, for which the compiler
does not automatically track completion. Completion is instead tracked using
explicit mark/wait intrinsics, which must be inserted by the user. This makes it
possible to write programs with efficient waits in software pipeline loops. The
program can now wait for only the oldest outstanding operations to finish, while
launching more operations for later use.
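
The mark/wait pattern described above can be sketched as a toy model in Python. This is not the AMDGPU intrinsics or their real signatures: the names `AsyncTracker`, `issue`, `mark`, `wait` and `pipeline` are hypothetical, and the idea that a wait takes "the number of marks allowed to remain outstanding" is an assumption made for illustration. Completion is simulated at wait time, in program order.

```python
from collections import deque


class AsyncTracker:
    """Toy per-thread model of async marks (hypothetical, not the real intrinsics)."""

    def __init__(self):
        self.in_flight = deque()  # issued but not yet completed, in program order
        self.marks = deque()      # each mark stores how many ops were issued before it
        self.issued = 0
        self.completed = 0

    def issue(self, name):
        # Start an asynchronous transfer; the compiler does not track it.
        self.issued += 1
        self.in_flight.append(name)

    def mark(self):
        # A mark covers every operation issued before it in program order.
        self.marks.append(self.issued)

    def wait(self, max_outstanding_marks):
        # Assumed semantics: retire the oldest marks until at most
        # `max_outstanding_marks` remain. Returns the ops that completed.
        finished = []
        while len(self.marks) > max_outstanding_marks:
            covered = self.marks.popleft()
            while self.completed < covered:
                finished.append(self.in_flight.popleft())
                self.completed += 1
        return finished


def pipeline(num_tiles, depth=2):
    """Software-pipelined loop: keep up to `depth` transfers in flight."""
    tracker = AsyncTracker()
    log = []
    # Prologue: launch the first `depth` transfers, one mark each.
    for i in range(min(depth, num_tiles)):
        tracker.issue(f"tile{i}")
        tracker.mark()
    for i in range(num_tiles):
        # Wait only for the oldest transfer; later ones may stay in flight.
        allowed = min(depth - 1, num_tiles - 1 - i)
        log.append((i, tracker.wait(allowed)))
        nxt = i + depth
        if nxt < num_tiles:
            tracker.issue(f"tile{nxt}")
            tracker.mark()
    return log
```

In this model each iteration retires exactly one mark, so the transfer for the current tile is known to be complete while the next one is still outstanding.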
This change only contains the new names of the builtins/intrinsics, which
continue to behave exactly like their non-async counterparts. A later change
will implement the actual mark/wait semantics in SIInsertWaitcnts.
Fixes: SWDEV-521121
pcb.h: mark struct pcb to be preserved
There are programs that depend on this structure (e.g. kernel debuggers)
and break when the ABI changes.
Signed-off-by: Minsoo Choo <minsoochoo0122 at proton.me>
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D55149
[RISCV] Add SpacemiT X100 base scheduling model (#178189)
SpacemiT X100 is a 4-issue, out-of-order, RVA23 processor. This patch
introduces the base scheduling model for scalar instructions. The
scheduling model for RVV will be added in a future update.
[BOLT][NFC] Stop populating unnecessary samples into MemSamples (#179472)
Currently, many unnecessary samples are populated into MemSamples,
including zero-initialized samples and samples in which the PC address
is not contained in any BinaryFunction. However, these samples are skipped
entirely during processing, and the whole MemSamples vector is cleared
immediately afterwards. We can therefore stop populating these samples
into MemSamples, which reduces the maximum resident set size
when processing a large perf.data.
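
The filtering idea can be sketched as follows. This is a language-neutral illustration in Python, not BOLT's C++ code: `MemSample`, `build_function_index`, `in_any_function` and `collect_mem_samples` are hypothetical names, "zero-initialized" is assumed to mean that both fields are zero, and function ranges are assumed not to overlap.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class MemSample:
    pc: int    # program counter of the sampled instruction
    addr: int  # memory address that was accessed


def build_function_index(functions):
    # functions: iterable of (start, size); assumed non-overlapping.
    ranges = sorted((start, start + size) for start, size in functions)
    starts = [start for start, _ in ranges]
    return starts, ranges


def in_any_function(pc, index):
    # Binary search for the last function starting at or before pc.
    starts, ranges = index
    i = bisect_right(starts, pc) - 1
    return i >= 0 and pc < ranges[i][1]


def collect_mem_samples(parsed, index):
    # Keep only samples that later processing would actually use,
    # instead of storing everything and skipping entries afterwards.
    kept = []
    for sample in parsed:
        if sample.pc == 0 and sample.addr == 0:
            continue  # zero-initialized sample (assumed definition)
        if not in_any_function(sample.pc, index):
            continue  # PC is outside every known function
        kept.append(sample)
    return kept
```

Dropping samples at collection time means the list never holds entries that would be skipped anyway, which is where the memory saving comes from.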
net-im/libnice*: update to 0.1.23
Update to 0.1.23
libnice 0.1.23 (2025-11-26)
===========================
API: Added option NICE_AGENT_OPTION_CLOSE_FORCED to not wait for TURN when
closing asynchronously
Reject invalid remote candidates with priority=0
Add missing mutex in tcp-bsd socket
Add buffer list support to nicesrc
Avoid dropping packets in nicesink, retry instead
Only create a new NiceCandidate if a socket can be opened, as it is
a somewhat costly operation.
Many new tests
Fix leaks
Fix various test flakiness
Adjust dependencies.
[12 lines not shown]
x86: Note that trapframe is used by kernel debuggers
Signed-off-by: Minsoo Choo <minsoochoo0122 at proton.me>
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D55189
[RISCV] Add used callee-saved registers as implicit/implicit-def registers to save/restore call (#180133)
We should add the used callee-saved registers as implicit uses on the save
libcall and as implicit defs on the restore libcall, as we already do
for CM_PUSH/CM_POPRET. This helps construct correct dataflow: in the
entry block, the save libcall implicitly uses the callee-saved registers
that are live-in, and in the return block, the restore libcall implicitly
defines the callee-saved registers that are live-out.
emulators/wine-devel: Update 11.1 => 11.2
Changelog:
- More optimizations in PDB loading.
- Support for MSVC constructors in C runtime.
- Easier mechanism for creating version resources.
- Various bug fixes.
https://gitlab.winehq.org/wine/wine/-/releases/wine-11.2
PR: 293040
Makefile.incl1: .WAIT before distribute in etc
In order to make sure that man pages are all installed before we run
makewhatis to generate mandoc.db files, we have long placed etc at the
end of the list of subdirectories being recursed into by the build.
In order to support installworld -jN, a .WAIT was more recently added
here.
With the recent adoption by the release engineering team of parallel
*release* builds (aka 'make release -jN') it is now also necessary to
add the same .WAIT before recursing for the 'distribute' target, as we
otherwise end up with distribution sets containing incomplete mandoc.db
files.
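
The ordering effect of .WAIT can be illustrated with a small Python model. This is only a sketch of the scheduling constraint, not the make implementation: `wait_groups` is a hypothetical helper, and the directory names in the usage note are made-up examples rather than the real subdirectory list.

```python
def wait_groups(subdirs):
    """Split a subdirectory list into groups separated by .WAIT.

    Entries within one group may be processed in parallel; every entry of
    a group must finish before any entry of the next group starts.
    """
    groups, current = [], []
    for item in subdirs:
        if item == ".WAIT":
            if current:
                groups.append(current)
                current = []
        else:
            current.append(item)
    if current:
        groups.append(current)
    return groups
```

For example, `wait_groups(["bin", "lib", "share", ".WAIT", "etc"])` yields two groups, so `etc` is only entered after all earlier directories, and their man pages, are complete.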
Reviewed by: bdrewery
PR: 289683
MFC after: 3 days
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D53533
[2 lines not shown]