kernel - Tune zalloc/zfree a bit more
* Adjust the hysteresis to improve margins and use opportunistic
spin locks more often. The desired cache levels are now adjusted
opportunistically in 10% chunks, and the code does not back off into
blocking spin locks until cache levels are 40% off desired targets.
* Increase zmax_pcpu by 2x. Allow more entries to be cached in the
per-cpu caches, reducing the need to drop into the per-zone pool
(which requires the spin lock).
vkernel - Vastly improve ve caching
* Improve per-host-LWP ve caching, vastly reducing primary vkp token
contention under heavy multi-core loads (e.g. build-all). The
per-LWP cache has been increased from one to four entries and
is now tracked on all operations.
Remaining host-side contention is on the vkernel's kernel map, and
there isn't a whole lot I can do about that for now.
* Release the primary vkp token across the pmap and vm_map deletion
when destroying a context to reduce contention with vkernel
operations on other cpus.
* Changes should significantly improve vkernel concurrent syscall
and fault performance in multi-core setups.
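A minimal userland sketch of a four-entry per-LWP lookup cache, to illustrate why a hit avoids the shared token. All names here (ve_sketch, lwp_ve_cache, ve_cache_lookup, ve_cache_insert) are hypothetical, and the round-robin replacement policy is an assumption; the commit message only states that the cache grew from one to four entries.

```c
#include <assert.h>
#include <stddef.h>

#define VE_CACHE_SLOTS	4	/* four entries, per the commit message */

/* Hypothetical stand-in for a ve; only a lookup key is modeled. */
struct ve_sketch {
	unsigned long key;
};

/* Hypothetical per-LWP cache, private to one LWP so no lock is needed. */
struct lwp_ve_cache {
	struct ve_sketch *slot[VE_CACHE_SLOTS];
	unsigned int next;	/* round-robin victim (assumed policy) */
};

/* Return the cached entry for key, or NULL on a miss. */
static struct ve_sketch *
ve_cache_lookup(struct lwp_ve_cache *c, unsigned long key)
{
	unsigned int i;

	for (i = 0; i < VE_CACHE_SLOTS; i++) {
		if (c->slot[i] != NULL && c->slot[i]->key == key)
			return c->slot[i];
	}
	return NULL;	/* miss: the caller would take the slow, locked path */
}

/* Remember ve, overwriting the oldest slot. */
static void
ve_cache_insert(struct lwp_ve_cache *c, struct ve_sketch *ve)
{
	assert(ve != NULL);
	c->slot[c->next] = ve;
	c->next = (c->next + 1) % VE_CACHE_SLOTS;
}
```

With four slots, a workload that alternates between a handful of contexts can keep hitting the private cache instead of falling back to the shared lookup each time.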
vkernel: Restore MAP_VPAGETABLE support with COW/VPTE fix (2)
* Retain the wiring of vkernel-related pages but remove the
PG_VPTMAPPED flag and replace its functionality. The
main problem with PG_VPTMAPPED is that it could not distinguish
between mappings that had to be wired and caching/mappings
that might still be present but are not wired. The pages
would remain wired until the underlying VM object was completely
destroyed, which is undesirable.
* Instead, we actually bump vm_page->wire_count. This allows the
related pages to become pageable again after the vkernel has exited
(after all pmaps that wire them are gone), which means we aren't as
dependent on the file fd/unlink sequence.
* Recode vm_page->wire_count and VM page wiring in general. This
impacts the whole kernel, not just vkernel support. The following
features:
[17 lines not shown]
kernel - Vastly improve zalloc/zfree
* Vastly improve zalloc/zfree by shifting items from the zone pool
to the zone's pcpu pool or back again opportunistically, and in
smaller chunks.
The opportunistic mechanism uses spin_trylock(), and so does not
contend at all. The regular spin_lock() is only used when the counts
exceed the opportunistic limits.
* Should improve regular kernel performance under heavy loads.
* Vastly improves vkernel performance under multi-cpu loads due to pvzone
operation. However, note that the host still has strong contention on
the vkp->token that we need to address.
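The opportunistic mechanism can be sketched in userland roughly as follows. This is not the kernel code: the structures, the helper names and the C11 atomic_flag standing in for the zone spin lock are all assumptions made for illustration, and the limits are arbitrary example values.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not the real zone structures. */
struct zone_sketch {
	atomic_flag lock;	/* stands in for the zone spin lock */
	int pool_count;		/* items held in the shared zone pool */
};

struct pcpu_sketch {
	int count;		/* items cached on this cpu */
	int target;		/* desired per-cpu cache level */
	int chunk;		/* items moved per step */
	int hard_limit;		/* above this, block on the lock */
};

/* Non-blocking attempt, in the spirit of spin_trylock(). */
static bool
sketch_trylock(struct zone_sketch *z)
{
	return !atomic_flag_test_and_set_explicit(&z->lock,
	    memory_order_acquire);
}

/* Blocking acquire, in the spirit of spin_lock(). */
static void
sketch_lock(struct zone_sketch *z)
{
	while (atomic_flag_test_and_set_explicit(&z->lock,
	    memory_order_acquire))
		;	/* spin */
}

static void
sketch_unlock(struct zone_sketch *z)
{
	atomic_flag_clear_explicit(&z->lock, memory_order_release);
}

/*
 * Return up to one chunk of surplus items to the zone pool.
 * Returns the number of items moved; 0 means there was no surplus,
 * or the lock was busy and the surplus was still tolerable.
 */
static int
pcpu_drain_sketch(struct zone_sketch *z, struct pcpu_sketch *pc)
{
	int n;

	assert(pc->chunk > 0);
	if (pc->count <= pc->target)
		return 0;
	if (!sketch_trylock(z)) {
		if (pc->count <= pc->hard_limit)
			return 0;	/* skip for now, retry later */
		sketch_lock(z);		/* too far off target: contend */
	}
	n = pc->count - pc->target;
	if (n > pc->chunk)
		n = pc->chunk;
	pc->count -= n;
	z->pool_count += n;
	sketch_unlock(z);
	return n;
}
```

The point of the design is that the common path never waits: a failed trylock simply defers the rebalance, and only a cache that has drifted past its hard limit pays for the blocking lock.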
vkernel: Restore MAP_VPAGETABLE support with COW/VPTE fix
Re-implement MAP_VPAGETABLE support which was removed in commit
4d4f84f5f26bf5e9fe4d0761b34a5f1a3784a16f. This enables vkernel
functionality - allowing a full kernel to run in userspace.
Changes:
- Re-enable VM_MAPTYPE_VPAGETABLE in sys/vm/vm.h
- Restore vm_fault_vpagetable() for VPTE translation
- Add PG_VPTMAPPED flag to track pages mapped via VPAGETABLE
- Handle MADV_SETMAP/MADV_INVAL for vkernel pmap management
- Fix COW/VPTE synchronization: set FW_DIDCOW for page-level COW
so pmap_enter() updates VPTEs after copy-on-write
- Use anonymous memory (unlinked file) for vkernel memory backing
to ensure proper page reclaim when vkernel exits
- Fix vkernel build issues (cpu_feature2, COWF_PREFAULT_* constants)
- Skip callout address validation for vkernels (_KERNEL_VIRTUAL)
NOTE: This code was almost fully generated by Claude Opus 4.5
virtio_blk - Implement multiqueue support.
* For now, this code uses at most as many virtqueues as cpu cores and
interrupts are available. The number of virtqueues can also be capped via
"hw.vtblk.max_queues" and "hw.vtblk.X.max_queues" tunables (also useful
for benchmarking different scenarios).
* Since virtio interrupts are currently tied together with the per-queue
serializer lock, a fixed 1:1 mapping between interrupts and virtqueues is
used for now, for simplicity. To properly support multiple virtqueues
sharing a single interrupt, it would be far more elegant to have a
separate serializer for each virtqueue, and not just for each interrupt.
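The queue-count rule described above can be sketched as a small helper. The function name and its parameters are hypothetical, and treating a non-positive cap as "no cap" and never returning fewer than one queue are assumptions of this sketch, not statements about the driver.

```c
#include <assert.h>

/*
 * Hypothetical helper: pick how many virtqueues to use.
 * max_queues <= 0 is treated as "no cap" (assumption of this sketch).
 */
static int
vtblk_pick_nqueues_sketch(int ncpus, int nintrs, int max_queues)
{
	int n;

	assert(ncpus >= 1);
	n = ncpus;			/* at most one queue per cpu core */
	if (nintrs < n)
		n = nintrs;		/* 1:1 interrupt-to-virtqueue mapping */
	if (max_queues > 0 && max_queues < n)
		n = max_queues;		/* cap from a tunable */
	if (n < 1)
		n = 1;			/* assumed floor: keep one queue */
	return n;
}
```

For example, 8 cores with 4 available interrupts would yield 4 queues under this rule, and a cap of 2 would reduce that to 2.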
virtio_blk - Clean up detach code, to work better for multiqueue.
* Specifically this uses the fact that vtblk_flags is only ever modified in
vtblk_attach(), vtblk_detach(), vtblk_suspend(), and vtblk_resume(). So
we don't need to worry about taking all the serializers when setting the
DETACH flag.
* Additionally the virtqueue_disable_intr() call from vtblk_stop() was
completely superfluous, since the virtio_teardown_intr() call earlier
already guarantees that none of the virtqueue interrupt handlers can be
active anymore.
kernel - Get rid of ancient unused 256 non-cloned pty devices.
* These haven't really been used in more than a decade, I think.
* The only binary that still has code for these is rfcomm_sppd(1), however
that's part of the bluetooth stack, and hence isn't usable anyway.
It should be easy to modernize to use a cloned pty device instead.
* Update the manpages a bit, to refer to the "modern" cloned pty devices.
if_mtw - Port from FreeBSD, with bugfixes, and parts re-ported from OpenBSD.
* Reverted some recent changes in FreeBSD for 80211 API changes on their side:
* reverted ratectl API use in ieee80211_ratectl_tx_update()
* crypto API change (git 5431dafdb9659fb578f)
* seqno offload (git cce278510a820785d88)
* ni->ni_txrate references use (git 7067450010931479f8)
* Re-ported the firmware loading code from OpenBSD. Unfortunately firmware
loading still is not reliable. Sometimes it appears to be consistently
working, and then after a reboot it repeatedly fails with the same code.
* Added back some error return codes that OpenBSD's code has, but that were
removed in the FreeBSD code. This makes it behave a bit better when it fails.
* Fixed the aggregated RX buffer handling, to split up the mbuf correctly,
using the m_copym method, based on the corresponding code in if_iwm(4).
* Remove the small pieces of #ifdef IEEE80211_SUPPORT_SUPERG code.
[5 lines not shown]
Import openresolv-3.17.3 with the following changes:
resolvconf: Add a function to quote and escape input for eval
resolvconf: quote on printf rather than on value
Import openresolv-3.17.1 with the following changes:
* libc: add toggle resolv_conf_restore, defaulting to YES
* resolvconf: Fix -D
* resolvconf: Don't warn when we have no entries to list for *
* resolvconf: -I now inits subscribers after clearing state
* resolvconf: remember if any subscriber errored
* libc: Don't update resolv.conf on signature mismatch
* resolvconf: Single quote parsed values from resolv.conf
* resolvconf: -L now outputs fully processed resolv.conf files
ahci - Properly check and set SATA capabilities and features for ALPM.
- This should now only enable device-initiated and/or host-initiated link
power-management when it is supported by both controller and disk device.
- Also this now allows for device-initiated power-management to be enabled
with AHCI controllers that don't support automatic host-initiated
power-management.
- In addition, this adds support for automatic promotion of "partial" state
to "slumber" state (i.e. without needing to go through "active" state).
- The kernel console output now explicitly tells when it's enabling DIPM
(device-initiated power management) and HIPM (host-initiated power
management) respectively.
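A minimal sketch of the decision logic described above, assuming the capabilities have already been read into booleans. The struct and function names are hypothetical and do not correspond to the driver's real data structures or register definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability summary for one controller port and its disk. */
struct alpm_caps_sketch {
	bool ctrl_hipm;		/* controller supports host-initiated PM */
	bool ctrl_dipm;		/* controller supports device-initiated PM */
	bool dev_hipm;		/* disk supports host-initiated PM */
	bool dev_dipm;		/* disk supports device-initiated PM */
};

struct alpm_plan_sketch {
	bool enable_hipm;
	bool enable_dipm;
};

/*
 * Each mechanism is enabled only when both sides support it, and the
 * two decisions are independent of each other.
 */
static struct alpm_plan_sketch
alpm_plan_sketch_make(const struct alpm_caps_sketch *c)
{
	struct alpm_plan_sketch p;

	assert(c != NULL);
	p.enable_hipm = c->ctrl_hipm && c->dev_hipm;
	p.enable_dipm = c->ctrl_dipm && c->dev_dipm;
	return p;
}
```

Because the two checks are independent, a controller without host-initiated support can still end up with device-initiated power management enabled.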
wlan - Remove NULL free in fallback "none" ratectl code.
This avoids a kernel panic when detaching a wlan interface that was created
with the "none" ratectl code (i.e. when the wlan_amrr module wasn't loaded).
hammer2 - Add debugging
* Debug situations where the CRC fails. Print enough information
for us to poke around and compare the in-kernel buffer against
the disk buffer.
All instances to date where hammer2 has detected CRC corruption
have been due to in-memory corruption, source as yet undetermined.
* Usually hammer2 refuses to operate on the file/directory in
question until the buffer cache buffer with the in-memory error
is recycled or the machine is rebooted, avoiding corruption.
But it is possible that hammer2 might miss an in-memory
corruption event in some instances (occurring after the CRC
check, for example).
kernel - Handle race in vfsync_bp() and nfs_flush_bp()
* The RB_SCAN callback code on the clean/dirty buffer trees
must bump b_refs temporarily when issuing a blocking lock
to prevent the buffer from being ripped out from under the
call, as the vnode token will be lost during the blocking
operation.
* vfsync_bp() and nfs_flush_bp() omitted the required refs.
* Code cleanup.
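The ref-around-blocking-lock pattern can be illustrated with a small userland sketch. Everything here is a stand-in: buf_sketch, blocking_lock_sketch() and scan_callback_sketch() are hypothetical names, and the "lock" merely records the reference count it observed so the effect is visible.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a buffer; refs plays the role of b_refs. */
struct buf_sketch {
	atomic_int refs;
	int locked;
	int refs_seen_in_lock;	/* recorded for demonstration only */
};

/* Stand-in for a buffer lock that may block and lose the vnode token. */
static int
blocking_lock_sketch(struct buf_sketch *bp)
{
	bp->refs_seen_in_lock = atomic_load(&bp->refs);
	bp->locked = 1;
	return 0;
}

/* Scan callback: pin the buffer for the duration of the blocking lock. */
static int
scan_callback_sketch(struct buf_sketch *bp)
{
	int error;

	atomic_fetch_add(&bp->refs, 1);	/* pin before we can block */
	error = blocking_lock_sketch(bp);
	atomic_fetch_sub(&bp->refs, 1);	/* unpin once the lock call returns */
	return error;
}
```

While the lock call is in progress the count is nonzero, which is what would keep another thread from tearing the buffer down during the window in which the token is not held.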
kernel - Fix race in brelvp() and reassignbuf()
* brelvp() can be called with just the buffer (bp) locked. The
vnode might not be locked or referenced at the time brelvp()
is called, or might be locked by some other entity at the time.
* brelvp() obtains the vnode token but this is not sufficient.
There is a race where, once the bp is removed from the vnode lists,
the related vnode might be retired out from under brelvp() if the
vp token is temporarily lost.
The token can, in fact, be temporarily lost during the syncer list
manipulation at the end of the routine. Fix with a vhold()/vdrop()
around the related code.
* In addition, set bp->b_vp to NULL before the syncer_list manipulation
instead of after, ensuring that it is NULL'd out while the vnode token
is still atomically held. It was theoretically ok before since the
bp should be locked, but the lost vnode token atomicy was concerning
[3 lines not shown]
opencrypto - remove in-kernel crypto(9) framework
The opencrypto crypto(9) API was quite complex (8k LoC), slow and not
used by any other kernel subsystem within DragonFly anymore. It allowed
for chaining various operations, crypto ops and compression, but AFAIK
this was never really used and rather complicated. For a much simpler,
synchronous API see sys/src/crypto/cryptoapi. For a nice writeup on the
problems of crypto(9) in the context of FreeBSD, please see [1].
The opencrypto API was asynchronous by design. This was good back in the
days when dedicated hardware crypto devices did exist to help offload
the CPU. But the world has now changed towards synchronous CPU
instructions like AESNI. These dedicated CPU instructions are best
called synchronously, which removes the need for book-keeping of
asynchronous requests, often greatly simplifying the caller as well as
the API and "backend" implementation.
Furthermore, those dedicated crypto CPU instructions are not limited to
the kernel, they can be directly used by userland applications as well.
[15 lines not shown]
cryptodev: remove /dev/crypto pseudo-device
Remove the /dev/crypto pseudo-device. OpenBSD, which invented this API,
did this step back in release 5.7 (2015).
Note that this commit only removes the userland-facing /dev/crypto
device while still keeping the in-kernel crypto(9) API as-is. The plan
is to finally remove crypto(9) in a future commit.
The only applications within DragonFly that ever made use of /dev/crypto
were tcplay(8), cryptsetup(8) and cryptdisks(8) via libtcplay. But this
dependency on /dev/crypto was dropped in commit
ede102cd94449fe52fa9da25631d9f15af6d62ef as of April 21, 2025 in favor
of doing the crypto operations directly in userland without any help
from the kernel via /dev/crypto.
Userland libraries or applications like OpenSSH and OpenSSL do not use
/dev/crypto, mainly for performance reasons (and portability).