vkernel: Restore MAP_VPAGETABLE support with COW/VPTE fix
Re-implement MAP_VPAGETABLE support which was removed in commit
4d4f84f5f26bf5e9fe4d0761b34a5f1a3784a16f. This enables vkernel
functionality - allowing a full kernel to run in userspace.
Changes:
- Re-enable VM_MAPTYPE_VPAGETABLE in sys/vm/vm.h
- Restore vm_fault_vpagetable() for VPTE translation
- Add PG_VPTMAPPED flag to track pages mapped via VPAGETABLE
- Handle MADV_SETMAP/MADV_INVAL for vkernel pmap management
- Fix COW/VPTE synchronization: set FW_DIDCOW for page-level COW
so pmap_enter() updates VPTEs after copy-on-write
- Use anonymous memory (unlinked file) for vkernel memory backing
to ensure proper page reclaim when vkernel exits
- Fix vkernel build issues (cpu_feature2, COWF_PREFAULT_* constants)
- Skip callout address validation for vkernels (_KERNEL_VIRTUAL)
NOTE: This code was almost fully generated by Claude Opus 4.5
virtio_blk - Implement multiqueue support.
* For now, this code uses at most as many virtqueues as cpu cores and
interrupts are available. The number of virtqueues can also be capped via
"hw.vtblk.max_queues" and "hw.vtblk.X.max_queues" tunables (also useful
for benchmarking different scenarios).
* Since virtio interrupts are currently tied together with the per-queue
serializer lock, a fixed 1:1 mapping between interrupts and virtqueues is
used for now, for simplicity. Properly supporting multiple virtqueues that
share a single interrupt would be far more elegant with a separate
serializer for each virtqueue, and not just for each interrupt.
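A minimal sketch of the queue-count selection described above, not the
driver's actual code. The function name, its parameters, and the convention
that a tunable value <= 0 means "no cap" are assumptions made for
illustration only.

```c
#include <assert.h>

/*
 * Hypothetical helper: choose how many virtqueues to use.
 * The result is limited by the cpu core count, the number of available
 * interrupts, the device's advertised queue count, and an optional
 * tunable cap (<= 0 is treated as "no cap" in this sketch).
 */
static int
sim_vtblk_pick_nqueues(int ncpus, int nintrs, int dev_queues, int tunable_cap)
{
	int n;

	assert(ncpus >= 1 && dev_queues >= 1);

	n = dev_queues;
	if (n > ncpus)
		n = ncpus;
	if (n > nintrs)
		n = nintrs;
	if (tunable_cap > 0 && n > tunable_cap)
		n = tunable_cap;
	if (n < 1)
		n = 1;		/* always keep at least one queue */
	return (n);
}
```

With a fixed 1:1 mapping between interrupts and virtqueues, the interrupt
count is a hard upper bound, which is why it appears in the clamp.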
virtio_blk - Clean up detach code, to work better for multiqueue.
* Specifically this uses the fact that vtblk_flags is only ever modified in
vtblk_attach(), vtblk_detach(), vtblk_suspend(), and vtblk_resume(). So
we don't need to worry about taking all the serializers when setting the
DETACH flag.
* Additionally the virtqueue_disable_intr() call from vtblk_stop() was
completely superfluous, since the virtio_teardown_intr() call earlier
already guarantees that none of the virtqueue interrupt handlers can be
active anymore.
kernel - Get rid of ancient unused 256 non-cloned pty devices.
* These haven't really been used in more than a decade, I think.
* The only binary that still has code for these is rfcomm_sppd(1), however
that's part of the bluetooth stack, and hence isn't usable anyway.
This should be easy to modernize, to use a cloned pty device instead.
* Updating manpages a bit, to refer to the "modern" cloned pty devices.
if_mtw - Port from FreeBSD, with bugfixes, and parts re-ported from OpenBSD.
* Reverted some recent changes in FreeBSD for 80211 API changes on their side:
* reverted ratectl API use in ieee80211_ratectl_tx_update()
* crypto API change (git 5431dafdb9659fb578f)
* seqno offload (git cce278510a820785d88)
* ni->ni_txrate references use (git 7067450010931479f8)
* Re-ported the firmware loading code from OpenBSD. Unfortunately firmware
loading still is not reliable. Sometimes it appears to be consistently
working, and then after a reboot it repeatedly fails with the same code.
* Added some error return-codes back that OpenBSD's code has, but got removed
in the FreeBSD code. This makes it behave a bit better when it fails.
* Fixed the aggregated RX buffer handling, to split up the mbuf correctly,
using the m_copym method, based on the corresponding code in if_iwm(4).
* Remove the small pieces of #ifdef IEEE80211_SUPPORT_SUPERG code.
[5 lines not shown]
Import openresolv-3.17.3 with the following changes:
resolvconf: Add a function to quote and escape input for eval
resolvconf: quote on printf rather than on value
Import openresolv-3.17.1 with the following changes:
* libc: add toggle resolv_conf_restore, defaulting to YES
* resolvconf: Fix -D
* resolvconf: Don't warn when we have no entries to list for *
* resolvconf: -I now inits subscribers after clearing state
* resolvconf: remember if any subscriber errored
* libc: Don't update resolv.conf on signature mismatch
* resolvconf: Single quote parsed values from resolv.conf
* resolvconf: -L now outputs fully processed resolv.conf files
ahci - Properly check and set SATA capabilities and features for ALPM.
- This should now only enable device-initiated and/or host-initiated link
power-management when it is supported by both controller and disk device.
- Also this now allows for device-initiated power-management to be enabled
with AHCI controllers that don't support automatic host-initiated
power-management.
- In addition, this adds support for automatic promotion of "partial" state
to "slumber" state (i.e. without needing to go through "active" state).
- The kernel console output now explicitly tells when it's enabling DIPM
(device-initiated power management) and HIPM (host-initiated power
management) respectively.
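The capability gating described above can be sketched roughly as follows.
This is an illustration only: the struct, its field names, and the flag
values are hypothetical and do not correspond to the ahci(4) driver's real
data structures or register bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_ALPM_HIPM	0x1	/* host-initiated link power management */
#define SIM_ALPM_DIPM	0x2	/* device-initiated link power management */

/* Hypothetical capability summary for one controller/disk pair. */
struct sim_alpm_caps {
	bool ctlr_hipm;		/* controller supports automatic HIPM */
	bool ctlr_dipm;		/* controller can accept device-initiated requests */
	bool dev_hipm;		/* disk supports HIPM */
	bool dev_dipm;		/* disk supports DIPM */
};

/*
 * Enable each mode only when both sides support it. DIPM is decided
 * independently of the controller's HIPM support.
 */
static int
sim_alpm_select(const struct sim_alpm_caps *c)
{
	int mode = 0;

	assert(c != NULL);
	if (c->ctlr_hipm && c->dev_hipm)
		mode |= SIM_ALPM_HIPM;
	if (c->ctlr_dipm && c->dev_dipm)
		mode |= SIM_ALPM_DIPM;
	return (mode);
}
```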
wlan - Remove NULL free in fallback "none" ratectl code.
This avoids a kernel panic when detaching a wlan interface that was created
with the "none" ratectl code (i.e. when the wlan_amrr module wasn't loaded).
hammer2 - Add debugging
* Debug situations where the CRC fails. Print enough information
for us to poke around and compare the in-kernel buffer against
the disk buffer.
All instances to date where hammer2 has detected CRC corruption
have been due to in-memory corruption, source as yet undetermined.
* Usually hammer2 refuses to operate on the file/directory in
question until the buffer cache buffer with the in-memory error
is recycled or the machine is rebooted, avoiding corruption.
But it is possible that hammer2 might miss an in-memory
corruption event in some instances (occurring after the CRC
check, for example).
kernel - Handle race in vfsync_bp() and nfs_flush_bp()
* The RB_SCAN callback code on the clean/dirty buffer trees
must bump b_refs temporarily when issuing a blocking lock
to prevent the buffer from being ripped out from under the
call, as the vnode token will be lost during the blocking
operation.
* vfsync_bp() and nfs_flush_bp() omitted the required refs.
* Code cleanup.
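The hold-across-blocking pattern described above can be illustrated with a
small userspace simulation. This is a sketch, not the kernel code: struct
sim_buf and all sim_* functions are hypothetical stand-ins, and the "blocking
lock" here only records the reference count it observed rather than sleeping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer; refs plays the role of b_refs. */
struct sim_buf {
	int refs;
	int locked;
	int refs_seen_while_blocked;
};

/*
 * Simulated blocking lock. A real blocking lock may sleep, and the vnode
 * token can be lost while sleeping, so the buffer must be held by then.
 */
static void
sim_blocking_lock(struct sim_buf *bp)
{
	bp->refs_seen_while_blocked = bp->refs;
	bp->locked = 1;
}

/* Scan callback: bump the ref before blocking, drop it afterwards. */
static int
sim_scan_callback(struct sim_buf *bp)
{
	assert(bp != NULL);
	bp->refs++;		/* keep the buffer from being ripped out */
	sim_blocking_lock(bp);
	bp->refs--;		/* lock acquired, temporary ref no longer needed */
	/* ... the flush work would go here ... */
	bp->locked = 0;
	return (0);
}

/* Returns the ref count that was in effect during the blocking call. */
static int
sim_demo(void)
{
	struct sim_buf b = { 0, 0, 0 };

	sim_scan_callback(&b);
	return (b.refs_seen_while_blocked);
}
```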
kernel - Fix race in brelvp() and reassignbuf()
* brelvp() can be called with just the buffer (bp) locked. The
vnode might not be locked or referenced at the time brelvp()
is called, or might be locked by some other entity at the time.
* brelvp() obtains the vnode token but this is not sufficient.
There is a race where, once the bp is removed from the vnode lists,
the related vnode might be retired out from under brelvp() if the
vp token is temporarily lost.
The token can, in fact, be temporarily lost during the syncer list
manipulation at the end of the routine. Fix with a vhold()/vdrop()
around the related code.
* In addition, set bp->b_vp to NULL before the syncer_list manipulation
instead of after, ensuring that it is NULL'd out while the vnode token
is still atomically held. It was theoretically ok before since the
bp should be locked, but the lost vnode token atomicity was concerning
[3 lines not shown]
opencrypto - remove in-kernel crypto(9) framework
The opencrypto crypto(9) API was quite complex (8k LoC), slow and not
used by any other kernel subsystem within DragonFly anymore. It allowed
for chaining various operations, crypto ops and compression, but AFAIK
this was never really used and rather complicated. For a much simpler,
synchronous API see sys/src/crypto/cryptoapi. For a nice writeup on the
problems of crypto(9) in the context of FreeBSD, please see [1].
The opencrypto API was asynchronous by design. This was good back in the
days when dedicated hardware crypto devices did exist to help offload
the CPU. But the world has now changed towards synchronous CPU
instructions like AESNI. These dedicated CPU instructions are best
called synchronously, which removes the need for book-keeping of
asynchronous requests, often greatly simplifying the caller as well as
the API and "backend" implementation.
Furthermore, those dedicated crypto CPU instructions are not limited to
the kernel, they can be directly used by userland applications as well.
[15 lines not shown]
cryptodev: remove /dev/crypto pseudo-device
Remove the /dev/crypto pseudo-device. OpenBSD, which invented this API,
did this step back in release 5.7 (2015).
Note that this commit only removes the userland-facing /dev/crypto
device while still keeping the in-kernel crypto(9) API as-is. The plan
is to finally remove crypto(9) in a future commit.
The only applications within DragonFly that ever made use of /dev/crypto
were tcplay(8), cryptsetup(8) and cryptdisks(8) via libtcplay. But this
dependency on /dev/crypto was dropped in commit
ede102cd94449fe52fa9da25631d9f15af6d62ef as of April 21, 2025 in favor
of doing the crypto operations directly in userland without any help
from the kernel via /dev/crypto.
Userland libraries or applications like OpenSSH and OpenSSL do not use
/dev/crypto, mainly for performance reasons (and portability).
kernel - Do readonly check in .d_open method in mmcsd(4) and virtio_blk(4).
* Makes read-write open fail properly for read-only storage in mmcsd(4) and
virtio_blk(4), instead of only resulting in transfer errors for the
disk writes.
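A rough sketch of such an open-time check, under stated assumptions: the
struct, the SIM_FWRITE flag value, and the choice of EROFS as the returned
error are all illustrative. The commit message does not name the error code
the drivers actually return.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SIM_FWRITE	0x0002	/* hypothetical stand-in for the write-open flag */

/* Hypothetical per-device state. */
struct sim_disk {
	int readonly;		/* non-zero if the storage is read-only */
};

/*
 * Reject a read-write open on read-only storage up front, instead of
 * letting later disk writes fail with transfer errors.
 */
static int
sim_disk_open(const struct sim_disk *d, int oflags)
{
	assert(d != NULL);
	if (d->readonly && (oflags & SIM_FWRITE))
		return (EROFS);	/* assumed error code */
	return (0);
}
```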
syscons - Add 16bit rendering code for UEFI and KMS driver framebuffers.
This fixes syscons rendering with KMS graphics drivers, when the driver
hands us a 16bit console framebuffer.
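For illustration, 16bit text rendering could look roughly like the sketch
below. It assumes an RGB565 pixel layout and an 8-pixel-wide glyph row with
the most significant bit leftmost. Neither assumption is stated in the commit
message, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack 8-bit color channels into one RGB565 pixel (assumed layout). */
static uint16_t
sim_pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
	return ((uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)));
}

/*
 * Draw one 8-pixel glyph row into a 16-bit framebuffer row.
 * Set bits get the foreground color, clear bits the background color.
 */
static void
sim_draw_glyph_row16(uint16_t *dst, uint8_t bits, uint16_t fg, uint16_t bg)
{
	int i;

	assert(dst != NULL);
	for (i = 0; i < 8; i++)
		dst[i] = (bits & (0x80 >> i)) ? fg : bg;
}
```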