Missing tests in make pkg
```
Warning: TestGroup '/var/tmp/tests/functional/ctime' not added to this
run. Auxiliary script '/var/tmp/tests/functional/ctime/setup' failed
verification.
```
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Chunwei Chen <david.chen at nutanix.com>
Closes #17491
spa: ZIO_TASKQ_ISSUE: Use symbolic priority
This allows to change the meaning of priority differences in FreeBSD
without requiring code changes in ZFS.
This upstreams commit fd141584cf89d7d2 from FreeBSD src.
Sponsored-by: The FreeBSD Foundation
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Signed-off-by: Olivier Certner <olce at FreeBSD.org>
Closes #17489
Fix bug caused by rounding in vdev_raidz_asize_to_psize
When an allocation is happening on a raidz vdev, the number of sectors
allocated is rounded up to a multiple of nparity + 1. If this results in
the allocation spilling into an extra row, then the corresponding call
to vdev_raidz_asize_to_psize will incorrectly assume that parity sectors
were allocated for that spilled row, even though no data is stored
there.
If we determine that happened, we need to subtract out those extra
sectors before performing the rest of the capacity calculation.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Rob Norris <rob.norris at klarasystems.com>
Reviewed-by: Allan Jude <allan at klarasystems.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie at klarasystems.com>
Closes #17490
vdev_raidz_asize_to_psize: return psize, not asize
Since 246e588, gang blocks written to raidz vdevs will write past the
end of their allocation, corrupting themselves, other data, or both.
The reason is simple - when allocating the gang children, we call
vdev_psize_to_asize() to find out how much data we should load into the
allocation we just did. vdev_raidz_asize_to_psize() had a bug; it
computed the psize, but returned the original asize. The raidz layer
dutifully writes that much out, into space beyond the end of the
allocation.
If there's existing data there, it gets overwritten, causing checksum
errors when that data is read. Even there's not data there (unlikely,
given that gang blocks are in play at all), that area is not considered
allocated, so can be allocated and overwritten later.
The fix is simple: return the psize we just computed.
[6 lines not shown]
FreeBSD: Ensure that z_pflags is initialized for new znodes
The field is subsequently accessed in zfs_mknode(), in
zfs_inherit_projid(). The Linux implementation of zfs_create_fs() has
this initialization already; there is no counterpart to
zfs_create_share_dir() that I can see.
Reported-by: KMSAN
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Rob Norris <rob.norris at klarasystems.com>
Signed-off-by: Mark Johnston <markj at FreeBSD.org>
Closes #17486
Ensure that gang_copies is always at least as large as copies
As discussed in the comments of PR #17004, you can theoretically run
into a case where a gang child has more copies than the gang header,
which can lead to some odd accounting behavior (and even trip a
VERIFY). While the accounting code could be changed to handle this, it
fundamentally doesn't seem to make a lot of sense to allow this to
happen. If the data is supposed to have a certain level of reliability,
that isn't actually achieved unless the gang_copies property is set to
match it.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Signed-off-by: Paul Dagnelie <paul.dagnelie at klarasystems.com>
Closes #17484
Linux 6.16: remove writepage and readahead_page
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Closes #17443
Clarify and restrict dmu_tx_assign() errors
There are three possible cases where dmu_tx_assign() may
encounter a fatal error. When there is a true lack of free
space (ENOSPC), when there is a lack of quota space (EDQUOT),
or when data required to perform the transaction cannot be
read from disk (EIO). See the dmu_tx_check_ioerr() function
for additional details of on the motivation for check for
I/O error early.
Prior to this change dmu_tx_assign() would return the
contents of tx->tx_err which covered a wide range of possible
error codes (EIO, ECKSUM, ESRCH, etc). In practice, none
of the callers could do anything useful with this level of
detail and simply returned the error.
Therefore, this change converts all tx->tx_err errors to EIO,
adds ASSERTs to dmu_tx_assign() to cover the only possible
errors, and clarifies the function comment to include EIO as
[5 lines not shown]
Fix TestGroup warning due to missing tags
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Rob Norris <rob.norris at klarasystems.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie at klarasystems.com>
Co-authored-by: Paul Dagnelie <paul.dagnelie at klarasystems.com>
Closes #17473
FreeBSD: Wire projects support
While FreeBSD itself does not support projects, there is no reason
why it can't be controlled via `zfs project` and other subcommands.
Most of the code is actually already there and just needs some
revival and sync with Linux, plus enabling some tests not depending
on the OS support.
Reviewed-by: Ameer Hamza <ahamza at ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Alexander Motin <mav at FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #17423
Fix other nonrot bugs
There are still a variety of bugs involving the vdev_nonrot property
that will cause problems if you try to run the test suite with
segment-based weighting disabled, and with other things in the weighting
code. Parents' nonrot property need to be updated when children are
added. When vdevs are expanded and more metaslabs are added, the weights
have to be recalculated (since the number of metaslabs is an input to
the lba bias function). When opening, faulted or unopenable children
should not be considered for whether a vdev is nonrot or not (since the
nonrot property is determined during a successful open, this can cause
false negatives). And draid spares need to have the nonrot property set
correctly.
Sponsored-by: Eshtek, creators of HexOS
Sponsored-by: Klara, Inc.
Reviewed-by: Allan Jude <allan at klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
[2 lines not shown]
ZTS: Use FreeBSD cloudinit images
FreeBSD provides CI-IMAGES since some time. These images are
based on nuageinit, which does not support fqdn and sudo for
example. So we need currently some workarounds to get it
working.
The FreeBSD images will be more compatible with cloud-init in
some near future. Then we can remove the workaround things.
These versions are used for testing:
- freebsd13-4r (RELEASE)
- freebsd14-3s (STABLE)
- freebsd15-0c (CURRENT)
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Signed-off-by: Tino Reichardt <milky-zfs at mcmilk.de>
Closes #17462
Linux 6.15 compat: META
Update the META file to reflect compatibility with the 6.15
kernel.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Tony Hutter <hutter2 at llnl.gov>
Closes #17393
Relax zfs_vnops_read_chunk_size limitations
It makes no sense to limit read size below the block size, since
DMU will any way consume resources for the whole block, while the
current zfs_vnops_read_chunk_size is only 1MB, which is smaller
that maximum block size of 16MB. Plus in case of misaligned
Uncached I/O the buffer may get evicted between the chunks,
requiring repeating I/Os.
On 64-bit platforms increase zfs_vnops_read_chunk_size to 32MB.
It allows to less depend on speculative prefetcher if application
requests specific size, first not waiting for prefetcher to start
and later not prefetching more than needed.
Also while there, we don't need to align reads to the chunk size,
but only to a block size, which is smaller and so more forgiving.
My profiles show ~4% of CPU time saving when reading 16MB blocks.
[5 lines not shown]
ioctl: remove FICLONE/FICLONERANGE/FIDEDUPERANGE compat
These are only required to support these ioctls on Linux <4.5. Since
4.18 is our cutoff, we don't need this code anymore.
Also removing related test things that will never match again.
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Closes #17308
dmu_traverse: remove 'ignore_hole_birth' tunable alias
It's been many years, we can probably do without.
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Reviewed-by: Pavel Snajdr <snajpa at snajpa.net>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Closes #17376
Make TX abort after assign safer
It is not right, but there are few examples when TX is aborted
after being assigned in case of error. To handle it better on
production systems add extra cleanup steps.
While here, replace couple dmu_tx_abort() in simple cases.
Reviewed-by: Rob Norris <robn at despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Igor Kozhukhov <igor at dilos.org>
Signed-off-by: Alexander Motin <mav at FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #17438
zpool: clarify ZPOOL_STATUS_REMOVED_DEV status message
Disks can be removed either by the administrator via hotplug or by the
kernel when a disk failure occurs. The previous message implied that
removal was always manual, which could be confusing.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Signed-off-by: Ameer Hamza <ahamza at ixsystems.com>
Closes #17400
tunables: remove __check_old_set_param workaround
This was fully removed from Linux in 4.15, so we won't be seeing it
again.
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa at snajpa.net>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Closes #17377
tunables: remove support for s64 tunables
Nothing uses them now.
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa at snajpa.net>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Closes #17377
tunables: remove FreeBSD compat macros for Linux module params
Nothing in any FreeBSD code uses them.
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa at snajpa.net>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Closes #17377
Reduce zfs_dmu_offset_next_sync penalty
Looking on txg_wait_synced(, 0) I've noticed that it always syncs
5 TXGs: 3 TXG_CONCURRENT_STATES + 2 TXG_DEFER_SIZE. But in case
of dmu_offset_next() we do not care about deferred frees. And even
concurrent TXGs we might not need sync all 3 if the dnode was not
dirtied in last few TXGs.
This patch makes dmu_offset_next() to sync one TXG at a time until
the dnode is clean, but no more than 3 TXG_CONCURRENT_STATES times.
My tests with random simultaneous writes and seeks over many files
on HDD pool show 7-14% performance increase.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Rob Norris <robn at despairlabs.com>
Signed-off-by: Alexander Motin <mav at FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #17434
CI: Retire Fedora 40 builder
Fedora 40 has gone EOL as of May 2025, retire the CI builder.
Reviewed-by: Tino Reichardt <milky-zfs at mcmilk.de>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Closes #17408
vdev: skip faulting disks pending removal
This patch fixes a race where vdev_remove_wanted may be set after probe
initiation, which could otherwise trigger redundant fault and removal.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Signed-off-by: Ameer Hamza <ahamza at ixsystems.com>
Closes #17400
CI: Retire Ubuntu 20.04 builder
Ubuntu 20.04 has gone EOL as of April 2025, retire the CI builder.
Reviewed-by: Tino Reichardt <milky-zfs at mcmilk.de>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Closes #17403
tunables: use Linux ullong param ops for u64
Since 3.17 Linux has provided param ops for 64-bit ints, so we don't
need to use our own anymore.
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa at snajpa.net>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Closes #17377
tunables: ensure tunable and variable have same define gate
If a variable is only available in the kernel, then the tunable should
also only be available there.
This matters very little so long as we don't have userspace tunables,
but its still good hygeine.
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa at snajpa.net>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Closes #17377
zfs_log_write: only put the callback on the last itx
If a write is split across mutliple itxs, we only want the callback on
the last one, otherwise it will be called for every itx associated with
this single write, which makes it very hard to know what to clean up.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Mark Johnston <markj at FreeBSD.org>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris at klarasystems.com>
Closes #17445