OpenZFS/src 7be9fa2module/zfs vdev.c

Fix nonrot property being incorrectly unset (#17206)

When opening a vdev and setting the nonrot property, we used to wait for
each child to be opened before examining its nonrot property. When the
change was made to open vdevs asynchronously, we didn't move the nonrot
check out of the main loop. As a result, the nonrot property is almost
always set to false, regardless of the actual type of the underlying
disks. The fix is simply to move the nonrot check to a separate loop
after the taskq has been waited for.

Sponsored-by: Klara, Inc.
Sponsored-by: Eshtek, Inc.

Signed-off-by: Paul Dagnelie <paul.dagnelie at klarasystems.com>
Co-authored-by: Paul Dagnelie <paul.dagnelie at klarasystems.com>

Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
DeltaFile
+6-3module/zfs/vdev.c
+6-31 files

OpenZFS/src 367d34binclude/sys vdev.h, module/zfs vdev.c vdev_removal.c

Fix dspace underflow bug

Since spa_dspace accounts only normal allocation class space,
spa_nonallocating_dspace should do the same.  Otherwise we may get
negative overflow or respective assertion spa_update_dspace() if
removed special/dedup vdev is bigger than all normal class space.

Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Allan Jude <allan at klarasystems.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie at klarasystems.com>
Closes #17183
DeltaFile
+20-2module/zfs/vdev.c
+3-14module/zfs/vdev_removal.c
+1-0include/sys/vdev.h
+24-163 files

OpenZFS/src 9250403module/zfs zio.c dmu.c, tests/zfs-tests/tests/functional/gang_blocks gang_blocks.kshlib gang_blocks_redundant.ksh

Make ganging redundancy respect redundant_metadata property (#17073)

The redundant_metadata setting in ZFS allows users to trade resilience
for performance and space savings. This applies to all data and metadata
blocks in zfs, with one exception: gang blocks. Gang blocks currently
just take the copies property of the IO being ganged and, if it's 1,
sets it to 2. This means that we always make at least two copies of a
gang header, which is good for resilience. However, if the users care
more about performance than resilience, their gang blocks will be even
more of a penalty than usual.

We add logic to calculate the number of gang headers copies directly,
and store it as a separate IO property. This is stored in the IO
properties and not calculated when we decide to gang because by that
point we may not have easy access to the relevant information about what
kind of block is being stored. We also check the redundant_metadata
property when doing so, and use that to decide whether to store an extra
copy of the gang headers, compared to the underlying blocks.


    [6 lines not shown]
DeltaFile
+120-0tests/zfs-tests/tests/functional/gang_blocks/gang_blocks.kshlib
+88-0tests/zfs-tests/tests/functional/gang_blocks/gang_blocks_redundant.ksh
+31-0tests/zfs-tests/tests/functional/gang_blocks/cleanup.ksh
+30-0tests/zfs-tests/tests/functional/gang_blocks/setup.ksh
+12-11module/zfs/zio.c
+20-1module/zfs/dmu.c
+301-129 files not shown
+327-2015 files

OpenZFS/src 1b495eeinclude/sys ddt.h, man/man4 zfs.4

FDT dedup log sync  -- remove incremental

This PR condenses the FDT dedup log syncing into a single sync
pass. This reduces the overhead of modifying indirect blocks for the
dedup table multiple times per txg. In addition, changes were made to
the formula for how much to sync per txg. We now also consider the
backlog we have to clear, to prevent it from growing too large, or
remaining large on an idle system.

Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Authored-by: Don Brady <don.brady at klarasystems.com>
Authored-by: Paul Dagnelie <paul.dagnelie at klarasystems.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie at klarasystems.com>
Closes #17038
DeltaFile
+179-163module/zfs/ddt.c
+109-0tests/zfs-tests/tests/functional/dedup/dedup_fdt_pacing.ksh
+50-31man/man4/zfs.4
+10-0module/zfs/vdev_queue.c
+2-5include/sys/ddt.h
+2-2tests/runfiles/common.run
+352-2017 files not shown
+366-20213 files

OpenZFS/src fc460bfcmd/zdb zdb.c, tests/runfiles common.run

Add more DDT tests

The new Fast Dedup feature has a lot of moving parts, and only some of
them have tests. We have some tests for prefetch and quota, and a
generic ZAP shrinking test, but we don't have anything for the pruning
command or specific to DDT zap shrinking. Here we add a couple small new
tests for zpool ddtprune and DDT-specific ZAP shrinking.

Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Reviewed-by: Allan Jude <allan at klarasystems.com>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie at klarasystems.com>
Closes #17049
DeltaFile
+96-0tests/zfs-tests/tests/functional/dedup/dedup_prune.ksh
+82-0tests/zfs-tests/tests/functional/dedup/dedup_zap_shrink.ksh
+2-0cmd/zdb/zdb.c
+1-1tests/runfiles/common.run
+2-0tests/zfs-tests/tests/Makefile.am
+183-15 files

OpenZFS/src 7e72312module/zfs vdev_removal.c, tests/runfiles common.run

Don't try to get mg of hole vdev in removal

Don't try to get mg of hole vdev in removal

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie at klarasystems.com>
Closes #17080
DeltaFile
+34-0tests/zfs-tests/tests/functional/removal/removal_with_hole.ksh
+2-2module/zfs/vdev_removal.c
+2-1tests/runfiles/common.run
+1-0tests/zfs-tests/tests/Makefile.am
+39-34 files

OpenZFS/src a0f8d3cmodule/zfs zio.c

Add kstats tracking gang allocations

Gang blocks have a significant impact on the long and short term
performance of a zpool, but there is not a lot of observability into
whether they're being used.  This change adds gang-specific kstats to
ZFS, to better allow users to see whether ganging is happening.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie at klarasystems.com>
Closes #17003 
DeltaFile
+73-0module/zfs/zio.c
+73-01 files

OpenZFS/src 2adca17man/man4 zfs.4, module/zfs metaslab.c

Expand fragmentation table to reflect larger possibile allocation sizes

When you are using large recordsizes in conjunction with raidz, with
incompressible data, you can pretty reliably be making 21 MB
allocations. Unfortunately, the fragmentation metric in ZFS considers
any metaslabs with 16 MB free chunks completely unfragmented, so you can
have a metaslab report 0% fragmented and be unable to satisfy an
allocation. When using the segment-based metaslab weight, this is
inconvenient; when using the space-based one, it can seriously degrade
performance.

We expand the fragmentation table to extend up to 512MB, and redefine
the table size based on the actual table, rather than having a static
define. We also tweak the one variable that depends on fragmentation
directly.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Allan Jude <allan at klarasystems.com>

    [2 lines not shown]
DeltaFile
+31-26module/zfs/metaslab.c
+1-1man/man4/zfs.4
+32-272 files

OpenZFS/src 701093cmodule/zfs vdev_removal.c, tests/runfiles common.run

Don't try to get mg of hole vdev in removal

Don't try to get mg of hole vdev in removal

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie at klarasystems.com>
Closes #17080
DeltaFile
+34-0tests/zfs-tests/tests/functional/removal/removal_with_hole.ksh
+2-2module/zfs/vdev_removal.c
+2-1tests/runfiles/common.run
+1-0tests/zfs-tests/tests/Makefile.am
+39-34 files

OpenZFS/src 88020b9module/zfs zio.c

Add kstats tracking gang allocations

Gang blocks have a significant impact on the long and short term
performance of a zpool, but there is not a lot of observability into
whether they're being used.  This change adds gang-specific kstats to
ZFS, to better allow users to see whether ganging is happening.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie at klarasystems.com>
Closes #17003 
DeltaFile
+73-0module/zfs/zio.c
+73-01 files

OpenZFS/src 4049651man/man4 zfs.4, module/zfs metaslab.c

Expand fragmentation table to reflect larger possibile allocation sizes

When you are using large recordsizes in conjunction with raidz, with
incompressible data, you can pretty reliably be making 21 MB
allocations. Unfortunately, the fragmentation metric in ZFS considers
any metaslabs with 16 MB free chunks completely unfragmented, so you can
have a metaslab report 0% fragmented and be unable to satisfy an
allocation. When using the segment-based metaslab weight, this is
inconvenient; when using the space-based one, it can seriously degrade
performance.

We expand the fragmentation table to extend up to 512MB, and redefine
the table size based on the actual table, rather than having a static
define. We also tweak the one variable that depends on fragmentation
directly.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Allan Jude <allan at klarasystems.com>

    [2 lines not shown]
DeltaFile
+31-26module/zfs/metaslab.c
+1-1man/man4/zfs.4
+32-272 files

OpenZFS/src 244ea5cmodule/zfs dataset_kstats.c

Add missing kstats to dataset kstats

Reviewed-by: Tony Nguyen <tony.nguyen at delphix.com>
Reviewed-by: Rob Norris <robn at despairlabs.com>
Signed-off-by: Paul Dagnelie <pcd at delphix.com>
Closes #16431 
DeltaFile
+3-0module/zfs/dataset_kstats.c
+3-01 files

OpenZFS/src 889152ccmd/zpool zpool_main.c

Give a better message from 'zpool get' with invalid pool name

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Don Brady <don.brady at klarasystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen at delphix.com>
Signed-off-by: Paul Dagnelie <pcd at delphix.com>
Closes #15942 
DeltaFile
+3-4cmd/zpool/zpool_main.c
+3-41 files

OpenZFS/src b6bbaa8cmd/zpool zpool_main.c

Give a better message from 'zpool get' with invalid pool name

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Don Brady <don.brady at klarasystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen at delphix.com>
Signed-off-by: Paul Dagnelie <pcd at delphix.com>
Closes #15942 
DeltaFile
+3-4cmd/zpool/zpool_main.c
+3-41 files

OpenZFS/src ab65360module/zfs metaslab.c

Don't assert mg_initialized due to device addition race

During device removal stress tests, we noticed that we were tripping 
the assertion that mg_initialized was true. After investigation, it was 
determined that the mg in question was the embedded log metaslab 
group for a newly added vdev; the normal mg had been initialized (by 
metaslab_sync_reassess, via vdev_sync_done). However, because the spa 
config alloc lock is not held as writer across both calls to 
metaslab_sync_reassess, it is possible for an allocation to happen 
between the two metaslab_groups being initialized. Because the metaslab 
code doesn't check the group in question, just the vdev's main mg, it 
is possible to get past the initial check in vdev_allocatable and 
later fail due to the assertion.

We simply remove the assertions. We could also consider locking the 
ALLOC lock around the reassess calls in vdev_sync_done, but that risks 
deadlocks. We could check the actual target mg in vdev_allocatable, 
but that risks racing with a passivation that comes in after that 
check but before the assertion. We still won't be able to actually 

    [6 lines not shown]
DeltaFile
+0-3module/zfs/metaslab.c
+0-31 files

OpenZFS/src 8161b73module/zfs metaslab.c

Don't assert mg_initialized due to device addition race

During device removal stress tests, we noticed that we were tripping 
the assertion that mg_initialized was true. After investigation, it was 
determined that the mg in question was the embedded log metaslab 
group for a newly added vdev; the normal mg had been initialized (by 
metaslab_sync_reassess, via vdev_sync_done). However, because the spa 
config alloc lock is not held as writer across both calls to 
metaslab_sync_reassess, it is possible for an allocation to happen 
between the two metaslab_groups being initialized. Because the metaslab 
code doesn't check the group in question, just the vdev's main mg, it 
is possible to get past the initial check in vdev_allocatable and 
later fail due to the assertion.

We simply remove the assertions. We could also consider locking the 
ALLOC lock around the reassess calls in vdev_sync_done, but that risks 
deadlocks. We could check the actual target mg in vdev_allocatable, 
but that risks racing with a passivation that comes in after that 
check but before the assertion. We still won't be able to actually 

    [6 lines not shown]
DeltaFile
+0-3module/zfs/metaslab.c
+0-31 files

OpenZFS/src 6c6fae6lib/libzutil/os/linux zutil_setproctitle.c

Fix memory leak in zfs_setprocinit code

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain at gmail.com>
Signed-off-by: Paul Dagnelie <pcd at delphix.com>
Closes #15508 
DeltaFile
+2-25lib/libzutil/os/linux/zutil_setproctitle.c
+2-251 files

OpenZFS/src d38f466tests/zfs-tests/tests/functional/cli_root/zpool_trim zpool_trim_partial.ksh zpool_trim_verify_trimmed.ksh, tests/zfs-tests/tests/functional/trim autotrim_config.ksh autotrim_integrity.ksh

Reduce trim min size even lower for tests to reduce flakiness 

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Signed-off-by: Paul Dagnelie <pcd at delphix.com>
Closes #15315 
DeltaFile
+1-1tests/zfs-tests/tests/functional/cli_root/zpool_trim/zpool_trim_partial.ksh
+1-1tests/zfs-tests/tests/functional/cli_root/zpool_trim/zpool_trim_verify_trimmed.ksh
+1-1tests/zfs-tests/tests/functional/trim/autotrim_config.ksh
+1-1tests/zfs-tests/tests/functional/trim/autotrim_integrity.ksh
+1-1tests/zfs-tests/tests/functional/trim/autotrim_trim_integrity.ksh
+1-1tests/zfs-tests/tests/functional/trim/trim_config.ksh
+6-61 files not shown
+7-77 files

OpenZFS/src 99dc1fctests/zfs-tests/tests/functional/block_cloning block_cloning_copyfilerange_fallback_same_txg.ksh

ZTS: Fix introduced test bug in block_cloning_copyfilerange

Reviewed-by: John Wren Kennedy <john.kennedy at delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Paul Dagnelie <pcd at delphix.com>
Closes #15316
DeltaFile
+0-1tests/zfs-tests/tests/functional/block_cloning/block_cloning_copyfilerange_fallback_same_txg.ksh
+0-11 files

OpenZFS/src 8526b12tests/zfs-tests/tests/functional/block_cloning block_cloning_copyfilerange_fallback_same_txg.ksh

Set timeout before creating pool in test

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Rob Norris <rob.norris at klarasystems.com>
Signed-off-by: Paul Dagnelie <pcd at delphix.com>
Closes #15309 
DeltaFile
+2-2tests/zfs-tests/tests/functional/block_cloning/block_cloning_copyfilerange_fallback_same_txg.ksh
+2-21 files

OpenZFS/src 9e36c57module/zfs metaslab.c

Don't allocate from new metaslabs

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Paul Dagnelie <pcd at delphix.com>
Closes #15307
Closes #15308 
DeltaFile
+9-0module/zfs/metaslab.c
+9-01 files

OpenZFS/src 0ce1b2ccmd ztest.c

Invoke zdb by guid to avoid import errors

The problem that was occurring is basically that a device was removed 
by ztest and replaced with another device. It was then reguided. The 
import then failed because there were two possible imports with the 
same name; one with the new guid, and one with the old. This can 
happen because the label writes from the device removal/replacement 
can be subject to ztest's error injection. 

The other ways to fix this would be to change the error injection to 
not trigger on removals (which may not be technically feasible), or 
to change the import code to not report configurations that are so 
short on devices (which would potentially have unpleasant end-user 
effects when trying to recover from data losses/device configuration 
issues).

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens at delphix.com>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Signed-off-by: Paul Dagnelie <pcd at delphix.com>
Closes #15298 
DeltaFile
+12-9cmd/ztest.c
+12-91 files

OpenZFS/src 5551dcdmodule/zfs metaslab.c

Don't allocate from new metaslabs

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Paul Dagnelie <pcd at delphix.com>
Closes #15307
Closes #15308 
DeltaFile
+9-0module/zfs/metaslab.c
+9-01 files

OpenZFS/src ec99448tests/zfs-tests/tests/functional/cli_root/zpool_trim zpool_trim_partial.ksh zpool_trim_verify_trimmed.ksh, tests/zfs-tests/tests/functional/trim autotrim_config.ksh autotrim_integrity.ksh

Reduce trim min size even lower for tests to reduce flakiness 

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Signed-off-by: Paul Dagnelie <pcd at delphix.com>
Closes #15315 
DeltaFile
+1-1tests/zfs-tests/tests/functional/cli_root/zpool_trim/zpool_trim_partial.ksh
+1-1tests/zfs-tests/tests/functional/cli_root/zpool_trim/zpool_trim_verify_trimmed.ksh
+1-1tests/zfs-tests/tests/functional/trim/autotrim_config.ksh
+1-1tests/zfs-tests/tests/functional/trim/autotrim_integrity.ksh
+1-1tests/zfs-tests/tests/functional/trim/autotrim_trim_integrity.ksh
+1-1tests/zfs-tests/tests/functional/trim/trim_config.ksh
+6-61 files not shown
+7-77 files

OpenZFS/src a59e294tests/zfs-tests/tests/functional/block_cloning block_cloning_copyfilerange_fallback_same_txg.ksh

ZTS: Fix introduced test bug in block_cloning_copyfilerange

Reviewed-by: John Wren Kennedy <john.kennedy at delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Paul Dagnelie <pcd at delphix.com>
Closes #15316
DeltaFile
+0-1tests/zfs-tests/tests/functional/block_cloning/block_cloning_copyfilerange_fallback_same_txg.ksh
+0-11 files

OpenZFS/src 8a9ba76tests/zfs-tests/tests/functional/block_cloning block_cloning_copyfilerange_fallback_same_txg.ksh

Set timeout before creating pool in test

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Rob Norris <rob.norris at klarasystems.com>
Signed-off-by: Paul Dagnelie <pcd at delphix.com>
Closes #15309 
DeltaFile
+2-2tests/zfs-tests/tests/functional/block_cloning/block_cloning_copyfilerange_fallback_same_txg.ksh
+2-21 files

OpenZFS/src 729507dlib/libzfs_core libzfs_core.c

Fix occasional rsend test crashes

We have occasional crashes in the rsend tests. Debugging revealed 
that this is because the send_worker thread is getting EINTR from 
splice(). This happens when a non-fatal signal is received during 
the syscall. We should retry the syscall, rather than exiting failure.
Tweak the loop to only break if the splice is finished or we receive 
a non-EINTR error.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli at nabijaczleweli.xyz>
Signed-off-by: Paul Dagnelie <pcd at delphix.com>
Closes #15273 
DeltaFile
+6-4lib/libzfs_core/libzfs_core.c
+6-41 files

OpenZFS/src 9aa1a28cmd ztest.c

Fix incorrect expected error in ztest

There is an occasional ztest failure that looks like ztest: attach 
(/var/tmp/zloop-run/ztest.13a 570425344, draid1-1-0 532152320, 1) 
returned 22, expected 95. This is because the value that we return 
is EINVAL, but expected_error is set incorrectly.

Change the expected_error value to match both the comment and the 
actual error value.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Paul Dagnelie <pcd at delphix.com>
Closes #15295 
DeltaFile
+1-1cmd/ztest.c
+1-11 files

OpenZFS/src cc75c81module/zfs arc.c

Fix l2arc_apply_transforms ztest crash 

In #13375 we modified the allocation size of the buffer that we use 
to apply l2arc transforms to be the size of the arc hdr we're using, 
rather than the allocation size that will be in place on the disk, 
because sometimes the hdr size is larger. Unfortunately, sometimes 
the allocation size is larger, which means that we overflow the buffer 
in that case. This change modifies the allocation to be the max of 
the two values

Reviewed-by: Mark Maybee <mark.maybee at delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Paul Dagnelie <pcd at delphix.com>
Closes #15177
Closes #15248 
DeltaFile
+6-5module/zfs/arc.c
+6-51 files

OpenZFS/src 2e2a46ecmd ztest.c

Invoke zdb by guid to avoid import errors

The problem that was occurring is basically that a device was removed 
by ztest and replaced with another device. It was then reguided. The 
import then failed because there were two possible imports with the 
same name; one with the new guid, and one with the old. This can 
happen because the label writes from the device removal/replacement 
can be subject to ztest's error injection. 

The other ways to fix this would be to change the error injection to 
not trigger on removals (which may not be technically feasible), or 
to change the import code to not report configurations that are so 
short on devices (which would potentially have unpleasant end-user 
effects when trying to recover from data losses/device configuration 
issues).

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens at delphix.com>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Signed-off-by: Paul Dagnelie <pcd at delphix.com>
Closes #15298 
DeltaFile
+12-9cmd/ztest.c
+12-91 files