OpenZFS/src 9a81484 module/zfs zap_leaf.c

ZAP: Reduce leaf array and free chunks fragmentation

The previous implementation of zap_leaf_array_free() put chunks on
the free list in reverse order.  zap_leaf_transfer_entry() and
zap_entry_remove() also freed the name and value arrays in reverse
order.  Together this scrambled the free list, making subsequent
allocations much more fragmented than necessary.

This patch re-implements zap_leaf_array_free() to keep the existing
chunk order, and implements a non-destructive zap_leaf_array_copy()
for use in zap_leaf_transfer_entry(), so that the name and value
arrays can be freed in the proper order there and in
zap_entry_remove().

With this change, a test of some writes and deletes shows the
percentage of non-contiguous chunks in DDT dropping from 61% and 47%
to 0% and 17% for arrays and frees respectively.  Explicit sorting
could surely do even better, especially for ZAPs with variable-size
arrays, but it would also cost much more, while this should be very
cheap.
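
To illustrate the ordering problem, here is a minimal sketch in C.  It is
not the actual zap_leaf.c code: struct leaf, NCHUNKS, CHAIN_END and both
function names are hypothetical simplifications in which each chunk stores
only the index of the next chunk.  Pushing chunks one at a time reverses
the chain on the free list, while splicing the whole chain keeps its order.

```c
#include <assert.h>
#include <stdint.h>

#define NCHUNKS   8       /* hypothetical leaf size */
#define CHAIN_END 0xffff  /* hypothetical end-of-chain marker */

struct leaf {
    uint16_t next[NCHUNKS]; /* next[i]: chunk following chunk i */
    uint16_t freelist;      /* first free chunk, or CHAIN_END */
};

/* Push chunks one by one: the chain ends up reversed on the free list. */
static void
array_free_reversing(struct leaf *l, uint16_t chunk)
{
    while (chunk != CHAIN_END) {
        assert(chunk < NCHUNKS);
        uint16_t nxt = l->next[chunk];
        l->next[chunk] = l->freelist;
        l->freelist = chunk;
        chunk = nxt;
    }
}

/* Find the tail, then splice the whole chain in front: order is kept. */
static void
array_free_in_order(struct leaf *l, uint16_t chunk)
{
    if (chunk == CHAIN_END)
        return;
    uint16_t tail = chunk;
    assert(tail < NCHUNKS);
    while (l->next[tail] != CHAIN_END) {
        tail = l->next[tail];
        assert(tail < NCHUNKS);
    }
    l->next[tail] = l->freelist;
    l->freelist = chunk;
}
```

With a chain 0 -> 1 -> 2 and an empty free list, the first function leaves
the free list as 2, 1, 0 and the second leaves it as 0, 1, 2.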


    [3 lines not shown]
Delta    File
+63 -45  module/zfs/zap_leaf.c
+63 -45  1 files

OpenZFS/src d02257c rpm/generic zfs-dkms.spec.in

fix: block incompatible kernel from being installed

The current "Requires" lines only ensure that the old kernel is
available on the system; they do not prevent Fedora from updating
to an incompatible kernel and breaking the user's system.

Set Conflicts to block incompatible kernels from being installed.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Signed-off-by: tleydxdy <shironeko.github at tesaguri.club>
Closes #16139 
Delta  File
+1 -0  rpm/generic/zfs-dkms.spec.in
+1 -0  1 files

OpenZFS/src d76d79f module/zfs zio.c

zio: Avoid sleeping in the I/O path

zio_delay_interrupt(), apparently used for fault injection, is executed
in the I/O pipeline.  It can cause the calling thread to go to sleep,
which is not allowed on FreeBSD.  This happens only for small delays,
though, and there's no apparent reason to avoid deferring to a taskqueue
in that case, as it already does otherwise.

Simply defer to a taskqueue unconditionally.  This fixes an occasional panic I
see when running the ZTS on FreeBSD.  Also remove an unhelpful comment
referencing the non-existent timeout_generic().

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Signed-off-by: Mark Johnston <markj at FreeBSD.org>
Closes #16785 
Delta   File
+8 -19  module/zfs/zio.c
+8 -19  1 files

OpenZFS/src 457f8b7 include/sys brt_impl.h, module/zfs brt.c

BRT: More optimizations after per-vdev splitting

 - With both pending and current AVL-trees being per-vdev and having
effectively identical comparison functions (the pending tree also
compared birth time, but I don't believe it is possible for them to
be different for the same offset within one transaction group), it
makes no sense to move entries from one to another.  Instead, inline
a dramatically simplified brt_entry_addref() into brt_pending_apply().
It no longer requires bv_lock, since there is nothing concurrent
to it at the time.  And it does not need to search the tree for the
previous entries, since it is the same tree, we already have the
entry and we know it is unique.
 - Put brt_vdev_lookup() and brt_vdev_addref() into different tree
traversals to avoid false positives in the first due to the second's
entcount modifications.  It saves a dramatic amount of time when a
file is cloned for the first time by not looking for non-existent
ZAP entries.
 - Remove the avl_is_empty(bv_tree) check from brt_maybe_exists().  I
don't think it is needed, since by that time all added entries are
already accounted for in bv_entcount.  The extra check must be producing

    [18 lines not shown]
Delta      File
+242 -314  module/zfs/brt.c
+7 -10     include/sys/brt_impl.h
+249 -324  2 files

OpenZFS/src 49a377a tests/zfs-tests/tests/functional/cli_root/zpool_status zpool_status_008_pos.ksh

ZTS: Fix zpool_status_008_pos false positive

Increase the injected delay to 1000ms and the ZIO_SLOW_IO_MS threshold
to 750ms to avoid false positives due to unrelated slow IOs which may
occur in the CI environment.  Additionally, clear the fault injection as
soon as it is no longer required for the test case.

Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Closes #16769 
Delta  File
+5 -5  tests/zfs-tests/tests/functional/cli_root/zpool_status/zpool_status_008_pos.ksh
+5 -5  1 files

OpenZFS/src 0ca82c5 include/sys arc.h, module/zfs arc.c spa.c

L2ARC: Stop rebuild before setting spa_final_txg

Without this there is a race window on export: a history log write
by a completed rebuild can dirty a transaction group beyond the
final one, triggering an assertion.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: George Amanakis <gamanakis at gmail.com>
Signed-off-by: Alexander Motin <mav at FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16714
Closes #16782 
Delta   File
+32 -3  module/zfs/arc.c
+2 -0   module/zfs/spa.c
+1 -0   include/sys/arc.h
+35 -3  3 files

OpenZFS/src 5346889 cmd arc_summary, include/sys arc_impl.h

Remove hash_elements_max accounting from DBUF and ARC

Those values require global atomics to get the current hash_elements
values in a few of the hottest code paths, while in all these years
I never cared about them.  If somebody wants them, it should be easy
to get them by periodic sampling, since neither ARC header nor DBUF
counts change so fast that it would be difficult to catch.

For now I've left hash_elements_max kstat for ARC, since it was
used/reported by arc_summary and it would break older versions,
but now it just reports the current value.
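
As a rough sketch of the periodic sampling idea (not code from this
commit), a consumer could poll the current element count and keep its own
running maximum.  struct max_sampler and sampler_observe() are
hypothetical names; how the current value is obtained is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sampler state kept by whoever wants the maximum. */
struct max_sampler {
    uint64_t max_seen;  /* largest value observed so far */
};

/*
 * Record one periodic sample of the current element count.
 * No atomics are needed in the hot path; only the sampler updates this.
 */
static void
sampler_observe(struct max_sampler *s, uint64_t current)
{
    assert(s != NULL);
    if (current > s->max_seen)
        s->max_seen = current;
}
```

Feeding it the samples 10, 42 and 7 leaves max_seen at 42.  Peaks between
two samples are missed, which is the trade-off the message accepts.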

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Alexander Motin <mav at FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16759 
Delta    File
+7 -6    module/zfs/arc.c
+7 -5    module/zfs/dbuf.c
+1 -4    cmd/arc_summary
+1 -0    include/sys/arc_impl.h
+16 -15  4 files

OpenZFS/src ffe2112 module/zfs zio_checksum.c zio_compress.c

Move "no name changes" from compression to checksum table

Compression names actually aren't used in dedup table names, but
checksum names are.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Signed-off-by: Rob Norris <rob.norris at klarasystems.com>
Closes #16776 
Delta  File
+6 -0  module/zfs/zio_checksum.c
+0 -4  module/zfs/zio_compress.c
+6 -4  2 files

OpenZFS/src e08e832 man/man8 zpool-remove.8 zpool.8

Expand zpool-remove.8 manpage with example results

Also fix comment cross-referencing to zpool.8.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Signed-off-by: Steve Mokris <smokris at softpixel.com>
Closes #16777 
Delta   File
+35 -1  man/man8/zpool-remove.8
+34 -0  man/man8/zpool.8
+69 -1  2 files

OpenZFS/src 0d6306b include/os/freebsd/spl/sys debug.h, include/os/linux/spl/sys debug.h

Fix few __VA_ARGS typos in assertions

It should be __VA_ARGS__, not __VA_ARGS.
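
For reference, a minimal sketch of a variadic assertion-style macro using
the correct spelling.  REPORT, CHECKF and last_msg are hypothetical names
and not the SPL macros touched by this commit; the format string must be a
literal and at least one extra argument is required.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char last_msg[128];  /* where this sketch records the failure text */

/* Hypothetical helper: format a message into last_msg. */
#define REPORT(fmt, ...) \
    snprintf(last_msg, sizeof (last_msg), fmt, __VA_ARGS__)

/*
 * Hypothetical check with a custom message: evaluates to 1 on success,
 * and to 0 after recording a message on failure.  __VA_ARGS__ (with the
 * trailing underscores) is what the preprocessor replaces with the extra
 * arguments; __VA_ARGS would be treated as an ordinary identifier and
 * would only fail to compile where the macro is actually expanded.
 */
#define CHECKF(cond, fmt, ...) \
    ((cond) ? 1 : (REPORT("%s failed: " fmt, #cond, __VA_ARGS__), 0))
```

For example, with x equal to 5, CHECKF(x < 3, "x=%d", x) evaluates to 0 and
leaves "x < 3 failed: x=5" in last_msg.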

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Rob Norris <robn at despairlabs.com>
Signed-off-by: Alexander Motin <mav at FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16780 
Delta  File
+2 -2  include/os/linux/spl/sys/debug.h
+2 -2  include/os/freebsd/spl/sys/debug.h
+4 -4  2 files

OpenZFS/src ff3df12 cmd/zed/agents zfs_retire.c

zed: prevent automatic replacement of offline vdevs

When an OFFLINE device is physically removed, a spare is automatically
activated. However, this behavior differs in FreeBSD, where we do not
transition from OFFLINE state to REMOVED.
Our support team has encountered cases where customers experienced
unexpected behavior during drive replacements, with multiple spares
activating for the same VDEV due to a single disk replacement. This
patch ensures that a drive in an OFFLINE state remains in that state,
preventing it from transitioning to REMOVED and being automatically
replaced by a spare.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza at ixsystems.com>
Closes #16751 
Delta  File
+2 -2  cmd/zed/agents/zfs_retire.c
+2 -2  1 files

OpenZFS/src fd6e8c1 cmd/zdb zdb.c, include/sys brt_impl.h spa_impl.h

BRT: Rework structures and locks to be per-vdev

While the block cloning operation was made per-vdev from the
beginning, before this change most of its data were protected by two
pool-wide locks.  This created lots of lock contention in many
workloads.

This change makes most of the block cloning data structures per-vdev,
which allows locking them separately.  The only pool-wide lock now
is spa_brt_lock, protecting the array of per-vdev pointers and in
most cases taken as reader.  This also splits the per-vdev locks into
three different ones: bv_pending_lock protects the AVL-tree of
pending operations in open context, bv_mos_entries_lock protects the
BRT ZAP object while it is being prefetched, and bv_lock protects the
rest of the per-vdev context during the TXG commit process.  There
should be no functional difference aside from some optimizations.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pjd at FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson at lanl.gov>

    [3 lines not shown]
Delta      File
+332 -451  module/zfs/brt.c
+44 -49    include/sys/brt_impl.h
+13 -21    cmd/zdb/zdb.c
+6 -11     module/zfs/spa_misc.c
+5 -1      include/sys/spa_impl.h
+1 -0      include/sys/spa.h
+401 -533  6 files

OpenZFS/src 309ce63 include/sys zap.h, module/zfs zap_micro.c

ZAP: Add by_dnode variants to lookup/prefetch_uint64

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pjd at FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson at lanl.gov>
Signed-off-by: Alexander Motin <mav at FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16740
Delta    File
+58 -10  module/zfs/zap_micro.c
+4 -3    include/sys/zap.h
+62 -13  2 files

OpenZFS/src 1ee251b module/zfs dbuf.c

BRT: Don't call brt_pending_remove() on holes/embedded

We are doing exactly the same checks around all brt_pending_add().

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pjd at FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson at lanl.gov>
Signed-off-by: Alexander Motin <mav at FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16740
Delta  File
+5 -2  module/zfs/dbuf.c
+5 -2  1 files

OpenZFS/src 483087b tests/zfs-tests/tests/functional/bclone bclone_prop_sync.ksh

ZTS: Avoid embedded blocks in bclone/bclone_prop_sync

If we write less than 113 bytes with compression enabled, we get an
embedded block, which then fails the check for the number of cloned
blocks in bclone_test.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pjd at FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson at lanl.gov>
Signed-off-by: Alexander Motin <mav at FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16740
Delta  File
+6 -2  tests/zfs-tests/tests/functional/bclone/bclone_prop_sync.ksh
+6 -2  1 files

OpenZFS/src 648873f . AUTHORS .mailmap

AUTHORS: refresh with recent new contributors

Welcome to the party 🎉

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Closes #16762 
Delta  File
+4 -0  AUTHORS
+2 -0  .mailmap
+6 -0  2 files

OpenZFS/src de2e9a5 tests/zfs-tests/cmd getversion.c

tests: fix uClibc for getversion.c

This patch fixes compilation with uClibc by applying the same fallback
as commit e12d76176d4e5454db62eb48b58ecd4970838a76 to the `getversion.c`
file, which was previously overlooked.
 
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: José Luis Salvador Rufo <salvador.joseluis at gmail.com>
Closes #16735
Closes #16741 
Delta  File
+5 -1  tests/zfs-tests/cmd/getversion.c
+5 -1  1 files

OpenZFS/src 3462f3b module/os/linux/zfs zvol_os.c

zvol_os.c: Increase optimal IO size

Since zvol read and write can process up to (DMU_MAX_ACCESS / 2) bytes
in a single operation, the current optimal I/O size is too low. SCST
directly reports this value as the optimal transfer length for the
target SCSI device. Increasing it from the previous volblocksize results
in performance improvement for large block parallel I/O workloads.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza at ixsystems.com>
Closes #16750 
Delta  File
+1 -1  module/os/linux/zfs/zvol_os.c
+1 -1  1 files

OpenZFS/src 8dc452d module/os/freebsd/zfs zfs_vnops_os.c

Fix some nits in zfs_getpages()

- If we don't want dmu_read_pages() to perform extra readahead/behind,
  pass a pointer to 0 instead of a null pointer, as dmu_read_pages()
  expects rahead and rbehind to be non-null.
- Avoid unneeded iterations in a loop.
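
The first point can be pictured with a small sketch.  read_pages_sketch()
is a hypothetical stand-in, not dmu_read_pages() itself: it takes in/out
pointers that it dereferences unconditionally, so a caller that wants no
extra pages passes pointers to zero rather than NULL.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical reader: *rbehind and *rahead carry the requested extra
 * pages in, and the number actually read out.  Both must be non-null.
 */
static int
read_pages_sketch(int npages, int *rbehind, int *rahead)
{
    assert(rbehind != NULL && rahead != NULL);
    /* Pretend at most 2 extra pages are available on each side. */
    if (*rbehind > 2)
        *rbehind = 2;
    if (*rahead > 2)
        *rahead = 2;
    return (npages + *rbehind + *rahead);
}

/* Caller that wants no readahead/behind: pass pointers to 0, not NULL. */
static int
read_without_readahead(int npages)
{
    int zero_behind = 0, zero_ahead = 0;

    return (read_pages_sketch(npages, &zero_behind, &zero_ahead));
}
```

Here read_without_readahead(4) reads exactly 4 pages, while passing NULL
would trip the assertion in the stand-in.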

Sponsored-by: Klara, Inc.
Reported-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Brian Atkinson <batkinson at lanl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Signed-off-by: Mark Johnston <markj at FreeBSD.org>
Closes #16758 
Delta  File
+5 -2  module/os/freebsd/zfs/zfs_vnops_os.c
+5 -2  1 files

OpenZFS/src 46c4f2c module/zfs dsl_dataset.c

dsl_dataset: put IO-inducing frees on the pool deadlist

dsl_free() calls zio_free() to free the block. For most blocks, this
simply calls metaslab_free() without doing any IO or putting anything on
the IO pipeline.

Some blocks however require additional IO to free. This at least
includes gang, dedup and cloned blocks. For those, zio_free() will issue
a ZIO_TYPE_FREE IO and return.

If a huge number of blocks are being freed all at once, it's possible
for dsl_dataset_block_kill() to be called millions of times in a single
transaction (e.g. a 2T object of 128K blocks is 16M blocks). If those are
all IO-inducing frees, that then becomes 16M FREE IOs placed on the
pipeline. At time of writing, a zio_t is 1280 bytes, so for just one 2T
object that requires a 20G allocation of resident memory from the
zio_cache. If that can't be satisfied by the kernel, an out-of-memory
condition is raised.
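
The arithmetic above can be checked with a short sketch.  The helper names
are hypothetical, and the 1280-byte zio_t size is simply the figure quoted
in the message; it may differ between builds.

```c
#include <assert.h>
#include <stdint.h>

/* Number of blocks in an object of the given size. */
static uint64_t
blocks_in_object(uint64_t object_bytes, uint64_t block_bytes)
{
    assert(block_bytes != 0);
    return (object_bytes / block_bytes);
}

/* Memory needed if every block gets its own zio of zio_size bytes. */
static uint64_t
zio_bytes_needed(uint64_t nblocks, uint64_t zio_size)
{
    return (nblocks * zio_size);
}
```

A 2T object (2 << 40 bytes) of 128K blocks gives 16M blocks (16777216), and
16777216 * 1280 bytes is 21474836480 bytes, which is 20G.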


    [19 lines not shown]
Delta   File
+26 -2  module/zfs/dsl_dataset.c
+26 -2  1 files

OpenZFS/src a60ed38 module/zfs arc.c

L2ARC: Move different stats updates earlier

..., before we make the header or the log block visible to others.
It should fix assertion on allocated space going negative if the
header is freed once the lock is dropped, while the write is still
going.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Rob Norris <robn at despairlabs.com>
Signed-off-by: Alexander Motin <mav at FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16040
Closes #16743 
Delta   File
+10 -8  module/zfs/arc.c
+10 -8  1 files

OpenZFS/src 1786825 module/os/freebsd/zfs zfs_vnops_os.c

Grab the rangelock unconditionally in zfs_getpages()

As a deadlock avoidance measure, zfs_getpages() would only try to
acquire a rangelock, falling back to a single-page read if this was not
possible.  However, this is incompatible with direct I/O.

Instead, release the busy lock before trying to acquire the rangelock in
blocking mode.  This means that it's possible for the page to be
replaced, so we have to re-lookup.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Brian Atkinson <batkinson at lanl.gov>
Signed-off-by: Mark Johnston <markj at FreeBSD.org>
Closes #16643
Delta    File
+51 -17  module/os/freebsd/zfs/zfs_vnops_os.c
+51 -17  1 files

OpenZFS/src 25eb538 module/os/freebsd/zfs zfs_vnops_os.c

Fix a potential page leak in mappedread_sf()

mappedread_sf() may allocate pages; if it fails to populate a page and
can't free it, it needs to ensure that the page is placed into a page
queue, otherwise it can't be reclaimed until the vnode is destroyed.

I think this is quite unlikely to happen in practice; it was noticed by
code inspection.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Brian Atkinson <batkinson at lanl.gov>
Signed-off-by: Mark Johnston <markj at FreeBSD.org>
Closes #16643
Delta  File
+3 -1  module/os/freebsd/zfs/zfs_vnops_os.c
+3 -1  1 files

OpenZFS/src 1c9a4c8 lib/libzfs libzfs_pool.c

Fix user properties output for zpool list

In zpool_get_user_prop, when called from zpool_expand_proplist and
collect_pool, the zpool_props field of zpool_handle_t is often NULL.
This mostly happens when only one user property is requested using
zpool list -o <user_property>. Checking for this case and correctly
initializing the zpool_props field in zpool_handle_t fixes this
issue.

Interestingly, this issue does not occur if we query any other property
like name or guid along with a user property with -o flag because while
accessing properties like guid, zpool_prop_get_int is called which
checks for this case specifically and calls zpool_get_all_props.
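
The fix follows a common lazy-initialization pattern, sketched here with
hypothetical names.  struct pool_handle, load_all_props() and
get_user_prop() are illustrative stand-ins and not the libzfs API; the
property data is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handle: props stays NULL until it is first loaded. */
struct pool_handle {
    const char *const *props;   /* "name=value" strings, or NULL */
    int loads;                  /* how many times props were fetched */
};

static const char *const fake_props[] = { "org.example:owner=ops", NULL };

/* Stand-in for fetching all properties of the pool. */
static int
load_all_props(struct pool_handle *h)
{
    h->props = fake_props;
    h->loads++;
    return (0);
}

/* Look up a user property, loading the property set on first use. */
static const char *
get_user_prop(struct pool_handle *h, const char *name)
{
    assert(h != NULL && name != NULL);
    if (h->props == NULL && load_all_props(h) != 0)
        return (NULL);
    size_t n = strlen(name);
    for (const char *const *p = h->props; *p != NULL; p++) {
        if (strncmp(*p, name, n) == 0 && (*p)[n] == '=')
            return (*p + n + 1);
    }
    return (NULL);
}
```

The NULL check in get_user_prop() is the point of the sketch: without it, a
lookup on a fresh handle would dereference an unset pointer.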

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem at ixsystems.com>
Closes #16734
Delta  File
+5 -3  lib/libzfs/libzfs_pool.c
+5 -3  1 files

OpenZFS/src 3a0a142 cmd/zpool zpool_main.c

JSON: fix user properties output for zpool list

This commit fixes JSON output for zpool list when user properties are
requested with -o flag. This case needed to be handled specifically
since zpool_prop_to_name does not return property name for user
properties, instead it is stored in pl->pl_user_prop.
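
A minimal sketch of that special case, with hypothetical types.  Only the
pl_user_prop field name comes from the message above; PROP_USERPROP,
struct proplist_entry, native_prop_name() and entry_name() are made up for
illustration and are not the libzfs definitions.

```c
#include <assert.h>
#include <stddef.h>

#define PROP_USERPROP (-1)  /* hypothetical marker: not a native property */

/* Hypothetical property-list entry. */
struct proplist_entry {
    int pl_prop;                /* native property id or PROP_USERPROP */
    const char *pl_user_prop;   /* name, only set for user properties */
};

/* Stand-in for a prop-to-name lookup that knows only native properties. */
static const char *
native_prop_name(int prop)
{
    static const char *const names[] = { "name", "guid", "size" };

    if (prop < 0 || prop >= (int)(sizeof (names) / sizeof (names[0])))
        return (NULL);
    return (names[prop]);
}

/* Pick the key to emit: user properties carry their own name. */
static const char *
entry_name(const struct proplist_entry *pl)
{
    assert(pl != NULL);
    if (pl->pl_prop == PROP_USERPROP)
        return (pl->pl_user_prop);
    return (native_prop_name(pl->pl_prop));
}
```

Without the PROP_USERPROP branch the lookup returns NULL for a user
property, so there is no key to emit for it.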

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem at ixsystems.com>
Closes #16734
Delta  File
+6 -1  cmd/zpool/zpool_main.c
+6 -1  1 files

OpenZFS/src e12d761 module/os/linux/zfs vdev_file.c, tests/zfs-tests/cmd getversion.c

Use <fcntl.h> instead of <sys/fcntl.h>

When building on musl, we get:

```
In file included from tests/zfs-tests/cmd/getversion.c:22:
/usr/include/sys/fcntl.h:1:2: error: #warning redirecting incorrect
 #include <sys/fcntl.h> to <fcntl.h> [-Werror=cpp]
 1 | #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h>

In file included from module/os/linux/zfs/vdev_file.c:36:
/usr/include/sys/fcntl.h:1:2: error: #warning redirecting incorrect
 #include <sys/fcntl.h> to <fcntl.h> [-Werror=cpp]
 1 | #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h>
```
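
For illustration, a small sketch that uses the portable header.  The
helper can_open_readonly() is hypothetical and unrelated to the files
changed here; it only shows that open() and O_RDONLY come from <fcntl.h>.

```c
#include <assert.h>
#include <fcntl.h>   /* POSIX location of open() and the O_* flags */
#include <stddef.h>
#include <unistd.h>

/* Return 1 if path can be opened read-only, 0 otherwise. */
static int
can_open_readonly(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return (0);
    (void) close(fd);
    return (1);
}
```

Including <fcntl.h> directly avoids the redirect warning quoted above,
which -Werror turns into a build failure.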

Bug: https://bugs.gentoo.org/925235
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Sam James <sam at gentoo.org>
Closes #15925 
Delta  File
+3 -1  module/os/linux/zfs/vdev_file.c
+1 -1  tests/zfs-tests/cmd/getversion.c
+4 -2  2 files

OpenZFS/src 8131793 module/os/linux/zfs abd_os.c

Update ABD stats for linear page Linux

a10e552 updated abd_free_linear_page() to no longer call
abd_update_scatter_stats(). This meant that linear pages that were not
attached to Direct I/O requests were not doing waste accounting for the
ARC. This led to performance issues due to incorrect ARC accounting that
resulted in 100% of CPU time being spent in arc_evict() during prolonged
I/O workloads with the ARC.

abd_update_scatter_stats() is now called in abd_free_linear_page()
when the ABD is not from a Direct I/O request.

Reviewed-by: Mark Maybee <mmaybee at delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen at delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Signed-off-by: Brian Atkinson <batkinson at lanl.gov>
Closes #16729 
Delta  File
+2 -0  module/os/linux/zfs/abd_os.c
+2 -0  1 files

OpenZFS/src 1a54b13 . META

Tag 2.3.0-rc3

Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Delta  File
+1 -1  META
+1 -1  1 files

OpenZFS/src 9061a4d cmd/zfs zfs_main.c

JSON: fix user properties output for zfs list

This commit fixes JSON output for zfs list when user properties are
requested with -o flag. This case needed to be handled specifically
since zfs_prop_to_name does not return property name for user
properties, instead it is stored in pl->pl_user_prop.

Reviewed-by: Ameer Hamza <ahamza at ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem at ixsystems.com>
Closes #16732 
Delta  File
+6 -1  cmd/zfs/zfs_main.c
+6 -1  1 files

OpenZFS/src 57fc597 cmd/zfs zfs_main.c

JSON: fix user properties output for zfs list

This commit fixes JSON output for zfs list when user properties are
requested with -o flag. This case needed to be handled specifically
since zfs_prop_to_name does not return property name for user
properties, instead it is stored in pl->pl_user_prop.

Reviewed-by: Ameer Hamza <ahamza at ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem at ixsystems.com>
Closes #16732 
Delta  File
+6 -1  cmd/zfs/zfs_main.c
+6 -1  1 files