OpenZFS/src 645b833 include/sys spa_impl.h zio.h, man/man4 zfs.4

Improve write issue taskqs utilization

 - Reduce the number of allocators on small systems down to one per 4
CPU cores, keeping the maximum at 4 on 16+ core systems. Small systems
should not have the lock contention that multiple allocators are
supposed to solve, while having several metaslabs open and modified
each TXG is not free.
 - Reduce the number of write issue taskqs down to one per 16 CPU
cores and an integer fraction of the number of allocators. On
mid-sized systems, where multiple allocators already make sense, too
many write issue taskqs may reduce write speed on single-file
workloads, since a single file is handled by only one taskq to
reduce fragmentation. On large systems, which can actually benefit
from the better IOPS of many taskqs, the bottleneck is less important,
since in the worst case there will be at least 16 cores to handle it.
 - Distribute dnodes between allocators (and taskqs) in a round-robin
fashion instead of relying on sync taskqs to be balanced. The latter
is not guaranteed and may depend on scheduling.
 - Remove io_wr_iss_tq from struct zio. io_allocator is enough.
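
The sizing rules in the list above can be read as simple arithmetic. The
following is a minimal sketch of that reading, not the code from
module/zfs/spa.c: the function names, the rounding direction and the way
"an integer fraction" is enforced are all assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical helper: clamp v into the range [lo, hi]. */
static int
clamp_int(int v, int lo, int hi)
{
	return (v < lo ? lo : (v > hi ? hi : v));
}

/* One allocator per 4 CPU cores, at least 1, at most 4 (assumed: round down). */
static int
sketch_alloc_count(int ncpus)
{
	assert(ncpus > 0);
	return (clamp_int(ncpus / 4, 1, 4));
}

/*
 * About one write issue taskq per 16 CPU cores, lowered until it divides
 * the allocator count evenly (one reading of "an integer fraction").
 */
static int
sketch_wr_iss_taskq_count(int ncpus, int nalloc)
{
	assert(ncpus > 0 && nalloc > 0);
	int n = clamp_int(ncpus / 16, 1, nalloc);
	while (nalloc % n != 0)
		n--;
	return (n);
}

/* Round-robin choice of an allocator for each newly seen dnode. */
static int
sketch_next_allocator(unsigned *counter, int nalloc)
{
	assert(nalloc > 0);
	return ((int)((*counter)++ % (unsigned)nalloc));
}
```

Under these assumptions an 8-core system gets 2 allocators and 1 write
issue taskq, while a 64-core system gets 4 of each.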

    [4 lines not shown]
Delta     File
+52 -29   module/zfs/spa.c
+15 -10   man/man4/zfs.4
+19 -3    module/zfs/spa_misc.c
+8 -1     include/sys/spa_impl.h
+0 -3     include/sys/zio.h
+2 -0     include/sys/spa.h
+96 -46   2 files not shown
+98 -47   8 files

OpenZFS/src 8fd3a5d module/zfs dmu_objset.c

Slightly improve dnode hash

As I understand it, the dnode hash includes 8 bits of the objset
pointer, starting at bit 6, just to be less predictable.  But since
objset_t is more than 1KB in size, its allocations are likely aligned
to 2KB, which means the 11 lower bits provide no entropy.  Just take
the 8 bits starting from bit 11.
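
The effect of the shift can be shown with a small experiment. This is a
sketch under the message's own assumption of 2KB-aligned allocations; the
helper names are hypothetical and this is not the hash from dmu_objset.c.

```c
#include <assert.h>
#include <stdint.h>

/* Extract 8 bits of an address, starting at bit `start`. */
static unsigned
sketch_ptr_bits(uintptr_t addr, unsigned start)
{
	assert(start < sizeof (uintptr_t) * 8);
	return ((unsigned)((addr >> start) & 0xff));
}

/* Count distinct 8-bit values seen across n consecutive 2KB-aligned addresses. */
static unsigned
sketch_distinct(unsigned start, unsigned n)
{
	unsigned char seen[256] = { 0 };
	unsigned distinct = 0;

	for (unsigned i = 0; i < n; i++) {
		uintptr_t addr = (uintptr_t)i << 11;	/* assumed 2KB alignment */
		unsigned v = sketch_ptr_bits(addr, start);

		if (!seen[v]) {
			seen[v] = 1;
			distinct++;
		}
	}
	return (distinct);
}
```

With 256 such addresses, starting at bit 6 yields only 8 distinct values
because bits 6 through 10 are always zero, while starting at bit 11 yields
all 256.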

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by:  Alexander Motin <mav at FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #16131 
Delta     File
+3 -3     module/zfs/dmu_objset.c
+3 -3     1 file

OpenZFS/src 051460b config user-libunwind.m4 user.m4, lib/libspl assert.c Makefile.am

libspl/assert: use libunwind for backtrace when available

libunwind seems to do a better job of resolving symbols than
backtrace(), and is also useful on platforms that don't have backtrace()
(e.g. musl). If it's available, use it.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
Delta     File
+44 -0    config/user-libunwind.m4
+32 -1    lib/libspl/assert.c
+2 -2     lib/libspl/Makefile.am
+1 -0     config/user.m4
+79 -3    4 files

OpenZFS/src 2152c40 config user-backtrace.m4 user.m4, lib/libspl assert.c Makefile.am

libspl/assert: dump backtrace in assert

Adds a check for the backtrace() function. If available, uses it to show
a stack backtrace in the assertion output.
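
As a rough illustration of the backtrace() interface this relies on, the
sketch below dumps the current stack to a file descriptor. It assumes a
libc that provides <execinfo.h> (glibc does, musl does not); the function
name and frame limit are hypothetical, and this is not the code added to
lib/libspl/assert.c.

```c
#include <execinfo.h>

#define	SKETCH_MAX_FRAMES	32

/* Write the current call stack to fd; returns the number of frames captured. */
static int
sketch_dump_backtrace(int fd)
{
	void *frames[SKETCH_MAX_FRAMES];
	int n = backtrace(frames, SKETCH_MAX_FRAMES);

	/* backtrace_symbols_fd() writes straight to fd without calling malloc(). */
	backtrace_symbols_fd(frames, n, fd);
	return (n);
}
```

Calling sketch_dump_backtrace(2) prints one line per frame to stderr.
Symbol names are usually only resolved for exported symbols, so static
functions may appear as bare addresses.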

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
Delta     File
+20 -0    lib/libspl/assert.c
+14 -0    config/user-backtrace.m4
+2 -0     lib/libspl/Makefile.am
+1 -0     config/user.m4
+37 -0    4 files

OpenZFS/src dec697a lib/libspl assert.c

libspl/assert: add lock around assertion output

If multiple threads trip an assertion at the same moment (quite common),
they can be printing at the same time, and their output gets messy.

This adds a simple lock around the whole thing, to prevent a second task
printing assert output before the first has finished.

Additionally, if libspl_assert_ok is not set, abort() is called without
dropping the lock, so that any other asserting tasks will be killed
before starting any output, rather than only getting part-way through.
This is a tradeoff; it's assumed that multiple threads asserting at the
same moment are likely the same fault in different instances of a
thread, and so there won't be any more useful information from the other
tasks anyway.
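
The locking pattern described above can be sketched as follows. This is
an illustration only: the function name, the message format and the
`continue_ok` parameter (standing in for libspl_assert_ok) are assumptions,
not the code in lib/libspl/assert.c.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_mutex_t sketch_assert_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical assertion reporter. Returns the number of characters
 * written when the process is allowed to continue.
 */
static int
sketch_assert_report(const char *msg, int continue_ok)
{
	pthread_mutex_lock(&sketch_assert_lock);
	int n = fprintf(stderr, "ASSERT: %s\n", msg);

	if (!continue_ok) {
		/* Abort while still holding the lock so no other thread prints. */
		abort();
	}
	pthread_mutex_unlock(&sketch_assert_lock);
	return (n);
}
```

Because the lock is never dropped on the abort() path, a second thread
that trips an assertion blocks in pthread_mutex_lock() until the process
dies, which is the tradeoff the message describes.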

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
Delta     File
+6 -0     lib/libspl/assert.c
+6 -0     1 file

OpenZFS/src 3948002 config user.m4, lib/libspl assert.c

libspl/assert: show process/task details in assert output

Makes it much easier to see what thing complained.

Getting the thread id, program name and thread name varies wildly
between Linux and FreeBSD, so those are set up in macros.
pthread_getname_np() did not appear in musl until very recently, but the
same info has always been available via prctl(PR_GET_NAME), so we use
that instead.
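
For reference, reading the thread name through prctl() looks roughly like
this on Linux. It is a sketch only; the wrapper name and its error
convention are hypothetical, and the macros in lib/libspl/assert.c may be
structured differently.

```c
#include <string.h>
#include <sys/prctl.h>

/*
 * Linux-only: copy the calling thread's name into buf, which must hold
 * at least 16 bytes (the kernel limit, including the terminating NUL).
 * Returns 0 on success and -1 on error.
 */
static int
sketch_thread_name(char *buf, size_t len)
{
	char name[16] = { 0 };

	if (len < sizeof (name))
		return (-1);
	if (prctl(PR_GET_NAME, name, 0, 0, 0) != 0)
		return (-1);
	memcpy(buf, name, sizeof (name));
	return (0);
}
```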

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
Delta     File
+34 -2    lib/libspl/assert.c
+1 -1     config/user.m4
+35 -3    2 files

OpenZFS/src 4429ad9 include/sys zfs_context.h, lib/libzpool kernel.c taskq.c

libzpool: set thread names

Arrange for the thread/task name to be set when new threads are created.
This makes them visible in the process table etc.

pthread_setname_np() is generally available in glibc, musl and FreeBSD,
so no test is required.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
Delta     File
+4 -4     include/sys/zfs_context.h
+4 -1     lib/libzpool/kernel.c
+2 -2     lib/libzpool/taskq.c
+10 -7    3 files

OpenZFS/src 7ac00d3 config find_system_library.m4

find_system_library: fix var cleanup when library not found

The "not found" path is attempting to clear SOMELIB_CFLAGS and
SOMELIB_LIBS by resetting them in AC_SUBST(). However, the second arg to
AC_SUBST is expanded in autoconf with `m4_ifvaln([$2], [[$1]=$2])`,
which is defined as "if the first arg is non-empty". The m4 "empty"
construction is [], therefore, the existing AC_SUBST calls never modify
the variables at all.

The effect of this is that leftovers from the library test can leak out.
At least, if a library header is found in the first stage, but the
library itself is not, -lsomelib is added to SOMELIB_LIBS and further
tests done. If that library is not found, SOMELIB_LIBS will not be
cleared.

For most of our library tests this hasn't been a problem, as they're
either always found properly via pkg-config or set directly, or the
calling test immediately aborts configure. For an optional dependency
however, an apparent "partial" result where the header is found but no

    [11 lines not shown]
Delta     File
+2 -2     config/find_system_library.m4
+2 -2     1 file

OpenZFS/src 2566592 . META

Tag zfs-2.2.4

META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2 at llnl.gov>
Delta     File
+1 -1     META
+1 -1     1 file

OpenZFS/src 3d4d619 module/os/freebsd/zfs zvol_os.c, module/os/linux/zfs zvol_os.c

Fix updating the zvol_htable when renaming a zvol

When renaming a zvol, insert it into zvol_htable using the new name, not
the old name.  Otherwise some operations won't work.  For example,
"zfs set volsize" while the zvol is open.

Sponsored by:   Axcient
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk at axcient.com>
Signed-off-by:  Alan Somers <asomers at FreeBSD.org>
Closes #16127
Closes #16128 
Delta     File
+1 -1     module/os/freebsd/zfs/zvol_os.c
+1 -1     module/os/linux/zfs/zvol_os.c
+2 -2     2 files

OpenZFS/src 7063074 include/sys uberblock_impl.h, module/zfs spa.c vdev.c

vdev probe to slow disk can stall mmp write checker

Simplify vdev probes in the zio_vdev_io_done context to
avoid holding the spa config lock for a long duration.

Also allow zpool clear if no evidence of another host
is using the pool.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Olaf Faaland <faaland1 at llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Don Brady <don.brady at klarasystems.com>
Closes #15839 
Delta     File
+84 -18   module/zfs/spa.c
+97 -0    tests/zfs-tests/tests/functional/mmp/mmp_write_slow_disk.ksh
+13 -9    module/zfs/vdev.c
+8 -8     include/sys/uberblock_impl.h
+6 -3     module/zfs/zfs_ioctl.c
+9 -0     module/zfs/txg.c
+217 -38  10 files not shown
+242 -52  16 files

OpenZFS/src 61f3638 include/sys/fs zfs.h, lib/libzfs libzfs.abi

Add prefetch property 

ZFS prefetch is currently governed by the zfs_prefetch_disable
tunable. However, this is a module-wide setting: if a specific
dataset benefits from prefetch while others have issues with it,
an optimal solution does not exist.

This commit introduces the "prefetch" tri-state property, which enables
granular control (at dataset/volume level) for prefetching.

This patch does not remove zfs_prefetch_disable, which remains
a system-wide switch to enable or disable prefetch. However, to avoid
duplication, it would be preferable to deprecate and then remove
the module tunable.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza at ixsystems.com>
Signed-off-by: Gionatan Danti <g.danti at assyoma.it>

    [2 lines not shown]
Delta     File
+19 -0    module/zfs/dmu_objset.c
+17 -0    man/man7/zfsprops.7
+11 -0    module/zcommon/zfs_prop.c
+7 -0     include/sys/fs/zfs.h
+6 -1     module/zfs/dmu_zfetch.c
+2 -1     lib/libzfs/libzfs.abi
+62 -2    1 files not shown
+63 -2    7 files

OpenZFS/src ea3f7c1 include/sys spa.h, module/zfs spa_misc.c spa.c

Extend import_progress kstat with a notes field

Detail the import progress of log spacemaps as they can take a very
long time.  Also grab the spa_note() messages too, as they provide
insight into what is happening.

Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Don Brady <don.brady at klarasystems.com>
Co-authored-by: Allan Jude <allan at klarasystems.com>
Closes #15539 
Delta     File
+132 -0   tests/zfs-tests/tests/functional/cli_root/zpool_import/zpool_import_status.ksh
+70 -4    module/zfs/spa_misc.c
+39 -2    module/zfs/spa.c
+9 -3     module/zfs/spa_log_spacemap.c
+4 -0     include/sys/spa.h
+2 -1     tests/runfiles/common.run
+256 -10  6 files

OpenZFS/src a6edc0a module/zfs zio.c

zio: try to execute TYPE_NULL ZIOs on the current task

Many TYPE_NULL ZIOs are used to provide a sync point for child ZIOs, and
do not do any actual work themselves. However, they are still dispatched
to a dedicated, single-thread taskq, which leads to their execution
being entirely task switch and dequeue overhead for no actual reason.

This commit changes it so that when selecting a parent ZIO to execute,
if the parent is TYPE_NULL and has no done function (that is, no
additional work), it is executed on the same thread. This reduces task
switches and frees up CPU cores for other work.
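
The selection rule in the paragraph above amounts to a two-part test. The
sketch below models it with a heavily reduced, hypothetical structure; the
type and field names are invented for illustration and are not the real
zio_t or the change made in module/zfs/zio.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced model of a ZIO. */
typedef enum {
	SKETCH_ZIO_NULL,
	SKETCH_ZIO_READ,
	SKETCH_ZIO_WRITE
} sketch_zio_type_t;

typedef struct sketch_zio {
	sketch_zio_type_t type;
	void (*done)(struct sketch_zio *);	/* NULL when no extra work */
} sketch_zio_t;

/* Placeholder completion callback used to model "has additional work". */
static void
sketch_noop_done(sketch_zio_t *zio)
{
	(void) zio;
}

/* True if the parent may run on the current thread instead of a taskq. */
static bool
sketch_run_parent_inline(const sketch_zio_t *parent)
{
	return (parent->type == SKETCH_ZIO_NULL && parent->done == NULL);
}
```

A TYPE_NULL parent with a done callback, or any parent of another type,
would still be dispatched to its taskq in this model.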

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris at klarasystems.com>
Closes #16134 
Delta     File
+6 -4     module/zfs/zio.c
+6 -4     1 file

OpenZFS/src c3f2f1a include/sys uberblock_impl.h, module/zfs spa.c vdev.c

vdev probe to slow disk can stall mmp write checker

Simplify vdev probes in the zio_vdev_io_done context to
avoid holding the spa config lock for a long duration.

Also allow zpool clear if no evidence of another host
is using the pool.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Olaf Faaland <faaland1 at llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Don Brady <don.brady at klarasystems.com>
Closes #15839 
Delta     File
+84 -18   module/zfs/spa.c
+97 -0    tests/zfs-tests/tests/functional/mmp/mmp_write_slow_disk.ksh
+13 -9    module/zfs/vdev.c
+8 -8     include/sys/uberblock_impl.h
+6 -3     module/zfs/zfs_ioctl.c
+9 -0     module/zfs/txg.c
+217 -38  10 files not shown
+242 -52  16 files

OpenZFS/src 531572b module/zfs dbuf.c

Fix panics when truncating/deleting files

There's a union in dbuf_dirty_record_t; dr_brtwrite could evaluate
to B_TRUE if the dirty record is of another type than dl. Add a
more explicit dr type check before trying to access dr_brtwrite.

Fixes two similar panics:

[ 1373.806119] VERIFY0(db->db_level) failed (0 == 1)
[ 1373.807232] PANIC at dbuf.c:2549:dbuf_undirty()
[ 1373.814979]  dump_stack_lvl+0x71/0x90
[ 1373.815799]  spl_panic+0xd3/0x100 [spl]
[ 1373.827709]  dbuf_undirty+0x62a/0x970 [zfs]
[ 1373.829204]  dmu_buf_will_dirty_impl+0x1e9/0x5b0 [zfs]
[ 1373.831010]  dnode_free_range+0x532/0x1220 [zfs]
[ 1373.833922]  dmu_free_long_range+0x4e0/0x930 [zfs]
[ 1373.835277]  zfs_trunc+0x75/0x1e0 [zfs]
[ 1373.837958]  zfs_freesp+0x9b/0x470 [zfs]
[ 1373.847236]  zfs_setattr+0x161a/0x3500 [zfs]

    [29 lines not shown]
Delta     File
+8 -10    module/zfs/dbuf.c
+8 -10    1 file

OpenZFS/src 51d3c23 cmd/zpool zpool_main.c

Add newline to two zpool messages

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Seth Troisi <sethtroisi at google.com>
Closes #16113 
Delta     File
+2 -2     cmd/zpool/zpool_main.c
+2 -2     1 file

OpenZFS/src db65272 include libzfs.h, include/sys/fs zfs.h

[2.2.4-only] Stub RAIDZ enums to prevent conflicts

Stub in the RAIDZ expansion enums for now so that the slow IO
commit merges cleanly.

Signed-off-by: Tony Hutter <hutter2 at llnl.gov>
Delta     File
+2 -1     lib/libzfs/libzfs.abi
+2 -0     include/sys/fs/zfs.h
+1 -0     include/libzfs.h
+5 -1     3 files

OpenZFS/src d088fb7 tests/zfs-tests/tests/functional/cp_files cp_files_002_pos.ksh

ZTS: fix flakiness in cp_files_002_pos

Fix RANDOM to not return zero.

Overwriting with `dd ... count=0` does not test anything.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Reviewed-by: Allan Jude <allan at klarasystems.com>
Signed-off-by: Robert Evans <evansr at google.com>
Closes #16029 
Delta     File
+3 -3     tests/zfs-tests/tests/functional/cp_files/cp_files_002_pos.ksh
+3 -3     1 file

OpenZFS/src ef3fea6 cmd/zpool/os/linux zpool_vdev_os.c, lib/libuutil uu_list.c

GCC: Fixes for gcc 14 on Fedora 40

- Workaround dangling pointer in uu_list.c (#16124)
- Fix calloc() transposed arguments in zpool_vdev_os.c
- Make some temp variables unsigned to prevent triggering a
  '-Werror=alloc-size-larger-than' error.
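
For context on the second item: calloc() takes the element count first and
the element size second, and GCC 14 adds a -Wcalloc-transposed-args warning
for calls that pass a sizeof expression as the first argument. A minimal
sketch of the expected argument order, using a hypothetical helper rather
than the code in zpool_vdev_os.c:

```c
#include <stdlib.h>

/* Allocate `count` zero-initialized ints: count first, element size second. */
static int *
sketch_alloc_ints(size_t count)
{
	return (calloc(count, sizeof (int)));
}
```

The transposed form, calloc(sizeof (int), count), allocates the same number
of bytes but is what the new warning flags.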

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Tony Hutter <hutter2 at llnl.gov>
Closes #16124
Closes #16125 
Delta     File
+10 -4    lib/libuutil/uu_list.c
+3 -2     module/zfs/vdev_raidz.c
+1 -1     cmd/zpool/os/linux/zpool_vdev_os.c
+14 -7    3 files

OpenZFS/src 97889c0 lib/libzfs libzfs_sendrecv.c

return NULL at end of send_progress_thread

Reviewed-by: Rob Norris <robn at despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Jason Lee <jasonlee at lanl.gov>
Closes #16074 
Delta     File
+1 -0     lib/libzfs/libzfs_sendrecv.c
+1 -0     1 file

OpenZFS/src 74101f7 include/sys vdev_impl.h, man/man7 vdevprops.7

vdev props comment and manpage should include zfsd and FreeBSD mentions

Reviewed-by: Tino Reichardt <milky-zfs at mcmilk.de>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Reviewed-by: Rob Norris <robn at despairlabs.com>
Reviewed-by: Allan Jude <allan at klarasystems.com>
Signed-off-by: Alek Pinchuk <apinchuk at axcient.com>
Closes #15968 
Delta     File
+7 -1     man/man7/vdevprops.7
+1 -1     include/sys/vdev_impl.h
+8 -2     2 files

OpenZFS/src c1c26a7 cmd/zed/agents zfs_diagnosis.c fmd_api.c, module/zfs vdev.c zfs_fm.c

Add slow disk diagnosis to ZED

Slow disk response times can be indicative of a failing drive. ZFS
currently tracks slow I/Os (slower than zio_slow_io_ms) and generates
events (ereport.fs.zfs.delay).  However, no action is taken by ZED,
as is done for checksum or I/O errors.  This change adds slow disk
diagnosis to ZED, which is opt-in using new VDEV properties:
  VDEV_PROP_SLOW_IO_N
  VDEV_PROP_SLOW_IO_T

If multiple VDEVs in a pool are undergoing slow I/Os, then it skips
the zpool_vdev_degrade().

Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Reviewed-by: Allan Jude <allan at klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Co-authored-by: Rob Wing <rob.wing at klarasystems.com>
Signed-off-by: Don Brady <don.brady at klarasystems.com>
Closes #15469 
Delta     File
+205 -0   tests/zfs-tests/tests/functional/events/zed_slow_io.ksh
+177 -0   tests/zfs-tests/tests/functional/events/zed_slow_io_many_vdevs.ksh
+117 -26  cmd/zed/agents/zfs_diagnosis.c
+26 -31   cmd/zed/agents/fmd_api.c
+30 -0    module/zfs/vdev.c
+26 -0    module/zfs/zfs_fm.c
+581 -57  23 files not shown
+654 -70  29 files

OpenZFS/src 9f1d3db lib/libzfs/os/linux libzfs_pool_os.c

Check for minimum partition size

On Linux, block devices used for vdevs will be partitioned.  The block
device must be large enough for a 64M partition starting at an offset
of 2048 sectors (part1), and a second 64M reserved partition at the
end of the device (part9).

This commit adds a capacity check when creating the GPT label to
immediately detect a device which is too small.  With the existing
code this would be caught slightly later when attempting to use
the partition.  Catching it sooner lets us print a more useful error.

Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Closes #15898 
Delta     File
+10 -0    lib/libzfs/os/linux/libzfs_pool_os.c
+10 -0    1 file

OpenZFS/src 6f32335 cmd/zpool zpool_main.c, lib/libzfs libzfs.abi

Add ashift validation when adding devices to a pool

Currently, zpool add allows users to add top-level vdevs that have
different ashifts, but doing so prevents users from being able to
perform a top-level vdev removal. Often consumers may not realize
that they have mismatched ashifts until the top-level removal fails.

This feature adds ashift validation to the zpool add command and will
fail the operation if the sector size of the specified vdev does not
match the existing pool. This behavior can be disabled by using the -f
flag. In addition, new flags have been added to provide fine-grained
control to disable specific checks. These flags are:

--allow-in-use
--allow-ashift-mismatch
--allow-replication-mismatch

The force flag will disable all of these checks.

    [6 lines not shown]
Delta     File
+59 -17   cmd/zpool/zpool_main.c
+62 -14   lib/libzfs/libzfs.abi
+16 -2    man/man8/zpool-add.8
+14 -3    tests/zfs-tests/tests/functional/cli_root/zpool_add/add-o_ashift.ksh
+13 -3    tests/zfs-tests/tests/functional/cli_root/zpool_add/add_prop_ashift.ksh
+12 -2    module/zfs/spa.c
+176 -41  15 files not shown
+219 -58  21 files

OpenZFS/src 7aaf6ce module/icp/asm-aarch64/sha2 sha256-armv8.S sha512-armv8.S

Add the BTI elf note to the AArch64 SHA2 assembly

On ELF platforms there is a note to specify when an application or
library supports BTI. When linking one of these the linker needs
all input object files to have the note. If not it will not include
it in the output file.

Normally the compiler would generate it, but for assembly files we
need to do it ourselves.

Add the note to the aarch64 sha256 and sha512 assembly files.

Tested by building with BTI enabled and using the -zbti-report=error
flag to lld that makes it an error if the note is missing.

Reviewed-by: Tino Reichardt <milky-zfs at mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Andrew Turner <andrew.turner4 at arm.com>
Closes #16086 
Delta     File
+10 -0    module/icp/asm-aarch64/sha2/sha256-armv8.S
+10 -0    module/icp/asm-aarch64/sha2/sha512-armv8.S
+20 -0    2 files

OpenZFS/src 3f817de . AUTHORS .mailmap

AUTHORS: refresh with recent new contributors

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Closes #16079 
Delta     File
+35 -0    AUTHORS
+18 -0    .mailmap
+53 -0    2 files

OpenZFS/src 5dda8c0 include/os/freebsd/spl/sys debug.h, include/os/linux/spl/sys debug.h

Add VERIFY0P() and ASSERT0P() macros.

These macros are similar to VERIFY0() and ASSERT0() but are intended
for pointers, and therefore use uintptr_t instead of int64_t.
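
To make the idea concrete, here is an illustrative userspace macro in the
same spirit. It is a sketch, not the definition from debug.h or assert.h:
the macro name, the message format and the wrapper function are assumptions.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative only: abort unless the pointer expression is NULL. */
#define	SKETCH_VERIFY0P(p)						\
	do {								\
		const uintptr_t _v = (uintptr_t)(p);			\
		if (_v != 0) {						\
			fprintf(stderr, "VERIFY0P(%s) failed (NULL == %p)\n", \
			    #p, (void *)_v);				\
			abort();					\
		}							\
	} while (0)

/* Hypothetical wrapper: returns 1 if p is NULL, aborts otherwise. */
static int
sketch_check_null(const void *p)
{
	SKETCH_VERIFY0P(p);
	return (1);
}
```

Comparing through uintptr_t keeps the full pointer width and avoids
treating an address as a signed 64-bit integer.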

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Kay Pedersen <mail at mkwg.de>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Signed-off-by: Dag-Erling Smørgrav <des at FreeBSD.org>
Closes #15225
Delta     File
+13 -0    include/os/freebsd/spl/sys/debug.h
+13 -0    include/os/linux/spl/sys/debug.h
+11 -0    lib/libspl/include/assert.h
+37 -0    3 files

OpenZFS/src b3b37b8 cmd arcstat.in

Fix arcstats for FreeBSD after zfetch support

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza at ixsystems.com>
Closes #16141 
Delta     File
+8 -2     cmd/arcstat.in
+8 -2     1 file

OpenZFS/src 4d17e20 cmd arcstat.in

Add zfetch stats in arcstats

arc_summary also reports zfetch stats, but it's inconvenient to monitor
continuously incrementing numbers. Adding them to arcstats allows us to
observe streams more conveniently.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza at ixsystems.com>
Closes #16094 
Delta     File
+42 -5    cmd/arcstat.in
+42 -5    1 file