OpenZFS/src eed67e4module/zfs zap.c

zap: split objset+object implementations to use a dnode

For the functions that don't (yet) have _by_dnode() variants, give them
the same treatment as the previous commit - pull their implementation
into a _by_dnode() function, with the original as a simple wrapper.

This lets them all follow the same uniform pattern, and lays the
groundwork for further cleanup in other non-dnode parts of the ZAP
subsystem.
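
The wrapper pattern described above can be sketched roughly as follows.
This is a minimal illustration, not the code in zap.c: the types
toy_objset_t and toy_dnode_t and every toy_*() function are hypothetical
stand-ins invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a dnode and an objset; not the real ZFS types. */
typedef struct toy_dnode {
	uint64_t dn_object;	/* object number */
	uint64_t dn_nentries;	/* pretend payload */
	int dn_holds;		/* outstanding holds */
} toy_dnode_t;

typedef struct toy_objset {
	toy_dnode_t *os_dnodes;
	size_t os_ndnodes;
} toy_objset_t;

/* Look up an object and take a hold on it; returns 0 or -1. */
static int
toy_dnode_hold(toy_objset_t *os, uint64_t object, toy_dnode_t **dnp)
{
	for (size_t i = 0; i < os->os_ndnodes; i++) {
		if (os->os_dnodes[i].dn_object == object) {
			os->os_dnodes[i].dn_holds++;
			*dnp = &os->os_dnodes[i];
			return (0);
		}
	}
	return (-1);
}

static void
toy_dnode_rele(toy_dnode_t *dn)
{
	assert(dn->dn_holds > 0);
	dn->dn_holds--;
}

/* Primary implementation: works on a dnode the caller already holds. */
static int
toy_count_by_dnode(toy_dnode_t *dn, uint64_t *count)
{
	*count = dn->dn_nentries;
	return (0);
}

/* objset+object variant: hold the dnode, delegate, release. */
static int
toy_count(toy_objset_t *os, uint64_t object, uint64_t *count)
{
	toy_dnode_t *dn;
	int err = toy_dnode_hold(os, object, &dn);

	if (err != 0)
		return (err);
	err = toy_count_by_dnode(dn, count);
	toy_dnode_rele(dn);
	return (err);
}
```

With this shape, all of the real work lives in one place, and the
objset+object entry point only manages the hold.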

Note that it would be trivial to expose these new _by_dnode() functions,
but there's no need to do that until there's an external need for them.

Also note that there's no change yet to the following, which are not
simple zap_t operations in the same way:

 - zap_contains: wrapper around other ops
 - zap_increment: wrapper around other ops
 - zap_*_int(): wrappers around other ops

    [8 lines not shown]
DeltaFile
+72-23module/zfs/zap.c
+72-231 files

OpenZFS/src bd02c10module/zfs zap.c

zap: make the _by_dnode() op variants be the primary implementation

The existing pattern for each operation is to have a "frontend" function
that takes an object referenced by either an objset+object pair (eg
zap_add()) or an existing dnode (eg zap_add_by_dnode()). Those functions
obtain a locked zap_t for the given object from either zap_lockdir() or
zap_lockdir_by_dnode(). That zap_t, the operation args, and the refcount
tag for lockdir() are then passed through to the "backend" function (eg
zap_add_impl()), which does the work and then calls zap_unlockdir() to
release the zap_t.

This pattern is overcomplicated, in at least three ways:

- Both frontends for each operation have to make the call to
  zap_lockdir(), which has multiple args that must be the same for both.

- Frontends need to pass the refcount tag to the backend so it can
  call zap_unlockdir() correctly, which makes the signature more
  complicated.

    [28 lines not shown]
DeltaFile
+156-273module/zfs/zap.c
+156-2731 files

OpenZFS/src 891e379man/man7 vdevprops.7, module/os/linux/zfs vdev_disk.c

Fix failfast default and usage

The feature that added a failfast property to vdevs unfortunately did
not correctly set the default at creation time, so many vdevs do not
actually have the property set. In addition, when the property is
used, the failfast flag is not checked correctly, resulting in the
feature mostly not working as intended.

Set the failfast property to the default value at vdev allocation time.
The value will be read in from the ZAP as normal when the vdev metadata
is loaded.  Allow the property to be set on any vdev and have it be
inherited from the root or top-level vdev.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie at klarasystems.com>
Closes #18410
DeltaFile
+115-0tests/zfs-tests/tests/functional/cli_root/zpool_set/zpool_set_inherit.ksh
+21-9module/zfs/vdev.c
+9-3module/zcommon/zpool_prop.c
+7-1module/os/linux/zfs/vdev_disk.c
+4-1man/man7/vdevprops.7
+2-2tests/runfiles/common.run
+158-163 files not shown
+161-179 files

OpenZFS/src 40a8765include/sys zap_impl.h

zap_impl: use flex array field for mzap_phys_t.mz_chunks

mzap_phys_t is always a full-block allocation, with mz_chunks[] as an
array over the rest of the block past the header.

Recent Linux compiled with CONFIG_UBSAN will complain about this:

    UBSAN: array-index-out-of-bounds in module/zfs/zap.c:1236:28
    index 2 is out of range for type 'mzap_ent_phys_t [1]'

The fix is straightforward; simply convert this field to a flexible
array member.
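
To illustrate the layout being described, here is a small sketch of a
header followed by a flexible array member that covers the rest of a
block. The struct and function names (toy_phys_t, toy_ent_t,
toy_nchunks(), toy_phys_alloc()) are hypothetical and the field layout is
not the on-disk mzap format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical entry and header layouts; not the on-disk mzap format. */
typedef struct toy_ent {
	uint64_t te_value;
	uint32_t te_cd;
	uint32_t te_pad;
} toy_ent_t;

typedef struct toy_phys {
	uint64_t tp_block_type;
	uint64_t tp_salt;
	/* Flexible array member: covers the rest of the block. */
	toy_ent_t tp_chunks[];
} toy_phys_t;

/* Number of entries that fit in a block of the given size. */
static size_t
toy_nchunks(size_t blocksize)
{
	assert(blocksize >= offsetof(toy_phys_t, tp_chunks));
	return ((blocksize - offsetof(toy_phys_t, tp_chunks)) /
	    sizeof (toy_ent_t));
}

/* Allocate a zeroed full block and view it as header + chunk array. */
static toy_phys_t *
toy_phys_alloc(size_t blocksize)
{
	assert(blocksize >= offsetof(toy_phys_t, tp_chunks));
	return (calloc(1, blocksize));
}
```

Because tp_chunks[] has no declared length, indexing past element 0 is
no longer out of bounds as far as the declared type is concerned; the
caller remains responsible for staying below toy_nchunks().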

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18550
DeltaFile
+2-1include/sys/zap_impl.h
+2-11 files

OpenZFS/src 6fb72fdmodule/zfs zio.c

zio_ddt_write: compute have_dvas after taking dde_io_lock

In zio_ddt_write(), have_dvas and is_ganged were computed before
dde_io_lock was taken. A concurrent zio_ddt_child_write_done() error
path calls ddt_phys_unextend() under dde_io_lock, which can zero
DVA[0] while another thread is between computing have_dvas and taking
dde_io_lock. That thread then uses the stale have_dvas=1 to call
ddt_bp_fill(), copying the zeroed DVA into the BP. A zero DVA resolves
as a hole, producing blocks that read back as zeros with no checksum
error (silent data corruption).

Fix by moving have_dvas and is_ganged computation to after dde_io_lock
is taken, so they always reflect the current state of dde->dde_phys.
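
The general shape of this fix (derive the flag only once the lock is
held) can be sketched as below. This is a generic illustration using
pthreads; toy_entry_t, toy_unextend() and toy_fill() are hypothetical
names and do not correspond to the real DDT structures or functions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared entry; a zero te_dva means "no address allocated". */
typedef struct toy_entry {
	pthread_mutex_t te_lock;
	uint64_t te_dva;
} toy_entry_t;

/* Error path: clears the address while holding the lock. */
static void
toy_unextend(toy_entry_t *te)
{
	pthread_mutex_lock(&te->te_lock);
	te->te_dva = 0;
	pthread_mutex_unlock(&te->te_lock);
}

/*
 * Copy the address out only if it is valid. have_dva is computed after
 * the lock is taken, so it cannot go stale before it is used.
 */
static bool
toy_fill(toy_entry_t *te, uint64_t *out)
{
	assert(out != NULL);

	pthread_mutex_lock(&te->te_lock);
	bool have_dva = (te->te_dva != 0);
	if (have_dva)
		*out = te->te_dva;
	pthread_mutex_unlock(&te->te_lock);
	return (have_dva);
}
```

Had have_dva been computed before pthread_mutex_lock(), a concurrent
toy_unextend() could zero te_dva in between and the stale flag would
cause a zero address to be copied out.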

Regression introduced by a41ef36858 ("DDT: Reduce global DDT lock
scope during writes").

Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>

    [3 lines not shown]
DeltaFile
+15-9module/zfs/zio.c
+15-91 files

OpenZFS/src 2f283c9include/sys zap_impl.h, module/zfs zap_fat.c zap.c

zap: remove refcount tags from backend functions

Since we now never need to unlock/lock an existing zap_t, we don't need
to thread through the refcount tag everywhere, which lets us simplify a
lot of calls.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18546
DeltaFile
+13-20module/zfs/zap_fat.c
+8-10module/zfs/zap.c
+6-9include/sys/zap_impl.h
+3-7module/zfs/zap_impl.c
+3-3module/zfs/zap_micro.c
+33-495 files

OpenZFS/src c8f9b4cinclude/sys zap_impl.h, module/zfs zap_fat.c zap_impl.c

zap: lift and simplify zap_t lock upgrade

Most fatzap write ops only take the READER zap_t lock, because the
header block only needs to be updated when a change would add or remove
a leaf block or spill the ptrtbl. When this happens, the lock is
upgraded to WRITER so those changes can be made.

If the lock can't be upgraded directly (not least because
rw_tryupgrade() is a no-op on Linux and userspace), then it has to be
dropped and re-acquired, that is, zap_unlock() and then zap_lock().

However, this method is far heavier than it needs to be, and adds
complication because it fully releases the zap_t, the header dbuf and
the dnode. This gives a window where the dbuf can be evicted and so the
zap_t destroyed. In addition to the IO overhead if this happens, this
means the zap_t returned by zap_lock() may be different to the original,
which means all callers need to be prepared for it to change.

zap_shrink() used an alternate method of simply dropping and reacquiring

    [18 lines not shown]
DeltaFile
+9-46module/zfs/zap_fat.c
+37-0module/zfs/zap_impl.c
+4-14module/zfs/zap.c
+12-0include/sys/zap_impl.h
+0-1module/zfs/zap_micro.c
+62-615 files

OpenZFS/src 18d910binclude/sys zap_impl.h, module/zfs zap_micro.c zap_impl.c

mzap_create_impl: use zap_lock_by_dnode()

The only reason this used zap_lock_impl() directly was to avoid an extra
dbuf hold, but there's no real reason to do that. Just use
zap_lock_by_dnode(), and then zap_lock_impl() can be de-exported.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18546
DeltaFile
+4-5module/zfs/zap_micro.c
+0-4include/sys/zap_impl.h
+1-1module/zfs/zap_impl.c
+5-103 files

OpenZFS/src d3523f9module/zfs zap_impl.c

zap_lock: make it be a simple wrapper around zap_lock_by_dnode()

The only real difference between zap_lock() and zap_lock_by_dnode() is
that the former takes and releases its own dnode hold. If we make it
just delegate to zap_lock_by_dnode(), then the dbuf hold and release can
be handled there, in one place.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18546
DeltaFile
+2-11module/zfs/zap_impl.c
+2-111 files

OpenZFS/src e4b0d59include/sys zap_impl.h, module/zfs zap.c zap_fat.c

zap: rename 'lockdir' to 'lock'

The "dir" part is a holdover from prehistoric times, where ZAPs were
just the filesystem directory object.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18546
DeltaFile
+69-69module/zfs/zap.c
+6-6module/zfs/zap_fat.c
+6-6module/zfs/zap_impl.c
+4-5module/zfs/zap_micro.c
+4-4include/sys/zap_impl.h
+89-905 files

OpenZFS/src f4a8b0f.github/workflows zfs-arm.yml

CI: Allow testing with a newer GCC on ARM builder

Add a text box to specify a custom GCC version (like '16') when
running the zfs-arm builder.  This allows you to test with a newer
GCC than the Ubuntu default.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Tony Hutter <hutter2 at llnl.gov>
Closes #18540
DeltaFile
+37-1.github/workflows/zfs-arm.yml
+37-11 files

OpenZFS/src 2fa83c0cmd/zstream zstream_recompress.c

zstream: init/fini refcount tracking

When compiled with ZFS_DEBUG and reference_tracking_enable is enabled,
ABD alloc/free will have real refcount tracking, which will crash if the
reference cache hasn't been initialised. Adding it to the init & fini
lists is the quickest way to get that going again.

Sponsored-by: TrueNAS
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18535
DeltaFile
+2-0cmd/zstream/zstream_recompress.c
+2-01 files

OpenZFS/src 839ec56cmd/zstream zstream.c

zstream: dump backtrace on crash

Same method as zdb and ztest. zstream doesn't get touched much, and
plays a bit fast-and-loose with some core code. It's not hard for a
change to make it crash; this makes debugging easier when it does.

Sponsored-by: TrueNAS
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18535
DeltaFile
+36-0cmd/zstream/zstream.c
+36-01 files

OpenZFS/src 9e9a012.github/workflows zfs-qemu.yml zfs-qemu-packages.yml, .github/workflows/scripts qemu-2-start.sh qemu-4-build-vm.sh

CI: Remove deprecated Fedora 42

Fedora 42 was deprecated on May 13 2026.  Remove it from CI tests.

Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Closes #18545
DeltaFile
+5-5.github/workflows/zfs-qemu.yml
+0-5.github/workflows/scripts/qemu-2-start.sh
+1-1.github/workflows/zfs-qemu-packages.yml
+1-1.github/workflows/scripts/qemu-4-build-vm.sh
+7-124 files

OpenZFS/src 3800525module/zfs vdev_raidz_math_aarch64_neon_common.h

Fix aarch64 build failure by removing earlyclobber (#18532)

The UVR macros used "+&w" (read-write + earlyclobber) as the
constraint for NEON register operands that are declared as explicit
hard-register variables via:

register unsigned char wN asm("vN") __attribute__((vector_size(16)));

The + modifier implicitly makes the operand also an input (reading the
register before the asm runs). The & (earlyclobber) modifier says "this
output may be written before all inputs are consumed." Having an
earlyclobber output on the same hard-register that is simultaneously
an input is a contradiction — GCC 16 now strictly diagnoses this.

The fix removes the & from "+&w", yielding "+w". The earlyclobber
was both incorrect (contradicts the implicit input) and unnecessary
(the physical registers are already hard-bound, so the compiler has no
freedom to assign conflicting registers anyway).


    [6 lines not shown]
DeltaFile
+9-9module/zfs/vdev_raidz_math_aarch64_neon_common.h
+9-91 files

OpenZFS/src 7012b46module/zfs dsl_bookmark.c

dsl_bookmark: fix redaction list refcount tag when upgrading spill

rl_bonus and rl_dbuf are expected to have the same hold tag if they are
different. If the spill hold is taken after the redaction_list_t was
created and the bonus hold was taken, it must also be taken with the
same tag. Fortunately, we have it right here, so we can just use it.

Sponsored-by: TrueNAS
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18536
DeltaFile
+1-1module/zfs/dsl_bookmark.c
+1-11 files

OpenZFS/src edb9af3module/zfs ddt_log.c

ddt_log: fix refcount tag between ddt_log_begin & ddt_log_commit

We have to hold and release the dbuf array with the same tag. Since the
caller provides the ddt_log_update_t and is managing its lifetime, and
the begin/commit calls must be matched, it's quite reasonable to use its
pointer as the refcount tag.
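
A toy model of tagged holds may help show why the tags must match and
why the address of a caller-owned structure makes a convenient tag.
Everything here (toy_refcount_t, toy_update_t, toy_begin(),
toy_commit()) is hypothetical and far simpler than the real refcount
tracking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define	TOY_MAX_HOLDS	8

/* Hypothetical tracked refcount: remembers the tag of each hold. */
typedef struct toy_refcount {
	const void *tr_tags[TOY_MAX_HOLDS];
	size_t tr_count;
} toy_refcount_t;

static bool
toy_hold(toy_refcount_t *tr, const void *tag)
{
	if (tr->tr_count == TOY_MAX_HOLDS)
		return (false);
	tr->tr_tags[tr->tr_count++] = tag;
	return (true);
}

/* Release succeeds only if a hold with the same tag exists. */
static bool
toy_rele(toy_refcount_t *tr, const void *tag)
{
	for (size_t i = 0; i < tr->tr_count; i++) {
		if (tr->tr_tags[i] == tag) {
			tr->tr_tags[i] = tr->tr_tags[--tr->tr_count];
			return (true);
		}
	}
	return (false);
}

/* Caller-owned update state; its address doubles as the hold tag. */
typedef struct toy_update {
	toy_refcount_t *tu_rc;
} toy_update_t;

static bool
toy_begin(toy_update_t *tu, toy_refcount_t *rc)
{
	tu->tu_rc = rc;
	return (toy_hold(rc, tu));
}

static bool
toy_commit(toy_update_t *tu)
{
	assert(tu->tu_rc != NULL);
	return (toy_rele(tu->tu_rc, tu));
}
```

Since toy_begin() and toy_commit() both receive the same toy_update_t
pointer, the tag matches by construction and nothing extra has to be
passed around.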

Sponsored-by: TrueNAS
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18536
DeltaFile
+2-2module/zfs/ddt_log.c
+2-21 files

OpenZFS/src fed1b58module/zfs zap.c

zap: fix refcount tag use in zap_lookup_length_uint64 and zap_prefetch_uint64

The same tag must be used for zap_lockdir() and zap_unlockdir(), so we have
to follow the pattern used elsewhere: pass the tag used for
zap_lockdir() through to the _impl(), so it can use it for
zap_unlockdir().

Sponsored-by: TrueNAS
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18536
DeltaFile
+12-11module/zfs/zap.c
+12-111 files

OpenZFS/src be6b6eainclude/os/linux/spl/sys rwlock.h, module/os/linux/zfs zfs_vnops_os.c

linux: suppress reclaim lockdep in zfs_inactive via rwlock wrappers

kswapd can enter zfs_inactive() from inode reclaim while holding
fs_reclaim. The z_teardown_inactive_lock still serializes teardown,
but the reclaim-thread acquire/release pair can produce a lockdep
cycle through zfs_zinactive() and zfs_rmnode().

Add Linux rwlock nolockdep wrappers alongside the existing rwlock
macros and use them only for the reclaim-thread
z_teardown_inactive_lock acquire/release in zfs_inactive(). Keep
the real rwsem semantics unchanged and leave CONFIG_LOCKDEP
handling in the platform rwlock layer.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: ZhengYuan Huang <gality369 at gmail.com>
Closes #18505
DeltaFile
+100-36include/os/linux/spl/sys/rwlock.h
+24-6module/os/linux/zfs/zfs_vnops_os.c
+124-422 files

OpenZFS/src 8b24164.github/workflows/scripts qemu-3-deps.sh

CI: Fix 99.99 META version

We have an option in zfs-qemu-packages to test against a specific kernel
version.  However, qemu-3-deps.sh was incorrectly hard coded to look
at $2 for a kernel version argument (which could come in $2 or $3
depending on if --poweroff was also passed).  This caused the CI
to incorrectly edit META with a max supported kernel version of 99.99
when we didn't want that.

Fix this by looking at all the arguments for something that looks
like a kernel version and set that as the kernel max in META.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Tony Hutter <hutter2 at llnl.gov>
Closes #18526
Closes #18531
DeltaFile
+13-5.github/workflows/scripts/qemu-3-deps.sh
+13-51 files

OpenZFS/src 8c3b0c7include/sys arc.h, module/zfs arc.c

Remove arc_bcopy_func() function

While this function could be convenient, it appears it has never been
used.  In practice, callers end up using arc_getbuf_func() instead.
Remove this unused function.

Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Closes #18534
DeltaFile
+0-14module/zfs/arc.c
+1-2include/sys/arc.h
+1-162 files

OpenZFS/src 47af5e4module/zfs arc.c

arc: export additional required symbols

External consumers of arc_read() need to be able to destroy the
returned arc_buf_t.  Add the arc_buf_destroy() interface as an
exported symbol.

Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Closes #18533
DeltaFile
+1-0module/zfs/arc.c
+1-01 files

OpenZFS/src 3e57137tests/zfs-tests/tests/functional/cli_root/zhack zhack_metaslab_leak.ksh

ZTS: zhack_metaslab_leak.ksh busy export

If the pool is active 'zpool export' will fail resulting in
a test failure.  Swap log_must with log_must_busy so the export
is retried when reported as busy before failing the test.

Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Closes #18512
DeltaFile
+2-2tests/zfs-tests/tests/functional/cli_root/zhack/zhack_metaslab_leak.ksh
+2-21 files

OpenZFS/src e5bc400module/zfs vdev_raidz_math_aarch64_neon_common.h

Fix aarch64 build failure: remove earlyclobber from UVR asm constraints

Agent-Logs-Url: https://github.com/openzfs/zfs/sessions/003e5a4a-47a2-40de-a490-8a8ee8d67f5e

Co-authored-by: behlendorf <148917+behlendorf at users.noreply.github.com>
DeltaFile
+9-9module/zfs/vdev_raidz_math_aarch64_neon_common.h
+9-91 files

OpenZFS/src f5733f6tests/zfs-tests/tests/functional/dedup dedup_bclone.ksh dedup_legacy_create.ksh

Integrate DDT and BRT tests

Don't disable block cloning during dedup tests.  Just avoid using cp,
so that cloning is not triggered.  Add a new test, explicitly mixing
dedup and cloning on the same file, that should be handled by DDT.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Rob Norris <rob.norris at truenas.com>
Signed-off-by: Alexander Motin <alexander.motin at TrueNAS.com>
Closes #18520
DeltaFile
+120-0tests/zfs-tests/tests/functional/dedup/dedup_bclone.ksh
+2-4tests/zfs-tests/tests/functional/dedup/dedup_legacy_create.ksh
+2-4tests/zfs-tests/tests/functional/dedup/dedup_fdt_create.ksh
+2-4tests/zfs-tests/tests/functional/dedup/dedup_legacy_fdt_upgrade.ksh
+1-3tests/zfs-tests/tests/functional/dedup/dedup_fdt_import.ksh
+1-3tests/zfs-tests/tests/functional/dedup/dedup_fdt_pacing.ksh
+128-185 files not shown
+135-2711 files

OpenZFS/src 181e1b5include/sys zio_impl.h, man/man8 zpool-events.8

Fix double free for blocks cloned after DDT prune

Before this change, for blocks marked with the D flag but absent from
the DDT (pruned from it), zio_ddt_free() fell back to
ZIO_STAGE_DVA_FREE without trying ZIO_STAGE_BRT_FREE first.  At the same
time, such blocks might be present in the BRT, and not handling that
would result in a double/multiple free.

This change makes ZIO_DDT_FREE_PIPELINE include ZIO_FREE_PIPELINE,
just adding required ZIO_STAGE_ISSUE_ASYNC and ZIO_STAGE_DDT_FREE,
and moves DDT stages before BRT.  This way, if the block is found
in DDT by zio_ddt_free(), the pipeline is short-circuited to
ZIO_INTERLOCK_PIPELINE, similar to what zio_brt_free() does.  If
not, then BRT is checked, and if also no match, the block is freed.
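
The stage ordering and short-circuit described above can be modelled
with a small bitmask walk. This is only a sketch: the TOY_* stage bits,
their values, and toy_reaches_dva_free() are hypothetical and greatly
simplified compared with the definitions in zio_impl.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stage bits, in execution order; not the real values. */
#define	TOY_STAGE_ISSUE_ASYNC	(1u << 0)
#define	TOY_STAGE_DDT_FREE	(1u << 1)
#define	TOY_STAGE_BRT_FREE	(1u << 2)
#define	TOY_STAGE_DVA_FREE	(1u << 3)
#define	TOY_STAGE_READY		(1u << 4)
#define	TOY_STAGE_DONE		(1u << 5)

#define	TOY_INTERLOCK_PIPELINE	(TOY_STAGE_READY | TOY_STAGE_DONE)
#define	TOY_FREE_PIPELINE	\
	(TOY_INTERLOCK_PIPELINE | TOY_STAGE_BRT_FREE | TOY_STAGE_DVA_FREE)
/* The DDT pipeline is the plain free pipeline plus two extra stages. */
#define	TOY_DDT_FREE_PIPELINE	\
	(TOY_FREE_PIPELINE | TOY_STAGE_ISSUE_ASYNC | TOY_STAGE_DDT_FREE)

/*
 * Walk the stages in bit order. A stage that recognises the block cuts
 * the remaining pipeline down to the interlock stages. Returns true if
 * the walk reaches the DVA free stage.
 */
static bool
toy_reaches_dva_free(bool in_ddt, bool in_brt)
{
	uint32_t pipeline = TOY_DDT_FREE_PIPELINE;
	bool reached = false;

	assert((pipeline & TOY_FREE_PIPELINE) == TOY_FREE_PIPELINE);

	for (uint32_t stage = 1; stage <= TOY_STAGE_DONE; stage <<= 1) {
		if ((pipeline & stage) == 0)
			continue;
		switch (stage) {
		case TOY_STAGE_DDT_FREE:
			if (in_ddt)
				pipeline = TOY_INTERLOCK_PIPELINE;
			break;
		case TOY_STAGE_BRT_FREE:
			if (in_brt)
				pipeline = TOY_INTERLOCK_PIPELINE;
			break;
		case TOY_STAGE_DVA_FREE:
			reached = true;
			break;
		default:
			break;
		}
	}
	return (reached);
}
```

In this model a block that is absent from the DDT but present in the
BRT stops at the BRT stage, which is the case the commit message says
was previously missed.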

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Rob Norris <rob.norris at truenas.com>
Signed-off-by: Alexander Motin <alexander.motin at TrueNAS.com>
Closes #18520
DeltaFile
+152-0tests/zfs-tests/tests/functional/dedup/dedup_bclone_pruned.ksh
+15-8module/zfs/zio.c
+6-7include/sys/zio_impl.h
+5-5man/man8/zpool-events.8
+4-4tests/runfiles/common.run
+1-1module/zcommon/zfs_valstr.c
+183-251 files not shown
+184-257 files

OpenZFS/src 58c8dc5module/os/linux/zfs zpl_super.c

linux/zpl_super: handle 'source' option directly

vfs_parse_fs_param_source() didn't appear until 5.14, and was not
backported to kernel.org LTS kernels. It's simple enough that it's
easier to just handle it ourselves rather than use a configure check.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18529
DeltaFile
+19-10module/os/linux/zfs/zpl_super.c
+19-101 files

OpenZFS/src 532760emodule/os/linux/zfs zfs_vfsops.c

Linux: avoid znode list lock inversion during resume

Lockdep reports a circular locking dependency during mounted filesystem
rollback.  zfs_resume_fs() walks z_all_znodes under z_znodes_lock and
calls zfs_rezget(), which takes the per-object znode hold lock via
zfs_znode_hold_enter().

The normal zget path takes these locks in the opposite order.
zfs_zget() takes the per-object hold lock before zfs_znode_alloc()
inserts the znode on z_all_znodes under z_znodes_lock.  Resume can
therefore establish z_znodes_lock -> zh_lock while normal lookup
creates zh_lock -> z_znodes_lock.

Pin the current and next znodes with igrab() while holding the list
lock, then drop the list lock before reloading the znode.  Existing
stale inode handling is preserved, and both the suspended reference
and temporary walk reference are released asynchronously.
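
The pin-then-drop walk can be sketched in userspace as follows. This is
a simplified model under stated assumptions: toy_node_t, toy_list_t, the
plain integer reference count standing in for igrab(), and both
functions are hypothetical, and the sketch ignores nodes being removed
from the list while the list lock is dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct toy_node {
	struct toy_node *tn_next;
	pthread_mutex_t tn_hold_lock;	/* per-object lock */
	int tn_refs;			/* protected by the list lock */
	int tn_reloaded;
} toy_node_t;

typedef struct toy_list {
	pthread_mutex_t tl_lock;	/* list lock */
	toy_node_t *tl_head;
} toy_list_t;

/* Reload one node; takes only the per-object lock. */
static void
toy_reload(toy_node_t *tn)
{
	pthread_mutex_lock(&tn->tn_hold_lock);
	tn->tn_reloaded++;
	pthread_mutex_unlock(&tn->tn_hold_lock);
}

/*
 * Visit every node without ever holding tl_lock and tn_hold_lock at the
 * same time. The references taken under the list lock pin the current
 * and next nodes while the list lock is dropped.
 */
static int
toy_resume(toy_list_t *tl)
{
	int visited = 0;

	pthread_mutex_lock(&tl->tl_lock);
	toy_node_t *tn = tl->tl_head;
	if (tn != NULL)
		tn->tn_refs++;
	while (tn != NULL) {
		toy_node_t *next = tn->tn_next;
		if (next != NULL)
			next->tn_refs++;	/* pin before unlocking */
		pthread_mutex_unlock(&tl->tl_lock);

		toy_reload(tn);
		visited++;

		pthread_mutex_lock(&tl->tl_lock);
		assert(tn->tn_refs > 0);
		tn->tn_refs--;
		tn = next;
	}
	pthread_mutex_unlock(&tl->tl_lock);
	return (visited);
}
```

The walk only ever establishes the order "per-object lock alone" or
"list lock alone", so it cannot form the opposite edge of the cycle
that the lookup path creates.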

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: ZhengYuan Huang <gality369 at gmail.com>
Closes #18517
DeltaFile
+39-6module/os/linux/zfs/zfs_vfsops.c
+39-61 files

OpenZFS/src 414ce4binclude/sys arc_impl.h, man/man4 zfs.4

Linux: expose zfs_arc_no_grow_shift as a module parameter

The zfs_arc_no_grow_shift variable is tunable via sysctl on FreeBSD
but had no module parameter registration on Linux.

Register it once in arc.c using param_get_uint and a per-platform
set handler, replacing the FreeBSD-only registration.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alek Pinchuk <alek.pinchuk at connectwise.com>
Signed-off-by: Christos Longros <chris.longros at gmail.com>
Closes #18461
DeltaFile
+18-0module/os/linux/zfs/arc_os.c
+9-4module/zfs/arc.c
+2-2module/os/freebsd/zfs/sysctl_os.c
+0-3module/os/freebsd/zfs/arc_os.c
+2-1include/sys/arc_impl.h
+0-2man/man4/zfs.4
+31-121 files not shown
+31-137 files

OpenZFS/src 90a1740.github/workflows/scripts qemu-2-start.sh

CI: FreeBSD 15.1 STABLE

Update the freebsd15-1s builder to the released STABLE image.

Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Closes #18524
DeltaFile
+1-1.github/workflows/scripts/qemu-2-start.sh
+1-11 files