zbookmark_compare: handle "marker" bookmarks with negative levels
"Marker" bookmarks (those with zb_level == ZB_ROOT_LEVEL, ZB_ZIL_LEVEL
or ZB_DNODE_LEVEL) represent valid blocks, but are associated with a
dataset directly rather than with a specific object within it. They end
up on bookmark lists during scan prefetch, and so need to be sorted
ahead of any "true" object blocks.
The problem is that for negative levels, BP_SPANB produces a negative
shift, which is undefined behaviour in C. Fortunately the results are
used only for comparison, so the worst possible behaviour in a forgiving
compilation environment is a mis-sort, which for the scan/traverse cases
means that we haven't prefetched certain metadata before we actually
need it. But there _is_ UB in there, and UBSAN does rightly complain.
Here we fix all this by handling these bookmarks directly - sorting them
ahead of "true" object blocks, which is usually what scan/traverse will
prefer. And we don't do any interesting math on these bookmarks, so we
sidestep the whole UB thing.
[6 lines not shown]
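As an illustration of the zbookmark_compare change described above, here is a
minimal sketch of a comparator that sorts negative-level "marker" bookmarks
ahead of object blocks without ever computing a shift for them. Everything in
it is an assumption made for the example: the `sketch_bookmark_t` type, the
`SKETCH_LEVEL_SHIFT` constant, the ordering between two markers, and the
tie-break by level are all hypothetical, and this is not the actual
zbookmark_compare() implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified bookmark; not the real zbookmark_phys_t. */
typedef struct sketch_bookmark {
	uint64_t objset;
	uint64_t object;
	int64_t level;		/* negative for "marker" bookmarks */
	uint64_t blkid;
} sketch_bookmark_t;

/* Assumed fan-out shift per indirect level, for illustration only. */
#define	SKETCH_LEVEL_SHIFT	7

static int
sketch_cmp_u64(uint64_t a, uint64_t b)
{
	return ((a > b) - (a < b));
}

/* Returns -1, 0 or 1, in the style of a qsort/AVL comparator. */
static int
sketch_bookmark_compare(const sketch_bookmark_t *a, const sketch_bookmark_t *b)
{
	int c = sketch_cmp_u64(a->objset, b->objset);
	if (c != 0)
		return (c);

	/* Handle markers first, so no shift is ever computed for them. */
	int a_marker = (a->level < 0);
	int b_marker = (b->level < 0);
	if (a_marker || b_marker) {
		/* Two markers: order by level, an arbitrary choice here. */
		if (a_marker && b_marker)
			return ((a->level > b->level) - (a->level < b->level));
		/* Exactly one marker: it sorts ahead of the object block. */
		return (a_marker ? -1 : 1);
	}

	c = sketch_cmp_u64(a->object, b->object);
	if (c != 0)
		return (c);

	/* Both levels are >= 0 here, so the shift counts cannot be negative. */
	assert(a->level * SKETCH_LEVEL_SHIFT < 64);
	assert(b->level * SKETCH_LEVEL_SHIFT < 64);
	uint64_t a_start = a->blkid << (unsigned)(a->level * SKETCH_LEVEL_SHIFT);
	uint64_t b_start = b->blkid << (unsigned)(b->level * SKETCH_LEVEL_SHIFT);
	c = sketch_cmp_u64(a_start, b_start);
	if (c != 0)
		return (c);

	/* Same starting offset: fall back to comparing levels. */
	return ((a->level > b->level) - (a->level < b->level));
}
```

The point of the early return is that the shift expressions are only reached
once both levels are known to be non-negative, which is what removes the
undefined behaviour in this sketch.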
Calling thread IO
Adds a module parameter that allows waiting for bios
to complete, along with a flag that tracks whether a zio
has bypassed the queue.
The motivation behind this change is performance. The
intention is to reduce the overhead of switching between
threads between bio submission and execution of the callback.
Currently, only zios that have bypassed the queue are allowed
to wait for bio completion, mainly because any performance
uplift from staying in the same thread is overshadowed by the vdev
queue lock.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Migel Imeri <mimeri at lanl.gov>
Closes #18562
Fix insufficient locking in dedup verify
Introduction of dde_io_lock removed global DDT lock acquisition
from write completion. As a result, the write ZIO ABD could be freed
while zio_ddt_collision() is comparing against it. Taking
dde_io_lock there should fix the issue.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin at TrueNAS.com>
Closes #17960
Closes #18712
Closes #18720
zpl_ctldir: remove comments describing ancient kernel behaviour
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18722
zfs_ctldir: move the invalid dentry check up to zpl_snapdir_automount()
If the dentry is invalid, don't even bother calling
zfsctl_snapshot_mount(). There's no practical change here, but it just
helps keep the notion of "invalidated dentry" in the binding.
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18722
zfs_ctldir: remove flags arg to zfsctl_snapshot_mount()
Always set to 0, and never read anyway.
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18722
zfs_ctldir: remove flags arg to zfsctl_snapshot_unmount()
On FreeBSD, it's ignored. On Linux, it's set to either MNT_EXPIRE or
MNT_FORCE, with MNT_FORCE adding the -f switch to the userspace
umount(8) call. However, MNT_FORCE to umount(2) simply causes a
call into sb->umount_begin() early in the unmount process, which we do
not implement. Therefore, it is effectively a no-op, and we can remove
it.
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18722
zfs_ctldir: remove delay param for zfsctl_snapshot_unmount_delay()
It's always set to the zfs_expire_snapshot tunable and never changed.
There's no need to thread it through.
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18722
zfs_ctldir: use dmu_objset_spa() to get spa pointer
Just for slightly easier readability against dmu_objset_id(), which is
often right near it.
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18722
zfs_ctldir: remove unused args to zfsctl_snapshot_alloc()
Since 4ce030e025 (2025) these have always been null/zero, which those
fields already are, so there's no need for them.
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18722
zfs_ctldir: remove se_root_dentry
Unused field since 9b77d1c958 (2017).
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18722
ZTS: snapdir: ensure mounts only occur when accessing beyond the snapdir
On Linux, automount only occurs for paths that are "beyond" the snapdir.
Accessing the snapdir itself, e.g. with `stat()`, does not itself trigger
the automount. Confirm that this is the case.
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Ameer Hamza <ahamza at ixsystems.com>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18705
ZTS: snapdir: ensure the SNAPSHOT_NO_SUID tunable performs correctly
When set, zfs_snapshot_no_setuid will add the nosuid option to new
snapdir mounts, preventing setuid executables from being run as a
different user.
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Ameer Hamza <ahamza at ixsystems.com>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18705
ZTS: snapdir: test that snapdir mounts are expired when idle
A snapdir mount that has not been used for some time should be
automatically unmounted. Test that that happens, and also that accessing
the mount resets the timer.
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Ameer Hamza <ahamza at ixsystems.com>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18705
ZTS: snapdir: test that explicit unmount allows new automount
It's always possible to manually unmount a snapdir mount. When that
happens, the next access should mount the snapshot again, even though
the snapmount system may have no knowledge that the unmount actually
happened.
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Ameer Hamza <ahamza at ixsystems.com>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18705
ZTS: snapdir: test ADMIN_SNAPSHOT=0 prevents snapdir admin ops
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Ameer Hamza <ahamza at ixsystems.com>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18705
ZTS: snapshot: remove need for snapdir admin feature in snapshot tests
The only reason it was enabled/used was to remove the snapshot automount
dirs during cleanup. Those are already removed when the snapshot is
destroyed, so it doesn't need to be done at all.
Disabling it also helps to ensure that we aren't accidentally using it
or relying on side-effects in the snapshot tests proper. Those effects
should be tested separately in a snapdir test.
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Ameer Hamza <ahamza at ixsystems.com>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18705
ZTS: snapdir: limit ADMIN_SNAPSHOT tunable use to the tests that need it
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Ameer Hamza <ahamza at ixsystems.com>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18705
ZTS: snapdir: cleanup
Rename tests to match their group and function, and remove unused
config vars.
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Ameer Hamza <ahamza at ixsystems.com>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18705
ZTS: create snapdir test group, move relevant tests from snapshot group
The intent of the snapdir group is to test the behaviour of the snapshot
automount and admin facility itself. The snapshot group is left to test
the behaviour of snapshots and the data within, without worrying about
the behaviour of snapdirs beyond them working to provide access to the
snapshot data via the filesystem.
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Ameer Hamza <ahamza at ixsystems.com>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18705
ABI: bump for spacemap condense
Sponsored-by: TrueNAS
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #16747
zts: add test for log spacemap flushall + zpool condense
Sponsored-by: TrueNAS
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #16747
ztest: periodically start/stop log spacemap flush
Sponsored-by: TrueNAS
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #16747
log_spacemap: extend pool flushall to have "request" and "export" modes
Normally, log spacemaps are flushed out to the metaslabs when the pool
is exported. For large logs, this can lead to export taking an
inordinate amount of time.
This commit adds a "mode" parameter for the log spacemap "flushall"
operation, and functions for starting and stopping it in a particular
mode. The existing behaviour of flushing everything is now the "export"
mode.
Then, we add a new "request" mode, that can be triggered externally.
This activates the same flushall code, with a few differences:
- we only consider flushing metaslabs that were dirtied on the txg
before the flushall operation was started
- we close and issue the txg immediately when the flushall is active,
rather than wait for zfs_txg_timeout each time (similar to how scrub
[18 lines not shown]
log_spacemap: add counter for unflushed metaslabs
Useful for understanding, but also a convenient place to grab the
current count without needing to walk over the metaslab list.
Sponsored-by: TrueNAS
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #16747
spa_stats: kstats for unflushed log spacemaps
Sponsored-by: TrueNAS
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #16747
zts: zpool-condense sanity tests
Sponsored-by: TrueNAS
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #16747
zpool: add condense verb
The idea is to have a single command that could signal to any background
cleanup task that it should do its work faster, or care less about not
getting in the way of user IO, or whatever.
This adds the `zpool condense` command, the `ZFS_IOC_POOL_CONDENSE`
ioctl and counters so userspace can get progress. Because the type could
be anything, there's no particular unit, just a total number of items to
condense and a count of how many are done.
Included is a "debug" condense type. In debug builds, issuing condense
with this type will start a background process that will simply bump the
condense counters every second for ten seconds. This is intended for use
by the test suite and for debugging the condense infrastructure itself,
and will be compiled out of production builds.
Sponsored-by: TrueNAS
Sponsored-by: Klara, Inc.
[5 lines not shown]
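To make the unitless "total items" and "items done" counters from the
`zpool condense` description above concrete, here is a small sketch of how a
userspace consumer might turn them into a percentage. The struct and function
names are hypothetical and do not reflect the actual ioctl payload or any
libzfs interface. Treating an empty job as complete is also an assumption of
this example.

```c
#include <stdint.h>

/* Hypothetical progress counters; not the real ioctl fields. */
typedef struct sketch_condense_progress {
	uint64_t total;		/* number of items to condense */
	uint64_t done;		/* number of items condensed so far */
} sketch_condense_progress_t;

/*
 * Returns whole percent complete in the range 0-100.
 * Assumption: a job with nothing to do is reported as 100%.
 */
static unsigned
sketch_condense_percent(const sketch_condense_progress_t *p)
{
	if (p->total == 0 || p->done >= p->total)
		return (100);

	uint64_t pct;
	if (p->done <= UINT64_MAX / 100) {
		/* Multiplying first keeps precision and cannot overflow. */
		pct = p->done * 100 / p->total;
	} else {
		/* Very large counts: divide first; total / 100 is nonzero. */
		pct = p->done / (p->total / 100);
	}
	return (pct > 100 ? 100 : (unsigned)pct);
}
```

Because the counters carry no unit, a ratio like this is about the only
generic thing a caller can derive from them without knowing the condense type.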
ZTS: delegate: test send:encrypted
Sponsored-by: TrueNAS
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18673
(cherry picked from commit 166a6672502c6398b3ef549d4a17f113f5cb2e8d)
ZTS: delegate: check send permissions on encrypted datasets
Sponsored-by: TrueNAS
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18673
(cherry picked from commit bce9a8ef7d0b4c1b8e323ef2b045379177c20410)