unit/zap: add MT_MATCH_CASE mixed-case test
New case-normalization test to cover the exact-case path. On a TOUPPER
ZAP, MT_NORMALIZE | MT_MATCH_CASE matches only the stored casing, while
an MT_NORMALIZE lookup matches any case.
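The difference between the two lookup modes can be sketched with a toy matcher. This is only an illustration, not the ZAP code: the `TOY_MT_*` flags and `toy_match()` are hypothetical names, and ASCII `toupper()` stands in for the real normalization.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the real match-type flags. */
#define	TOY_MT_NORMALIZE	0x1
#define	TOY_MT_MATCH_CASE	0x2

/* Compare two names after folding both to upper case (ASCII only). */
static bool
toy_equal_upper(const char *a, const char *b)
{
	for (; *a != '\0' && *b != '\0'; a++, b++) {
		if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
			return (false);
	}
	return (*a == *b);
}

/*
 * Toy matcher for a TOUPPER-style table. With MATCH_CASE (or no flags)
 * only the stored casing matches; with NORMALIZE alone any casing does.
 */
static bool
toy_match(const char *stored, const char *lookup, int mt)
{
	if ((mt & TOY_MT_MATCH_CASE) || mt == 0)
		return (strcmp(stored, lookup) == 0);
	return (toy_equal_upper(stored, lookup));
}
```

With a stored name of "FooBar", a `TOY_MT_NORMALIZE` lookup of "foobar" matches, while the same lookup with `TOY_MT_NORMALIZE | TOY_MT_MATCH_CASE` does not.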
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Rob Norris <rob.norris at truenas.com>
Signed-off-by: Christos Longros <chris.longros at gmail.com>
Closes #18638
zstream: refactor common functions
### Motivation
In the current version of `zstream`, each subcommand is independent and
is responsible for implementing its own stream-processing pipeline. It
started as a stream dumper, but as additional subcommands were added,
contributors typically copied an existing subcommand's pipeline and
adapted it for different purposes.
This pattern has led to quite a bit of duplicated code and has also led
to some functional nonuniformities. For example, some subcommands
support opposite-endian streams and others don't.
### Overview
This PR segregates functions that most subcommands need into
free-standing modules and reimplements the existing subcommands in
terms of those modules. The current modules are:
[100 lines not shown]
freebsd: set mnt_time on the rootfs at mountroot time
FreeBSD's vfs_mountroot() will collect `mnt_time` from every filesystem
that we mounted and use the highest timestamp as a source for the system
time if we didn't get anything from an attached RTC.
Use the rrd mechanism to gather up a notion of the latest time
and set it on mnt_time. If the timestamp db is empty, we just fall back
to the uberblock timestamp and hope that it is in the right ballpark.
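The selection logic described above can be sketched roughly as follows. The function names are hypothetical and the real code works on struct mount and the pool's timestamp db; this only shows the fallback and the "highest timestamp wins" rule.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Pick the time to report for the root filesystem: prefer the latest
 * timestamp from the db, and fall back to the uberblock timestamp
 * when the db is empty (reported here as 0).
 */
static uint64_t
toy_rootfs_mnt_time(uint64_t rrd_latest, uint64_t ub_timestamp)
{
	return (rrd_latest != 0 ? rrd_latest : ub_timestamp);
}

/*
 * Model of the mountroot side: take the highest timestamp among all
 * mounted filesystems as the system time when no RTC value exists.
 */
static uint64_t
toy_pick_system_time(const uint64_t *mnt_times, size_t n)
{
	uint64_t best = 0;

	for (size_t i = 0; i < n; i++) {
		if (mnt_times[i] > best)
			best = mnt_times[i];
	}
	return (best);
}
```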
Relevant: FreeBSD PR254058[0] reporting the problem downstream
[0] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254058
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Chris Longros <chris.longros at gmail.com>
Signed-off-by: Kyle Evans <kevans at FreeBSD.org>
Closes #18645
Add dbrrd_latest_time() to grab the latest timestamp in the db
Returns 0 if the database is empty, otherwise it returns the highest
value of the minutely db. dbrrd_add() already enforces the property
that these are monotonically increasing, so we won't try to second-guess
it.
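As a rough illustration of why no scan is needed, here is a toy ring buffer whose add path keeps timestamps increasing. The `toy_rrd_*` names and layout are assumptions made for this sketch, not the real rrd structures, and dropping out-of-order samples is just one possible way to keep the order.

```c
#include <stddef.h>
#include <stdint.h>

#define	TOY_RRD_MAX	8

typedef struct toy_rrd {
	uint64_t times[TOY_RRD_MAX];	/* ring buffer of timestamps */
	size_t head;			/* next slot to write */
	size_t count;			/* number of valid entries */
} toy_rrd_t;

/* Add a sample; samples that do not advance time are ignored. */
static void
toy_rrd_add(toy_rrd_t *r, uint64_t t)
{
	if (r->count > 0) {
		size_t last = (r->head + TOY_RRD_MAX - 1) % TOY_RRD_MAX;
		if (t <= r->times[last])
			return;
	}
	r->times[r->head] = t;
	r->head = (r->head + 1) % TOY_RRD_MAX;
	if (r->count < TOY_RRD_MAX)
		r->count++;
}

/*
 * Return 0 for an empty db, otherwise the newest entry. Because the
 * add path keeps entries increasing, the newest is also the highest.
 */
static uint64_t
toy_rrd_latest_time(const toy_rrd_t *r)
{
	if (r->count == 0)
		return (0);
	return (r->times[(r->head + TOY_RRD_MAX - 1) % TOY_RRD_MAX]);
}
```

The reader takes a const pointer, since it does not modify the db.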
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Chris Longros <chris.longros at gmail.com>
Signed-off-by: Kyle Evans <kevans at FreeBSD.org>
Closes #18645
Constify some rrd_*() functions
These don't modify the db, so just constify them while we're in the
area.
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Chris Longros <chris.longros at gmail.com>
Signed-off-by: Kyle Evans <kevans at FreeBSD.org>
Closes #18645
RAIDZ: Optimize single data column writes
When a row contains only a single data column (one ashift-sized
block or 2-wide RAIDZ), P = Q = R = data mathematically. In this
case point all parity column ABDs at the data column ABD, skipping
both buffer allocation and parity generation.
It may not be very efficient to write such small blocks on RAIDZ,
but it is allowed and does happen. Skipping this allocation and
memory copy saves several percent of CPU time.
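The P = Q = R = data identity can be checked with a small standalone model. This is a sketch under assumptions: it uses a Horner-style evaluation over GF(2^8) with the reduction constant 0x1d and generators 2 and 4 for Q and R, and the `toy_*` names are made up for the example rather than taken from vdev_raidz.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Multiply by 2 in GF(2^8), reducing with 0x1d on overflow. */
static uint8_t
toy_gf_mul2(uint8_t a)
{
	return ((uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0)));
}

static uint8_t
toy_gf_mul4(uint8_t a)
{
	return (toy_gf_mul2(toy_gf_mul2(a)));
}

/* Horner-style P/Q/R parity over ncols data columns of len bytes. */
static void
toy_raidz_parity(const uint8_t *const *cols, size_t ncols, size_t len,
    uint8_t *p, uint8_t *q, uint8_t *r)
{
	for (size_t i = 0; i < len; i++) {
		uint8_t pv = 0, qv = 0, rv = 0;

		for (size_t c = 0; c < ncols; c++) {
			pv ^= cols[c][i];
			qv = toy_gf_mul2(qv) ^ cols[c][i];
			rv = toy_gf_mul4(rv) ^ cols[c][i];
		}
		p[i] = pv;
		q[i] = qv;
		r[i] = rv;
	}
}

/* Return 1 if, for a single data column, all parity equals the data. */
static int
toy_single_column_parity_is_data(const uint8_t *data, size_t len)
{
	uint8_t p[64], q[64], r[64];
	const uint8_t *cols[1] = { data };

	if (len > sizeof (p))
		return (0);
	toy_raidz_parity(cols, 1, len, p, q, r);
	return (memcmp(p, data, len) == 0 && memcmp(q, data, len) == 0 &&
	    memcmp(r, data, len) == 0);
}
```

With one column every accumulator starts at zero and is combined with the data exactly once, so each parity byte is just the data byte. That is why the parity columns can share the data buffer instead of being computed.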
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Rob Norris <rob.norris at truenas.com>
Signed-off-by: Alexander Motin <alexander.motin at TrueNAS.com>
Closes #18695
Optimize metaslab_set_selected_txg()
I don't think it makes much sense to choose for eviction between
metaslabs selected in the same TXG. Considering that we also don't
evict them for at least 32 TXGs, the difference should be in the noise.
Just skip the metaslab bumping if we have already done it in this TXG.
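The early return amounts to something like the following sketch. The struct and function are simplified stand-ins invented for illustration; the counter replaces whatever repositioning work the real function does.

```c
#include <stdint.h>

typedef struct toy_metaslab {
	uint64_t ms_selected_txg;	/* txg of the last bump */
	unsigned requeues;		/* stands in for the costly work */
} toy_metaslab_t;

/*
 * Record that the metaslab was selected in this txg. Returns 1 if the
 * bump was performed, or 0 if it was skipped because the metaslab had
 * already been bumped in the same txg.
 */
static int
toy_set_selected_txg(toy_metaslab_t *msp, uint64_t txg)
{
	if (msp->ms_selected_txg == txg)
		return (0);
	msp->ms_selected_txg = txg;
	msp->requeues++;
	return (1);
}
```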
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Rob Norris <rob.norris at truenas.com>
Signed-off-by: Alexander Motin <alexander.motin at TrueNAS.com>
Closes #18669
FreeBSD: avoid lookup overhead for nonexistent xattr directories
Port the z_xattr_dir_absent cache to FreeBSD. As on Linux, a
getextattr that misses in the SA otherwise falls through to
zfs_get_xattrdir(), which takes the "" ZXATTR dirlock and reads
SA_ZPL_XATTR only to find the file has no xattr directory. The
in-core znode now caches that: after an SA miss zfs_getextattr_impl()
skips the directory lookup when the flag is set, zfs_get_xattrdir()
sets it when no directory is found and clears it when one is found,
and zfs_make_xattrdir() clears it on creation, which also covers the
TX_MKXATTR ZIL replay path.
The flag is serialized by the base file's vnode lock.
zfs_make_xattrdir(), the only path that creates the directory and
clears the flag, runs with the vnode held exclusive, while every
reader that sets the flag holds it shared, so a set can never race
the clear. ASSERT_VOP_ELOCKED() in zfs_make_xattrdir() and
ASSERT_VOP_LOCKED() in zfs_get_xattrdir() enforce this, both skipped
during ZIL replay since it is single threaded with no locked vnode.
[8 lines not shown]
Avoid lookup overhead for nonexistent xattr directories
A getxattr that misses in the file's SA falls through to
zfs_get_xattrdir(), which takes the "" ZXATTR dirlock and issues an
sa_lookup(SA_ZPL_XATTR), only to find the file has no xattr directory at
all. security.capability is the common trigger: the kernel probes it on
file access (get_vfs_caps_from_disk()), so for the many files that carry
no extended attributes the same fruitless lookup repeats constantly.
Profiling an SMB metadata workload showed roughly 6% of CPU spent in
zfs_get_xattrdir(), every call missing and returning ENOENT.
Cache the result in the in-core znode: a new boolean marks a file as
having no xattr directory. When it is set, a getxattr that misses in the
SA returns ENODATA from __zpl_xattr_get() without the zfs_lookup into
zfs_get_xattrdir, so neither the "" ZXATTR dirlock nor the SA_ZPL_XATTR
lookup runs. The flag is set when the directory lookup finds nothing and
cleared in zfs_make_xattrdir() whenever a directory is created, so the
setxattr and TX_MKXATTR ZIL replay paths are both covered. It is updated
under the existing z_xattr_lock and defaults to the real lookup, so
[9 lines not shown]
Improve performance of "zpool offline" for log devices
When offlining a log device, if it's part of a mirror that would still
be available after the offline operation, skip replaying the ZIL for
every dataset. This drastically improves the performance of "zpool
offline" for one log device of a mirrored pair.
Sponsored by: ConnectWise
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alek Pinchuk <alek.pinchuk at connectwise.com>
Signed-off-by: Alan Somers <asomers at gmail.com>
Closes #18664
honor file argument in file_wait_event
grep the log path passed by the caller instead of always using
ZED_DEBUG_LOG.
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Alek Pinchuk <Alek.Pinchuk at connectwise.com>
Closes #18700
delegate: add 'send:encrypted' permission
send:encrypted is like send:raw, but only permits encrypted datasets to
be sent - raw send is not permitted for unencrypted datasets.
This commit creates the permission, wires it up, and adds the check for
it in zfs_secpolicy_send_impl(). If it is the last send permission
standing, the dataset is checked for its encryption state.
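A minimal sketch of the decision order, assuming that plain send permits any send and send:raw permits any raw send. The flag names and `toy_send_allowed()` are hypothetical; the real check lives in zfs_secpolicy_send_impl() and works on delegated permissions, not a bitmask.

```c
#include <stdbool.h>

/* Hypothetical permission bits for the sketch. */
enum {
	TOY_PERM_SEND		= 1 << 0,
	TOY_PERM_SEND_RAW	= 1 << 1,
	TOY_PERM_SEND_ENCRYPTED	= 1 << 2
};

/*
 * Decide whether a send is allowed. send:encrypted is consulted last,
 * and only then does the dataset's encryption state matter.
 */
static bool
toy_send_allowed(unsigned perms, bool raw, bool encrypted)
{
	if (perms & TOY_PERM_SEND)
		return (true);
	if (!raw)
		return (false);
	if (perms & TOY_PERM_SEND_RAW)
		return (true);
	if (perms & TOY_PERM_SEND_ENCRYPTED)
		return (encrypted);
	return (false);
}
```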
Sponsored-by: TrueNAS
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18673
zfs_secpolicy_send: lift checks to common function for both
The permissions checks for send are a little involved because different
permissions grant different abilities, and there are two ways to
initiate a send.
This lifts the common permissions checks into a single function, and
ensures that we maintain a single dataset hold across all checks. This
will become important in the next commit when we need to check a
specific dataset property as part of the permission check.
Sponsored-by: TrueNAS
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18673
ZTS: delegate: test send:encrypted
Sponsored-by: TrueNAS
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18673
ZTS: delegate: check send permissions on encrypted datasets
Sponsored-by: TrueNAS
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18673
ZTS: delegate: add encryption option for test fixture datasets
The delegate test framework doesn't care about the encryption status of
the dataset under test, so by adding an option to create with encryption
the framework can be used to check encryption-related permissions
without any further fanfare.
Sponsored-by: TrueNAS
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18673
README: update supported FreeBSD release to 15.1
Our CI runners moved to FreeBSD 15.1 in 0a4b59765 (#18667), but the
README still lists 15.0. Update it to match the CI version.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Signed-off-by: Christos Longros <chris.longros at gmail.com>
Closes #18696
Clean up embedded slog metaslab across txgs
On a read-write import, metaslab_set_fragmentation() can dirty a
metaslab via vdev_dirty() while still in the txg==0 load path when its
space map has an unexpected bonus size (e.g. a makefs-created pool
whose space-map dnodes use the boot loader's 24-byte space_map_phys_t
with nblkptr=3, giving db_size=64). If that metaslab is then selected
as the embedded slog, vdev_metaslab_init() only removed it from
vdev_ms_list when txg != 0, so the txg==0 case left it queued and
metaslab_fini() tripped VERIFY(!txg_list_member(&vd->vdev_ms_list,
msp, t)).
Remove slog_ms from the dirty list for every TXG_SIZE slot before
metaslab_fini() so the cleanup is correct regardless of txg.
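The difference between the old and new cleanup can be modelled with a toy per-txg membership array. All names here are illustrative and the slot count of 4 is an assumption for the sketch; the real code operates on vdev_ms_list through the txg_list interfaces.

```c
#include <stdbool.h>
#include <stdint.h>

#define	TOY_TXG_SIZE	4
#define	TOY_TXG_MASK	(TOY_TXG_SIZE - 1)

typedef struct toy_ms {
	bool queued[TOY_TXG_SIZE];	/* dirty-list membership per slot */
} toy_ms_t;

/* Old behavior: only clean up the current slot, and only if txg != 0. */
static void
toy_clean_old(toy_ms_t *ms, uint64_t txg)
{
	if (txg != 0)
		ms->queued[txg & TOY_TXG_MASK] = false;
}

/* New behavior: clear every slot, regardless of txg. */
static void
toy_clean_all(toy_ms_t *ms)
{
	for (int t = 0; t < TOY_TXG_SIZE; t++)
		ms->queued[t] = false;
}

/* The condition that the VERIFY in the fini path would trip on. */
static bool
toy_still_queued(const toy_ms_t *ms)
{
	for (int t = 0; t < TOY_TXG_SIZE; t++) {
		if (ms->queued[t])
			return (true);
	}
	return (false);
}
```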
Reported on FreeBSD as PR 281520:
External-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281520
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
[2 lines not shown]
initramfs-zfs should not try to copy directories
From the beginning, we had find return only files for libgcc.so, but
not for libfetch/libcurl. This oversight affected a user when VMware
installed its own libcurl.so.4 in a directory called libcurl.so.4, since
our code then tried to copy a directory, which fails.
Reviewed-by: Chris Longros <chris.longros at gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Suggested-by: Carsten Härle <carsten.haerle at straightec.de>
Signed-off-by: Richard Yao <richard at ryao.dev>
Closes #18582
Closes #18686
ZTS: remove send_delegation tests
These tests duplicate delegate/zfs_allow_send, and are hard to follow
and maintain. There's no need for them now, so drop them.
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18672
ZTS: delegate: add test for send sub-permissions
Regular send and raw send are actually separate operations with separate
permissions. This adds a test that exercises the combinations properly
using the existing permission test infrastructure.
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18672
CI: Add a unit-tests workflow to our infrastructure
Run `make unit` on each PR so the unit-test suite (currently 64
tests) is tested as it grows.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Rob Norris <rob.norris at truenas.com>
Signed-off-by: Christos Longros <chris.longros at gmail.com>
Closes #18670
CI: Re-allow workflow_dispatch on zfs-qemu
Allow zfs-qemu to be invoked from a workflow_dispatch event (i.e.,
manually running a workflow). This may have been accidentally disabled
in 1916c2c55.
Reviewed-by: Chris Longros <chris.longros at gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Tony Hutter <hutter2 at llnl.gov>
Closes #18680
zfs_ioctl: fix EBUSY race between quota queries and mount
zfsvfs_hold() fell back to zfsvfs_create() -> dmu_objset_own()
(exclusive) for unmounted datasets. A concurrent zfs_domount()
also calls dmu_objset_own(), causing EBUSY on the same dataset.
Introduce zfsvfs_create_hold() using dmu_objset_hold() (shared
hold) instead. Shared holds do not conflict with exclusive owns,
eliminating the race. The release path (zfsvfs_rele,
zfsvfs_create_impl error) uses dmu_objset_ds()->ds_owner to
determine whether to disown or rele, avoiding the need for an
extra flag in zfsvfs_t.
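The own/hold distinction and the owner-based release can be sketched with a toy dataset object. This is a simplified model with made-up names: real holds are tagged and reference counted by the DSL layer, and only the "compare the owner to decide how to release" idea is taken from the description above.

```c
#include <errno.h>
#include <stddef.h>

typedef struct toy_ds {
	const void *owner;	/* exclusive owner tag, or NULL */
	int holds;		/* total references, owned or shared */
} toy_ds_t;

/* Exclusive own: fails with EBUSY if someone else already owns it. */
static int
toy_own(toy_ds_t *ds, const void *tag)
{
	if (ds->owner != NULL)
		return (EBUSY);
	ds->owner = tag;
	ds->holds++;
	return (0);
}

/* Shared hold: never conflicts with an existing owner. */
static int
toy_hold(toy_ds_t *ds)
{
	ds->holds++;
	return (0);
}

/*
 * Release without an extra flag: if the caller's tag is the owner,
 * this is a disown, otherwise it is a plain release of a shared hold.
 */
static void
toy_release(toy_ds_t *ds, const void *tag)
{
	if (ds->owner == tag)
		ds->owner = NULL;
	ds->holds--;
}
```

In this model a quota query that takes a shared hold succeeds while a mount owns the dataset, whereas a second own attempt returns EBUSY.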
Added tests userspace_005, groupspace_005, projectspace_006
(50 iter race test).
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Signed-off-by: HeonJe Lee <lhjnano at gmail.com>
Closes #18611
Fix handling of _PC_HAS_HIDDENSYSTEM for FreeBSD
The hidden and system flags are only supported for
ZFS pools if z_use_fuids is true. Fix
zfs_freebsd_pathconf() to check this.
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Signed-off-by: Rick Macklem <rmacklem at uoguelph.ca>
Closes #18688
spa: make ccw_retry_interval tunable on Linux (#18681)
zfs_ccw_retry_interval sets the time interval after which a retry of a
failed write of the configuration cache file is attempted. It was only
exposed on FreeBSD. Make it tunable on Linux with ZFS_MODULE_PARAM and
document it in zfs.4.
Signed-off-by: Christos Longros <chris.longros at gmail.com>
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Signed-off-by: Richard Yao <richard at ryao.dev>
Linux 7.1 compat: META (#18682)
Update the META file to reflect compatibility with the 7.1
kernel.
Signed-off-by: Tony Hutter <hutter2 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Reviewed-by: Chris Longros <chris.longros at gmail.com>
Update our CI runners to the newest FreeBSD 15.1 RELEASE (#18667)
Signed-off-by: Christos Longros <chris.longros at gmail.com>
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
CI: Have zfs-build-packages workflow build tarballs on Alma (#18662)
Previously, zfs-build-packages would only build source tarballs
on Fedora due to problems with building them on RHEL 7. That's
a relic of the past now, as we haven't supported RHEL 7 since
it went EOL in 2024. With this change, we now build the tarballs
on both Alma and Fedora.
Signed-off-by: Tony Hutter <hutter2 at llnl.gov>
Reviewed-by: Olaf Faaland <faaland1 at llnl.gov>
Reviewed-by: Chris Longros <chris.longros at gmail.com>
abd: Fix stats asymmetry in case of Direct I/O
abd_alloc_from_pages() does not call abd_update_scatter_stats(),
since memory is not really allocated there. But abd_free_scatter(),
called by abd_free(), does. This causes some ABD and possibly ARC
counters to underflow.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Rob Norris <rob.norris at truenas.com>
Signed-off-by: Alexander Motin <alexander.motin at TrueNAS.com>
Closes #18390