CI/GCC: Add Fedora 44, fix build errors and threadsappend
- Add Fedora 44 to CI tests
- Fix build issues from the newer compiler. These are mostly 'char *'
to 'const char *' conversions.
- Fix threadsappend.c test waiting for the same thread TID twice.
This caused the test to hang on F44 (but strangely not other OSs?)
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Signed-off-by: Tony Hutter <hutter2 at llnl.gov>
Closes #18478
Initialize vr_last_txg for rebuild
Only call txg_wait_synced() when rebuild IOs were issued for this
metaslab. This is a small optimization since in practice the first
metaslab is very likely to have allocations and cause vr_last_txg
to be initialized. After this point, when processing empty metaslabs,
txg_wait_synced() is called but with an already committed txg, so it
will not wait. Still, it's better not to call txg_wait_synced() at
all when it's not needed.
Reviewed-by: Andriy Tkachuk <atkachuk at wasabi.com>
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Closes #18482
Avoid flushing unrelated NFS exports on snapshot unmount
zfsctl_snapshot_unmount() called exportfs_flush() before every umount
attempt to drop NFS export cache references that pin the snapshot
mountpoint. The flush has global effect on the host's NFS exports and
clients, so paying it on every snapshot unmount (including auto-expire
rounds for snapshots that were never NFS-accessed) impacts unrelated
snapshots and clients.
ZFS cannot invalidate individual export cache entries because the
relevant sunrpc cache APIs are exported GPL-only. Defer the global
flush so it runs only when the umount has actually failed, then retry
once. Snapshots that are not NFS-pinned succeed on the first attempt
and never trigger the flush.
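
The control flow described above can be modeled roughly as follows. This is a Python sketch, not the kernel code: FakeSnapshot, try_umount, flush_exports and unmount_snapshot are hypothetical names, and the "flush" here simply clears a flag standing in for the NFS export cache reference.

```python
class FakeSnapshot:
    """Hypothetical stand-in for a mounted snapshot."""

    def __init__(self, nfs_pinned):
        self.nfs_pinned = nfs_pinned  # True if an export cache entry pins it
        self.flushes = 0              # number of global flushes issued
        self.attempts = 0             # number of umount attempts

    def flush_exports(self):
        # Models the global (expensive) flush; it drops the pin.
        self.flushes += 1
        self.nfs_pinned = False

    def try_umount(self):
        # The umount fails while the mountpoint is still pinned.
        self.attempts += 1
        return not self.nfs_pinned


def unmount_snapshot(snap):
    # First attempt without any flush: the common case succeeds here.
    if snap.try_umount():
        return True
    # Only after a failure pay for the global flush, then retry once.
    snap.flush_exports()
    return snap.try_umount()
```

In this model a snapshot that was never NFS-accessed costs one umount attempt and zero flushes, while a pinned one costs two attempts and exactly one flush.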
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Youzhong Yang <yyang at mathworks.com>
Signed-off-by: Ameer Hamza <ahamza at ixsystems.com>
Closes #18476
Fix rare cksum errors after rebuild
Currently, after rebuild (aka sequential resilver), checksum
errors can be seen sometimes on the spare vdev or draid spare.
On my laptop, it happens 2 to 4 times when running the
redundancy_draid_spare1 test in a loop 100 times.
It looks like there's a race in vdev_rebuild_thread() when the
rebuild of space map ranges is finished and we re-enable
allocations from the metaslab too soon: new allocations may
happen from that metaslab before txg with the rebuilt ranges is
sync-ed, causing undesirable interference.
Solution: wait for the txg to be sync-ed before enabling metaslab.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Akash B <akash-b at hpe.com>
Signed-off-by: Andriy Tkachuk <atkachuk at wasabi.com>
Closes #18307
Closes #18319
Closes #18473
Fix off-by-one in PREVIOUSLY_REDACTED handler that drops last block
In send_reader_thread(), the PREVIOUSLY_REDACTED handler computed
file_max as MIN(dn->dn_maxblkid, range->end_blkid). dn_maxblkid is
an inclusive maximum block ID while range->end_blkid is exclusive (one
past the last block). The resulting file_max was then used as an
exclusive loop bound, causing the last block of any file (at index
dn_maxblkid) to be silently skipped when a PREVIOUSLY_REDACTED range
covered the end of the file.
The block was never written to the send stream so the receiver kept
zeros there. ZFS reported no error because the stream itself was
valid; the data was simply absent.
Fix: use dn_maxblkid + 1 so file_max is consistently exclusive.
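
The inclusive/exclusive mix-up can be illustrated with a small Python model. It is only a sketch of the bound arithmetic, not the send code itself: the function names and the list-of-block-IDs return value are assumptions made for illustration.

```python
def blocks_visited_buggy(dn_maxblkid, range_start, range_end):
    # dn_maxblkid is inclusive, range_end is exclusive; taking MIN of the
    # two and using the result as an exclusive bound loses the last block.
    file_max = min(dn_maxblkid, range_end)
    return list(range(range_start, file_max))


def blocks_visited_fixed(dn_maxblkid, range_start, range_end):
    # Convert the inclusive maximum to an exclusive bound before MIN.
    file_max = min(dn_maxblkid + 1, range_end)
    return list(range(range_start, file_max))


# A file with 4 blocks (IDs 0..3) and a range that covers the end of file.
print(blocks_visited_buggy(3, 0, 10))  # block 3 is missing
print(blocks_visited_fixed(3, 0, 10))  # block 3 is included
```

When the range ends before the end of the file, both versions agree, which is consistent with the bug only showing up for ranges that cover the last block.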
Add a regression test (redacted_max_blkid.ksh) that modifies only the
last block of a file in one clone, creates a redaction bookmark from
it, then sends an unmodified clone incrementally from that bookmark.
[7 lines not shown]
Linux 7.1: access dentry d_alias directly
The d_u union introduced in 3.18 is now anonymous, so we need to detect
it and decide the right way to name d_alias.
Note that we used to have support for both names to support kernels
before 3.18, so this commit is effectively reverting the commit that
removed that support, efc293e371.
Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18471
ZTS: add libzfs_mnttab_cache test
This is the repro test from #18464, and confirms that when disabled, the
libzfs_mnttab_cache is discarded and reloaded on every lookup.
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Co-authored-by: Prakash Surya <prakash.surya at perforce.com>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18466
Closes #18464
libzfs/mnttab: restore ability to enable/disable cache
In #18296 we made the cache "always on", with the justification that our
internal tools always enable the cache anyway. This allowed removing the
entire alternate implementation of libzfs_mnttab_find().
Unfortunately, it appears that there are still libzfs consumers out
there that were expecting to be able to disable the cache entirely, and
this broke some behaviour for them.
This commit restores the ability to enable or disable the cache (and
returns to "disabled" as the default, to preserve existing behaviour).
Fortunately there is no need for a whole second codepath; just a small
reorganisation to drop all cached entries each time.
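
A rough Python model of the "single codepath" idea: the lookup always goes through the cache, and a disabled cache is simply emptied before each lookup so it is reloaded every time. The class, its methods and the loader callable are hypothetical and do not reflect the libzfs API.

```python
class MnttabCache:
    """Hypothetical model of a mount table cache that can be disabled."""

    def __init__(self, loader, enabled=False):
        self._loader = loader    # callable returning {mountpoint: dataset}
        self._enabled = enabled  # "disabled" is the default, as in the commit
        self._entries = None
        self.loads = 0           # how many times the table was (re)loaded

    def set_enabled(self, enabled):
        self._enabled = enabled

    def find(self, mountpoint):
        if not self._enabled:
            # Disabled: drop everything so the table is re-read below.
            self._entries = None
        if self._entries is None:
            self._entries = dict(self._loader())
            self.loads += 1
        return self._entries.get(mountpoint)
```

With the cache disabled, every find() observes the current mount table; with it enabled, the table is loaded once and later changes are not seen until the cache is dropped.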
Sponsored-by: TrueNAS
Reviewed-by: Prakash Surya <prakash.surya at perforce.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
[4 lines not shown]
AUTHORS: add names of recent new contributors
"Speak, friend, and enter."
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Closes #18475
libspl/mnttab: follow symlinks when resolving path via statx (#18469)
When the path argument to "zfs list -Ho name <path>" (or any caller of
zfs_path_to_zhandle()) is a symlink that crosses a mount boundary, the
wrong dataset is returned. Instead of returning the dataset that owns
the symlink's target, getextmntent() matches the dataset containing the
symlink itself.
For example, given two ZFS datasets "tank/ds1" and "tank/ds2", and a
symlink "/tank/ds1/link" pointing into "/tank/ds2":
$ sudo zfs list -Ho name /tank/ds1/link
tank/ds1
The expected (and previous) behavior is to return "tank/ds2", since the
symlink's target resides in that dataset.
The problem is in getextmntent(), in lib/libspl/os/linux/mnttab.c. That
function calls statx() on the caller-supplied path to obtain its mnt_id
[41 lines not shown]
build: use pax tar format for make dist
Automake's default tar formats (v7 pre-1.18, ustar since) impose path
length limits that drop several long test filenames from the release
tarball when `make dist` runs. Pax format has no such limit and is
read by GNU tar 1.14+ and libarchive/bsdtar.
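
The path length limit can be demonstrated with Python's tarfile module, which is used here only as a convenient stand-in for the archive formats and is not part of the build. Note one difference: tarfile raises an error for a name that ustar cannot hold, whereas the commit describes files being dropped from the tarball. The helper name and file names below are made up.

```python
import io
import tarfile


def can_store(name, fmt):
    """Return True if an empty member called `name` round-trips in `fmt`."""
    buf = io.BytesIO()
    try:
        with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
            tar.addfile(tarfile.TarInfo(name))
    except ValueError:
        # tarfile refuses names that do not fit the ustar header fields.
        return False
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return tar.getnames() == [name]


long_name = "tests/" + "x" * 300 + ".ksh"  # hypothetical long test filename
print(can_store(long_name, tarfile.USTAR_FORMAT))
print(can_store(long_name, tarfile.PAX_FORMAT))
```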
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Christos Longros <chris.longros at gmail.com>
Closes: #17276
Closes: #18465
include: Remove duplicate lzc_send_space prototype
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Ryan Moeller <ryan.moeller at klarasystems.com>
Closes #18463
CI: curl fallback, print killed tests, FreeBSD URL
- We've seen occasional 'ERROR 502: Bad Gateway' from the runner trying
to download an image with axel. Axel can open multiple connections for
a faster download, so maybe that's causing problems. This commit adds
in a fallback to curl if the axel download doesn't work.
- Update merge_summary.awk to print out killed tests in the summary.
We've seen cases where the summary page was red but there were no test
failures printed. This is because one of the VMs had too many
killed tests, which caused the total test time to run too long and
caused the runner to time out qemu-6-tests.sh. When the runner kills off
qemu-6-tests.sh, it means we never generate the nice summary page
for that VM listing the killed off tests. This commit parses the
partial test logs for killed off tests and includes them in the
merge_summary.awk output.
- Print an error message in the summary page if one of the VMs
didn't complete ZTS. This helps draw attention to a VM crash.
[8 lines not shown]
zfs.4: document five missing module parameters
Add entries for module parameters that are exposed via
ZFS_MODULE_PARAM but not covered in zfs.4:
zfs_active_allocator (charp, module/zfs/metaslab.c)
zfs_compressed_arc_enabled (int, module/zfs/arc.c)
zfs_arc_no_grow_shift (uint, module/os/freebsd/zfs/arc_os.c)
zfs_scan_blkstats (int, module/zfs/dsl_scan.c)
zfs_snapshot_history_enabled (int, module/zfs/dsl_dataset.c)
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Christos Longros <chris.longros at gmail.com>
Closes #18456
key lookup failure should always return EACCES
spa_do_crypt_abd() already maps a missing key to EACCES. However
spa_do_crypt_mac_abd(), spa_do_crypt_objset_mac_abd(), and
spa_crypt_get_salt() still return the raw
spa_keystore_lookup_key() error (ENOENT). This is inconsistent,
as we want to treat all “no key” failures as a permission
failure. Standardize on EACCES for the unloaded-key case.
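
The error mapping can be sketched in Python as below. This is only a model of the convention: lookup_key and get_salt are hypothetical stand-ins, the keystore is a plain dict, and the sketch assumes the lookup reports a missing key as ENOENT, as the message states.

```python
import errno


def lookup_key(keystore, dsobj):
    # Models a keystore lookup: ENOENT when no key is loaded.
    if dsobj not in keystore:
        return errno.ENOENT, None
    return 0, keystore[dsobj]


def get_salt(keystore, dsobj):
    err, key = lookup_key(keystore, dsobj)
    if err == errno.ENOENT:
        # Report "no key" to the caller as a permission failure.
        return errno.EACCES, None
    if err != 0:
        return err, None
    return 0, key["salt"]
```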
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Alek Pinchuk <alek.pinchuk at connectwise.com>
Closes #18448
ZTS: zpool_iostat_002_pos increase sleep time
Allow an additional second for the test to complete before checking
the results. This may explain occasional test failures in the CI.
Additionally, when the test fails dump the tmpfile for inspection.
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Closes #18455
ZTS: add targeted redundancy_draid_spare exception
When sequentially resilvering a dRAID pool it's possible that a few
correctable checksum errors will be reported. This is a known issue
which is occasionally observed in the CI. Until it's resolved we
want the test case to tolerate a few checksum errors in this scenario
to prevent false positives in the CI.
This change also has the additional side effect of standardizing in
one location how the dRAID pool integrity is verified.
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Issue #18307
Issue #18319
Closes #18436
Fix 'kernel BUG at mm/usercopy.c'
Fix a bug where a cgroup-OOM-killed process can cause a panic:
usercopy: Kernel memory exposure attempt detected from vmalloc (offset
1007584, size 217120)!
kernel BUG at mm/usercopy.c:102!
This was caused by zfs_uiomove() not correctly returning EFAULT
for short copies.
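
As a loose illustration of "short copy must become EFAULT", the following Python model uses a copy helper that, like Linux's copy_to_user(), returns the number of bytes it could not copy. The function names and the fault_at parameter are assumptions for the sketch; the real zfs_uiomove() logic is not reproduced here.

```python
import errno


def copy_to_user_model(dst, src, fault_at=None):
    """Copy src into dst; return the number of bytes NOT copied."""
    n = len(src) if fault_at is None else min(fault_at, len(src))
    dst[:n] = src[:n]
    return len(src) - n


def uiomove_model(dst, src, fault_at=None):
    left = copy_to_user_model(dst, src, fault_at)
    if left != 0:
        # A short copy is an error; do not report success to the caller.
        return errno.EFAULT
    return 0
```

Here dst is expected to be a bytearray at least as long as src, and fault_at simulates the copy faulting after that many bytes.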
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Tony Hutter <hutter2 at llnl.gov>
Closes #15918
Closes #18408
ZTS: snapshot_018_pos.ksh add extra margin
The date(1) command and snapshot timestamps use different clock
sources which can result in a small discrepancy. This can cause
the test to incorrectly fail. To avoid this, add a brief delay
to the test case to allow for minor skew.
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Closes #18450
ZTS: mmp_on_uberblocks.ksh simplify
The last portion of mmp_on_uberblocks.ksh was intended to verify that
the sequence number was incremented. However, it failed to account for
the case where a txg sync would occur resulting in the sequence number
being correctly reset.
Rather than add additional code to detect this, that check has been
removed. The mmp update frequency is still verified via the kstat
which is a more reliable mechanism to detect the writes. There are
several other mmp tests which verify the uberblock changes are reflected
on disk so there's no significant loss of test coverage.
Finally, the test case has been simplified to use the within_percent
function for readability.
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Closes #18452
ZTS: fix trim test portability for FreeBSD
Replace GNU-specific du flags (--block-size, -B1) and dd conv=nocreat
with POSIX compatible commands. Move -O flag before pool name in
zpool create to align with FreeBSD's strict POSIX getopt(). Relax vdev
size thresholds in trim_config to account for ZFS-on-ZFS overhead.
Add sync_pool before zpool trim -w to ensure freed blocks are committed
before trimming.
Skip zpool_trim_partial, zpool_trim_verify_trimmed, trim_config, and
autotrim_config on FreeBSD where trim does not reclaim space on file
vdevs stored on a ZFS filesystem within the test framework.
Tested on FreeBSD 16.0-CURRENT: 26 PASS, 4 SKIP, 0 FAIL.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Christos Longros <chris.longros at gmail.com>
Closes #18398
ZTS: remove outdated FreeBSD skip from trim tests
FreeBSD has supported hole punching via fspacectl(2) since
FreeBSD 14.0 and the test library already handles this using
truncate -d. Remove the skip that prevented trim tests from
running on FreeBSD.
Tests will still skip if the hardware does not support
TRIM/UNMAP, which is checked separately via diskinfo.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Christos Longros <chris.longros at gmail.com>
Closes #18398
ZTS: zpool_export_parallel_admin.sh busy export
If the pool is active, 'zpool export' will fail, resulting in
a test failure. Swap log_must with log_must_busy so the export
is retried when reported as busy before failing the test.
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Closes #18447
Fix: draid autopkgtests fail on s390x architecture (Endianness Issue)
The ioctl call to create the pool was returning -1 with errno EINVAL.
In the module code, verify_perms in vdev_draid.c calls
fletcher_4_native_varsize, which in turn calls fletcher_4_scalar_native.
Implement fletcher_4_byteswap_varsize, which uses
fletcher_4_scalar_byteswap, for use on big-endian machines.
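
To see why the byte order of the input words matters, here is a small Python model of a Fletcher-4 style checksum (four 64-bit running sums over 32-bit words). It is a sketch for illustration, not the kernel implementation, and the byteorder argument is an assumption used to emulate native versus byteswapped reads.

```python
import struct

MASK64 = (1 << 64) - 1


def fletcher4(data, byteorder):
    """Fletcher-4 style sums over 32-bit words read in the given byte order."""
    if len(data) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    prefix = "<" if byteorder == "little" else ">"
    words = struct.unpack("%s%dI" % (prefix, len(data) // 4), data)
    a = b = c = d = 0
    for w in words:
        a = (a + w) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return (a, b, c, d)


# The same bytes give different sums depending on how the words are read.
data = struct.pack("<4I", 1, 2, 3, 4)
print(fletcher4(data, "little"))
print(fletcher4(data, "big"))
```

A checksum that was computed over little-endian words only verifies on a big-endian host if the words are byteswapped on read, which is the role the message assigns to the byteswap variant.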
Reviewed-by: Andriy Tkachuk <andriy.tkachuk at gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Pranav P <pranavsdreams at gmail.com>
Closes #16261
Closes #18445
Fix "panic: cache_vop_rename: lingering negative entry"
A FreeBSD ZFS filesystem with properties "utf8only=on" and
"normalization=formD" consistently produces this panic when
building the lang/perl-5.42.0 port.
A ZFS file system with "utf8only=off" and "normalization=none"
works fine.
The cause of the panic seems to be incorrectly using the FreeBSD
namecache when normalisation is present. This commit adds a
predicate to prevent that.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Jan Martin Mikkelsen <janm-github at transactionware.com>
Closes #18430
Handle raidz errors <= nparity rather than ignoring
This PR adds a check in the mirror and raidz code for the case where
there are errors <= nparity. In that case, ZFS sets a new flag on
the zio that will be checked in zio_done. If that flag is set, when
the write IO completes, we issue a read IO for the same blkptr.
That will allow ZFS's auto-healing mechanisms and other error
recovery tools to detect the effectively-corrupt data, and handle
it accordingly. Note that because draid uses raidz's IO done function,
it also benefits from this functionality.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie at klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #18387
CI: Add more debugging to qemu-1-setup.sh
- Remove line where we disable stdout at the end of qemu-1-setup.sh
- Fix comment switching the 2x75GB -> 1x150GB cases
- Add some more debug to the end of the script
Reviewed-by: Tino Reichardt <milky-zfs at mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Tony Hutter <hutter2 at llnl.gov>
Closes #18441
dmu_direct: avoid UAF in dmu_write_direct_done()
dmu_write_direct_done() passes dmu_sync_arg_t to
dmu_sync_done(), which updates the override state and
frees the completion context. The Direct I/O error path
then still dereferences dsa->dsa_tx while rolling the
dirty record back with dbuf_undirty(), resulting in a
use-after-free.
Save dsa->dsa_tx in a local variable before calling
dmu_sync_done() and use that saved tx for the error
rollback. This preserves the existing ownership model
for dsa and does not change the Direct I/O write
semantics.
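
The ordering problem can be modeled in Python, where a "freed" flag stands in for memory that has been released. SyncArg, sync_done and the write_done_* functions are hypothetical names for this sketch and only mirror the shape of the fix described above.

```python
class SyncArg:
    """Stand-in for a completion context that owns a tx handle."""

    def __init__(self, tx):
        self._tx = tx
        self._freed = False

    @property
    def tx(self):
        if self._freed:
            raise RuntimeError("use after free")
        return self._tx

    def free(self):
        self._freed = True


def sync_done(dsa):
    # Models the callee that consumes and frees the context.
    dsa.free()


def write_done_buggy(dsa, error, undirty):
    sync_done(dsa)
    if error:
        undirty(dsa.tx)  # reads the context after it was freed


def write_done_fixed(dsa, error, undirty):
    tx = dsa.tx          # save the handle while the context is still valid
    sync_done(dsa)
    if error:
        undirty(tx)
```

In the fixed variant ownership of the context is unchanged; only the one field needed by the error path is copied out beforehand.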
Reviewed-by: Brian Atkinson <batkinson at lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Co-authored-by: gality369 <gality369 at example.com>
Signed-off-by: ZhengYuan Huang <gality369 at gmail.com>
Closes #18440