man: Update zpool-event subclass names and document new types
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Quartz <yyhran at 163.com>
Closes #17868
ZTS: autotrim_config.ksh is missing pool type
functional/trim tests do create pools of different types to test
trim, autotrim_config.ksh is missing the type from zpool
create command line while we are looping over different pool
types.
Sponsored-by: Edgecast Cloud LLC.
Signed-off-by: Toomas Soome <tsoome at me.com>
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Closes #17874
FreeBSD zio_crypt.c: initialize uio variables before access
In zio_crypt_key_wrap and zio_crypt_key_unwrap, the cuio_s variable was
not initialized before the calls to zfs_uio_init, leading to
uninitialized access to cuio_s.uio_offset. Initialize it to avoid gcc
warnings.
Similar issue as fixed in 2bf152021 ("Fix gcc uninitialized warning in
FreeBSD zio_crypt.c")
Signed-off-by: Ryan Libby <rlibby at FreeBSD.org>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Closes #17863
mailmap/AUTHORS: update with recent new contributors
We’re not always on the same page, but at least we’re in the same book.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Closes #17860
zpool: fix conflict with -v and -o options
Right now, the -v and -o options for `zpool list` work independently,
but when paired, the -v "wins out" and the -o effect is lost. This
commit fixes that problem.
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Rob Norris <robn at despairlabs.com>
Signed-off-by: Shreshth Srivastava <shreshthsrivastava2 at gmail.com>
Closes #11040
Closes #17839
ZTS: fail test run if test runner crashes unexpectedly
zfs-tests.sh executes test-runner.py to do the actual test work. Any
exit code < 4 is interpreted as success, with the actual value
describing the outcome of the tests inside.
If a Python program crashes in some way (eg an uncaught exception), the
process exit code is 1.
Taken together, this means that test-runner.py can crash during setup,
but return a "success" error code to zfs-tests.sh, which will report and
exit 0. This in turn causes the CI runner to believe the test run
completed successfully.
This commit addresses this by making zfs-tests.sh interpret an exit code
of 255 as a failure in the runner itself. Then, in test-runner.py, the
"fail()" function defaults to a 255 return, and the main function gets
wrapped in a generic exception handler, which prints it and calls
fail().
[10 lines not shown]
FreeBSD: zfs_getpages: Don't zero freshly allocated pages
Initially, `zfs_getpages()` is provided with an array of busy pages by
the vnode pager. It then tries to acquire the range lock, but if there
is a concurrent `zfs_write()` running and fails to acquire that range
lock, it "unbusies" the pages to avoid a deadlock with `zfs_write()`.
After that, it grabs the pages again and retries to acquire the range
lock, and so on.
Once it got the range lock, it filters out valid pages, then copy DMU
data to the remaining invalid pages.
The problem is that freshly allocated zero'd pages it grabbed itself are
marked as valid. Therefore they are skipped by the second part of the
function and DMU data is never copied to these pages. This causes mapped
pages to contain zeros instead of the expected file content.
This was discovered while working on RabbitMQ on FreeBSD. I could
reproduce the problem easily with the following commands:
[21 lines not shown]
Linux 6.18: replace write_cache_pages()
Linux 6.18 removed write_cache_pages() without a usable replacement.
Here we implement a minimal zpl_write_cache_pages() that find the dirty
pages within the mapping, gets them into the expected state and hands
them off to zfs_putpage(), which handles the rest.
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn at despairlabs.com>
CI: Fix FreeBSD 15.0 by staying on ALPHA4 due to broken ALPHA5 image
FreeBSD 15.0-ALPHA5 image fails to boot on cloud VMs due to missing
/boot/efi mount point, causing the system to drop to single user mode
where SSH cannot start. Work around this by staying on ALPHA4 and
setting IGNORE_OSVERSION=yes to bypass pkg's kernel version mismatch
prompt during bootstrap. This allows CI to proceed with ALPHA4 until we
have a stable FreeBSD 15.0 image.
Signed-off-by: Ameer Hamza <ahamza at ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Closes #17846
pool_iter_refresh: don't refresh pools twice
In "all pools" mode, pool_iter_refresh() will call zpool_iter(), which
will call zpool_refresh_stats() before calling add_pool(). If we already
have the pool, this is a different handle, so we just release it and
return. Back in pool_iter_refresh(), we then call zpool_stats_refresh()
again for our handle on the same pool.
All together, this means we're doing two ZFS_IOC_POOL_STATS calls into
the kernel for every pool in the system. This isn't wrong, but it does
double the pressure on global locks.
Instead, we add a new function zpool_refresh_stats_from_handle() that
simply copies the pool config and state from one handle to another, and
use it to update our handle before we release it in add_pool(), so we
only have one call per pool per interval.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
[4 lines not shown]
zinject: Introduce ready delay fault injection
This adds a pause to the ZIO pipeline in the ready stage for
matching I/O (data, dnode, or raw bookmark).
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Rob Norris <robn at despairlabs.com>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Reviewed-by: Akash B <akash-b at hpe.com>
Signed-off-by: Robert Evans <evansr at google.com>
Closes #17787
docs: fix a few small typos (#17804)
Signed-off-by: Shreshth Srivastava <shreshthsrivastava2 at gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Explicit set ashift for non-leaf vdevs
Before this change ashift property was applied only to a leaf
vdevs. As result, it worked only as a minimal value for parent
vdevs, since bigger physical_ashift value reported by any child
could be used instead when deciding parent's ashift, as if the
ashift property was never set.
This change explicitly passes ZPOOL_CONFIG_ASHIFT to all vdevs,
allowing override for parents only if the passed value is below
logical_ashift and so unacceptable.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Allan Jude <allan at klarasystems.com>
Reviewed-by: Rob Norris <rob.norris at klarasystems.com>
Signed-off-by: Alexander Motin <alexander.motin at TrueNAS.com>
Closes #17826
CI: Switch FreeBSD 15 to 15.0-ALPHA4 and add FreeBSD 16
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs at mcmilk.de>
Closes #17815
zdb: adjust block histogram binning strategy
Previously, a bin included all blocks _starting_ from given size
(e.g., a "4K" bin would include all blocks within the [4K; 8K) region).
This is counter-intuitive and does not match the typical use-case of the
block histogram (that is, to estimate disk usage considering how ZFS'
block allocation works). In other words, if I'm looking at the "4K" row,
I'm interested in records that _fit into_ a 4K block.
Adjust the binning strategy such that a bin includes all blocks _up to_
given size, such that e.g. a "4K" bin would include all blocks within
the (2K; 4K] region.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Ivan Shapovalov <intelfx at intelfx.name>
Closes #16999
zdb: factor out block histogram bin number computation
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Ivan Shapovalov <intelfx at intelfx.name>
Closes #16999
Linux 6.18: generic_drop_inode() and generic_delete_inode() renamed
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn at despairlabs.com>
Linux 6.18: convert ida_simple_* calls
ida_simple_get() and ida_simple_remove() are removed in 6.18. However,
since 4.19 they have been simple wrappers around ida_alloc() and
ida_free(), so we can just use those directly.
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn at despairlabs.com>
Fix return value for setting zvol threading
We must return -1 instead of ENOENT if the special zvol threading
property set function can't locate the dataset (this would typically
happen with an encypted and unmounted zvol) so that the operation
gets inserted properly into the nvlist for operations to set. This
is because we want the property to be set once the zvol is
decrypted again.
Reviewed-by: Allan Jude <allan at klarasystems.com>
Reviewed-by: Rob Norris <robn at despairlabs.com>
Reviewed-by: Ameer Hamza <ahamza at ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Signed-off-by: Andrew Walker <awalker at ixsystems.com>
Closes #17836
Update device removal documentation
Make a minor update to the 'zpool remove' man page to clarify both
raidz and draid pools do not support removal, and change sector to
ashift which is what we actually care about.
Update the big theory comment in vdev_removal.c to accurately reflect
which types of vdevs can be removed. Furthermore, I've added some
discussion for the casual reader to briefly explain the top-level
vdev removal restrictions. This has been a common area of confusion
and it's not intuitive where they come from without understanding
the implementation details.
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Closes #17847
Fix the type of the raidz_outlier_check_interval_ms parameter
It's an hrtime_t, which is an unsigned long long. In practice this is
just a U64.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Allan Jude <allan at klarasystems.com>
Reviewed-by: Rob Norris <rob.norris at klarasystems.com>
Signed-off-by: Mark Johnston <markj at FreeBSD.org>
Closes #17833
Add missing include statement
Resolve a build failure for user applications that include <sys/uio.h>.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Shreshth Srivastava <shreshthsrivastava2 at gmail.com>
Closes #17781
Closes #17814
zvol: verify IO type is supported
ZVOLs don't support all block layer IO request types. Add a check for
the IO types we do support. Also, remove references to
io_is_secure_erase() since they are not supported on ZVOLs.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Signed-off-by: Tony Hutter <hutter2 at llnl.gov>
Closes #17803
pool_iter_refresh: don't flag existing pools as refreshed
zpool_iter() passes the callback a new instance of zpool_handle_t each
time, so the existing handle in the pool_list AVL never actually gets a
refresh. Internally, that means its zpool_config is never updated, and
the old config is never moved to zpool_old_config. As a result,
print_iostat() never sees any updated config, and so repeats the first
line forever.
This is the simplest workaround: just don't mark existing pools as
refreshed. pool_list_refresh() will see this and refresh them.
The downside is a second call to ZFS_IOC_POOL_STATS for existing pools,
because zpool_iter() just called it for the handle we threw away.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at klarasystems.com>
Closes #17807
Suppress some ashift warnings
Do not warn about vdev ashifts being smaller then physical ashifts
in a pool status if the pool ashift property set and vdev ashift
satisfies it (bigger or equal), since user explicitly requested
this. The ashift of individual vdevs are still reported.
Do not warn about vdev ashifts in zpool import, since it doesn't
matter much, and we don't even report individual vdevs ashifts
there.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Allan Jude <allan at klarasystems.com>
Reviewed-by: Rob Norris <rob.norris at klarasystems.com>
Signed-off-by: Alexander Motin <alexander.motin at TrueNAS.com>
Closes #17830
Annotate arc_buf_is_shared as __maybe_unused
Otherwise the compiler warns about it on production FreeBSD builds.
The routine proved resilient to attempts to ifdef on debug.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Signed-off-by: Mateusz Guzik <mjguzik at gmail.com>
Closes #17818
zdb: add `--class=(normal|special|...)` to filter blocks by alloc class
When counting blocks to generate block size histograms (`-bb`), accept a
`--class=` argument (as a comma-separated list of either "normal",
"special", "dedup" or "other") to only consider blocks that belong to
these metaslab classes.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Ivan Shapovalov <intelfx at intelfx.name>
Closes #16999
sha256_generic: make internal functions a little more private
Linux 6.18 has conflicting prototypes for various sha256_* and sha512_*
functions, which we get through a very long include chain. That's tough
to fix right now; easier is just to rename our internal functions.
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn at despairlabs.com>