Linux build: always use objtool
We silence `objtool` warnings on some object files using
`OBJECT_FILES_NON_STANDARD_some_file.o`. Nowadays `objtool` is
needed for CPU vulnerability mitigations and a lot more
functionality so its use is desirable.
Just remove the `OBJECT_FILES_NON_STANDARD` definitions. A follow-up
commit is needed to make the offending files standard and address
the compile-time warnings.
Reviewed-by: Tino Reichardt <milky-zfs at mcmilk.de>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Attila Fülöp <attila at fueloep.org>
Closes #17401
Closes #17364
ZVOL: Make zvol_inhibit_dev module parameter platform-independent
The module parameter is now exposed in the FreeBSD sysctl list as
'vfs.zfs.vol.inhibit_dev'. The default value is '0', the same as on
Linux.
Sponsored-by: vStack, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Rob Norris <rob.norris at klarasystems.com>
Signed-off-by: Fedor Uporov <fuporov.vstack at gmail.com>
Closes #17384
FreeBSD: Add posix_fadvise(POSIX_FADV_WILLNEED) support
As commit 320f0c6 did for Linux, connect POSIX_FADV_WILLNEED
up to dmu_prefetch() on FreeBSD.
While there, fix portability problems in tests/functional/fadvise.
1. Instead of relying on the numerical values of POSIX_FADV_XXX macros,
accept macro names as arguments to the file_fadvise program. (The
numbers happen to match on Linux and FreeBSD, but future systems may
vary and it seems a little strange/raw to count on that.)
2. For implementation reasons, SEQUENTIAL doesn't reach ZFS via FreeBSD
VFS currently (perhaps something that should be investigated in
FreeBSD). Since on Linux we're treating SEQUENTIAL and WILLNEED the
same, it doesn't really matter which one we use, so switch the test
over to WILLNEED to exercise the new prefetch code on both OSes the
same way.
[6 lines not shown]
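A minimal sketch of the name-based approach described in point 1, assuming a lookup table from macro names to the platform's own values. The helpers `advice_from_name()` and `advise_whole_file()` are hypothetical names for illustration and are not taken from the file_fadvise test program; only `posix_fadvise()` and the `POSIX_FADV_*` macros are real.

```c
#define	_POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: translate an advice macro name such as
 * "POSIX_FADV_WILLNEED" into this platform's numeric value.
 * Returns 0 and stores the value on success, -1 for an unknown name.
 */
static int
advice_from_name(const char *name, int *advice)
{
	static const struct {
		const char *name;
		int value;
	} table[] = {
		{ "POSIX_FADV_NORMAL",		POSIX_FADV_NORMAL },
		{ "POSIX_FADV_RANDOM",		POSIX_FADV_RANDOM },
		{ "POSIX_FADV_SEQUENTIAL",	POSIX_FADV_SEQUENTIAL },
		{ "POSIX_FADV_WILLNEED",	POSIX_FADV_WILLNEED },
		{ "POSIX_FADV_DONTNEED",	POSIX_FADV_DONTNEED },
		{ "POSIX_FADV_NOREUSE",		POSIX_FADV_NOREUSE },
	};

	assert(name != NULL && advice != NULL);
	for (size_t i = 0; i < sizeof (table) / sizeof (table[0]); i++) {
		if (strcmp(name, table[i].name) == 0) {
			*advice = table[i].value;
			return (0);
		}
	}
	return (-1);
}

/*
 * Hypothetical wrapper: apply the named advice to the whole file
 * (offset 0, length 0 means "to end of file"). Like posix_fadvise(),
 * it returns an error number directly rather than setting errno.
 */
static int
advise_whole_file(int fd, const char *advice_name)
{
	int advice;

	if (advice_from_name(advice_name, &advice) != 0)
		return (EINVAL);
	return (posix_fadvise(fd, 0, 0, advice));
}
```

A caller would pass the name straight from the command line, for example advise_whole_file(fd, "POSIX_FADV_WILLNEED"), so the test never depends on the numeric values matching across systems.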
tunables: remove direct use of module_param_cb
The use for spl_taskq_kick was the only use, and the comment that
module_param_call is obsolete is no longer true - it's still very much
used even in recent kernels.
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa at snajpa.net>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Closes #17377
tunables: ensure tunable and variable have same define gate
If a variable is only available in the kernel, then the tunable should
also only be available there.
This matters very little so long as we don't have userspace tunables,
but it's still good hygiene.
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa at snajpa.net>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Closes #17377
tunables: fix spelling
Three occurrences with an 'e', and all of them mine. Maybe it's a
British thing?
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa at snajpa.net>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Closes #17377
tunables: don't assert initialisation in impl getters
It actually doesn't matter if it's not initialised when we first query
the current value; it just returns empty-string. A crash is quite
obnoxious even if it is a rare case.
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa at snajpa.net>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Closes #17377
tunables: remove unused param get/set aliases
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa at snajpa.net>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Closes #17377
tunables: use Linux ullong param ops for u64
Since 3.17 Linux has provided param ops for 64-bit ints, so we don't
need to use our own anymore.
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa at snajpa.net>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Closes #17377
tunables: remove support for s64 tunables
Nothing uses them now.
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa at snajpa.net>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Closes #17377
tunables: remove __check_old_set_param workaround
This was fully removed from Linux in 4.15, so we won't be seeing it
again.
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa at snajpa.net>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Closes #17377
tunables: remove FreeBSD compat macros for Linux module params
Nothing in any FreeBSD code uses them.
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa at snajpa.net>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Closes #17377
zfs_log: make zfs_immediate_write_sz uint
Likely it's only int64 for comparison with ssize_t, which is signed.
However, it would make no sense for it to be less than 0 or greater than
4G, so making it a regular uint will make it safe for comparison and
remove the only S64 tunable in core.
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa at snajpa.net>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Closes #17377
Linux 6.15 compat: META
Update the META file to reflect compatibility with the 6.15
kernel.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Tony Hutter <hutter2 at llnl.gov>
Closes #17393
arcstat: prevent ZeroDivisionError when L2ARC becomes empty
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Richard Yao <richard at ryao.dev>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Signed-off-by: Ameer Hamza <ahamza at ixsystems.com>
Closes #17348
(cherry picked from commit f0baaa329ab2f28773b9a082d74cbb773a306a53)
Linux: Stop using NR_FILE_PAGES for ARC scaling
I've found that QEMU/KVM guest memory accounted as shared is also
included in NR_FILE_PAGES, even though it is actually non-evictable
anonymous memory. Using it as the base for the zfs_arc_pc_percent
parameter makes ARC ignore shrinker requests while the page cache
does not really have anything to evict, ending with the OOM killer
killing the QEMU process.
Instead, NR_ACTIVE_FILE + NR_INACTIVE_FILE should represent the
part of the page cache that is actually evictable, which should
be safer to use as a reference for ARC scaling.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Ameer Hamza <ahamza at ixsystems.com>
Reviewed-by: Pavel Snajdr <snajpa at snajpa.net>
Signed-off-by: Alexander Motin <mav at FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #17334
(cherry picked from commit 0aa83dce99e47ccd533be24b82332268766b68db)
spa: clear checkpoint information during retry
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Signed-off-by: Mariusz Zaborski <oshogbo at FreeBSD.org>
Closes #17319
(cherry picked from commit 8b9c4e643b6900829f3ba9bcaab8412d2e26895c)
Sort the blocking snapshots list #12751 (#17264)
When multiple snapshots prevent the destruction/rollback of the
respective dataset/snapshot/volume via zfs destroy or zfs rollback,
the error message does not list the blocking snapshots sorted
according to their order of creation. This is inconvenient, can lead
to confusion, and is inconsistent with the output of
zfs list -t snap.
Closes: #12751
Signed-off-by: Artem-OSSRevival <artem.vlasenko at ossrevival.org>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
(cherry picked from commit 27f3d94940490d891c70e0c148f80d0c0ce09ed4)
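A sketch of what sorting the blocking list by creation order could look like, assuming each entry carries a creation key. The `snap_entry_t` type, its `createtxg` field and the function names are hypothetical and chosen for illustration; the commit message does not say which key or container the actual change uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record for one blocking snapshot. */
typedef struct snap_entry {
	const char *name;
	uint64_t createtxg;	/* assumed creation-order key */
} snap_entry_t;

/* qsort() comparator: oldest snapshot first. */
static int
snap_compare(const void *a, const void *b)
{
	const snap_entry_t *sa = a;
	const snap_entry_t *sb = b;

	if (sa->createtxg < sb->createtxg)
		return (-1);
	if (sa->createtxg > sb->createtxg)
		return (1);
	return (0);
}

/* Sort the list in place before it is printed in the error message. */
static void
sort_blocking_snaps(snap_entry_t *snaps, size_t count)
{
	assert(snaps != NULL || count == 0);
	if (count > 1)
		qsort(snaps, count, sizeof (snaps[0]), snap_compare);
}
```

The comparator avoids subtracting the two keys, since a 64-bit difference does not fit the int return value.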
ZAP: Reduce leaf array and free chunks fragmentation
Previous implementation of zap_leaf_array_free() put chunks on the
free list in reverse order. Also zap_leaf_transfer_entry() and
zap_entry_remove() were freeing name and value arrays in reverse
order. Together this created a mess in the free list, making
following allocations much more fragmented than necessary.
This patch re-implements zap_leaf_array_free() to keep existing
chunks order, and implements non-destructive zap_leaf_array_copy()
to be used in zap_leaf_transfer_entry() to allow properly ordered
freeing name and value arrays there and in zap_entry_remove().
With this change, a test of some writes and deletes shows the percentage
of non-contiguous chunks in DDT dropping from 61% and 47% to 0% and
17% for arrays and frees respectively. Sure, some explicit sorting
could do even better, especially for ZAPs with variable-size arrays,
but it would also cost much more, while this should be very cheap.
[5 lines not shown]
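The ordering effect can be shown on a toy model. This is a sketch under stated assumptions, not the ZAP code: `toy_leaf_t` and both functions are hypothetical, and chunks are reduced to a `next` index array plus a free-list head.

```c
#include <assert.h>
#include <stdint.h>

#define	CHAIN_END	0xffff
#define	NCHUNKS		8

/* Toy leaf: next[i] is the chunk that follows chunk i in its chain. */
typedef struct toy_leaf {
	uint16_t next[NCHUNKS];
	uint16_t freelist;	/* head of the free list */
} toy_leaf_t;

/*
 * Reversing variant: push each chunk onto the free-list head while
 * walking the array, so the chain ends up on the list backwards.
 */
static void
toy_free_reversing(toy_leaf_t *l, uint16_t chunk)
{
	while (chunk != CHAIN_END) {
		assert(chunk < NCHUNKS);
		uint16_t nextchunk = l->next[chunk];
		l->next[chunk] = l->freelist;
		l->freelist = chunk;
		chunk = nextchunk;
	}
}

/*
 * Order-preserving variant: find the tail of the array, then splice
 * the whole chain in front of the free list in its existing order.
 */
static void
toy_free_in_order(toy_leaf_t *l, uint16_t chunk)
{
	if (chunk == CHAIN_END)
		return;
	assert(chunk < NCHUNKS);

	uint16_t tail = chunk;
	while (l->next[tail] != CHAIN_END) {
		tail = l->next[tail];
		assert(tail < NCHUNKS);
	}
	l->next[tail] = l->freelist;
	l->freelist = chunk;
}
```

Freeing the chain 0, 1, 2 onto an empty list leaves the list as 2, 1, 0 with the first variant and 0, 1, 2 with the second, so a later allocation of three chunks gets them back in ascending order only in the second case.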
Update 69-vdev.rules.in
Add support to alias md-type devices in udev rules.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Andres <a-d-j-i at users.noreply.github.com>
Closes #17345
(cherry picked from commit a6f20250de7f069da4c216dd717820058a02cd85)
icp: Use explicit_memset() exclusively in gcm_clear_ctx()
d634d20d1be31dfa8cf06ef2dc96285baf81a2fb had been intended to fix a
potential information leak issue where the compiler's optimization
passes appeared to remove `memset()` operations that sanitize sensitive
data before memory is freed for use by the rest of the kernel.
When I wrote it, I had assumed that the compiler would not remove the
other `memset()` operations, but upon reflection, I have realized that
this was a bad assumption to make. I would rather have a very slight
amount of additional overhead when calling `gcm_clear_ctx()` than risk a
future compiler removing `memset()` calls. This is likely to happen if
someone decides to try doing link time optimization and the person will
not think to audit the assembly output for issues like this, so it is
best to preempt the possibility before it happens.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Rob Norris <robn at despairlabs.com>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
[4 lines not shown]
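For readers unfamiliar with the problem, here is a userspace sketch of one common way to make a clearing store harder for the optimizer to drop: writing through a volatile-qualified pointer. `toy_explicit_memset()`, `toy_ctx_t` and `toy_clear_ctx()` are hypothetical illustrations and are not the ICP implementation of `explicit_memset()` or `gcm_clear_ctx()`. The C standard does not strictly promise this idiom works on non-volatile objects, so it should be read as a widely used convention rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for an explicit memset: each byte is stored
 * through a volatile pointer so the stores count as observable
 * accesses and are not treated as dead writes before a free.
 */
static void
toy_explicit_memset(void *buf, int c, size_t len)
{
	volatile unsigned char *p = buf;

	assert(buf != NULL || len == 0);
	while (len-- > 0)
		*p++ = (unsigned char)c;
}

/* Hypothetical context holding sensitive material. */
typedef struct toy_ctx {
	unsigned char key[32];
	unsigned char iv[16];
} toy_ctx_t;

/* Clear every sensitive field with the explicit variant only. */
static void
toy_clear_ctx(toy_ctx_t *ctx)
{
	toy_explicit_memset(ctx->key, 0, sizeof (ctx->key));
	toy_explicit_memset(ctx->iv, 0, sizeof (ctx->iv));
}
```

Using the explicit variant for every field, rather than only for the ones that looked at risk, matches the reasoning in the message: the cost is small and no assumption about which plain `memset()` calls survive optimization is needed.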
runners: Add option to install custom kernel on Fedora
Allow installing a custom kernel version from the Fedora experimental
kernel repos onto the github runners. This is useful for testing if
ZFS works against a newer kernel.
Fedora has a number of repos with experimental kernel packages. This
PR allows installs from kernels in these repos:
@kernel-vanilla/stable
@kernel-vanilla/mainline
(https://fedoraproject.org/wiki/Kernel_Vanilla_Repositories)
You will need to manually kick off a github runner to test with a custom
kernel version. To do that, go to the github actions tab under
'zfs-qemu' and click the drop-down for 'run workflow'. In there you
will see a text box to specify the version (like '6.14'). The scripts
will do their best to match the version to the newest matching version
that the repos support (since there may be multiple nightly versions
[8 lines not shown]
txg: generalise txg_wait_synced_sig() to txg_wait_synced_flags() (#17284)
txg_wait_synced_sig() is "wait for txg, unless a signal arrives". We
expect that future development will require similar "wait unless X"
behaviour.
This generalises the API as txg_wait_synced_flags(), where the provided
flags describe the events that should cause the call to return.
Instead of a boolean, the return is now an error code, which the caller
can use to know which event caused the call to return.
The existing call to txg_wait_synced_sig() is now
txg_wait_synced_flags(TXG_WAIT_SIGNAL).
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <robn at despairlabs.com>
[3 lines not shown]
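The shape of a flags-driven wait can be sketched as a toy model. Everything here is hypothetical: `toy_pool_t`, the `TOY_WAIT_*` flags and `toy_wait_synced_flags()` only imitate the idea, the loop replaces real sleeping, and the choice of EINTR as the returned error code is an assumption that the message above does not confirm.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flags naming the events that may end the wait early. */
typedef enum {
	TOY_WAIT_NONE	= 0,
	TOY_WAIT_SIGNAL	= (1 << 0)
} toy_wait_flag_t;

/* Toy pool state: last synced txg and whether a signal is pending. */
typedef struct toy_pool {
	uint64_t synced_txg;
	bool signal_pending;
} toy_pool_t;

/*
 * Wait until txg is synced unless an event selected in flags occurs.
 * Returns 0 once synced, or an error code naming the event (EINTR is
 * assumed here for a signal).
 */
static int
toy_wait_synced_flags(toy_pool_t *tp, uint64_t txg, toy_wait_flag_t flags)
{
	assert(tp != NULL);
	while (tp->synced_txg < txg) {
		if ((flags & TOY_WAIT_SIGNAL) && tp->signal_pending)
			return (EINTR);
		tp->synced_txg++;	/* stand-in for sleeping until a sync */
	}
	return (0);
}
```

Returning an error code instead of a boolean lets further flags be added later without changing the signature, since each new event can map to its own return value.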
zfs-rollback.8: fix typo in example number
Reviewed-by: George Melikov <mail at gmelikov.ru>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Alexander Ziaee <ziaee at FreeBSD.org>
Reviewed-by: Rob Norris <robn at despairlabs.com>
Signed-off-by: Quentin Thébault <quentin.thebault at defenso.fr>
Closes #17282
(cherry picked from commit 63de2d2dbdd47d4cf3179a1a2b77079741d43ddb)
Improve L2 caching control for prefetched indirects
dbuf_prefetch_impl() should look at the level of the current indirect,
not the target prefetch level. dbuf_prefetch_indirect_done() should
call dnode_level_is_l2cacheable() if we have dpa_dnode to pass it.
This should fix some false positive and false negative L2ARC caching.
While there, fix redacted feature activation assertions. One was
always true, while another could give false positive if dpa_dnode
is NULL.
George Amanakis <gamanakis at gmail.com>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Signed-off-by: Alexander Motin <mav at FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #17204
(cherry picked from commit a497c5fc8b14d0d868d2817a241d0429d6e3fca2)
Fix null dereference in spa_vdev_remove_cancel_sync()
We don't really need to access space map to know where the metaslab
ends, while msp->ms_sm might be NULL.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Rob Norris <robn at despairlabs.com>
Reviewed by: Igor Kozhukhov <ikozhukhov at gmail.com>
Signed-off-by: Alexander Motin <mav at FreeBSD.org>
Sponsored by: iXsystems, Inc.
Fixes #17164
Fixes #17359
Closes #17361
(cherry picked from commit 5c30b24381644a9d1b83d51e813e5e7efba23bc6)
Fix double spares for failed vdev
It's possible for two spares to get attached to a single failed vdev.
This happens when you have a failed disk that is spared, and then you
replace the failed disk with a new disk, but during the resilver
the new disk fails, and ZED kicks in a spare for the failed new
disk. This commit checks for that condition and disallows it.
Reviewed-by: Akash B <akash-b at hpe.com>
Reviewed-by: Ameer Hamza <ahamza at ixsystems.com>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Tony Hutter <hutter2 at llnl.gov>
Closes: #16547
Closes: #17231
(cherry picked from commit f40ab9e399280bea6a0fc1c17803d1a6ea524bff)
ZTS: Fix replacement/resilver_restart_001 on FreeBSD
Decrease the RESILVER_MIN_TIME_MS variable from 50 to 20,
so that the test, which expects 2 resilver starts, will see them.
Logfile of the seen failures before this fix:
log: NOTE: expected 2 resilver start(s) after offline/online, found 1
log: expected 2 resilver start(s) after offline/online, found 1
The test time also decreases from around 00:42 to 00:24.
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Signed-off-by: Tino Reichardt <milky-zfs at mcmilk.de>
Closes #16822
Closes #17279
(cherry picked from commit 3b188772696ed3738340a2aaddd401e7de1f290c)