zfs_vnops_os.c: Move a vput() to after zfs_setattr_dir()
Without this patch, the following crash can occur when
a file system is configured with "xattr=dir".
VNASSERT failed: locked not true at
/posix-acl/freebsd-rdma/sys/kern/vfs_subr.c:5786 (assert_vop_locked)
hold count flags ()
flags ()
lock type zfs: UNLOCKED
panic: zfs_dirent_lookup: vnode is not locked but should be
cpuid = 3
time = 1770520763
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b
vpanic() at vpanic+0x136/frame 0xfffffe00914c8270
panic() at panic+0x43/frame 0xfffffe00914c82d0
assert_vop_locked() at assert_vop_locked+0x78
zfs_dirent_lookup() at zfs_dirent_lookup+0x41
[14 lines not shown]
Include missing newline in 'man' error
Because the `strerror` result doesn't include a newline, we need to add
one. Observed on a minimal system that doesn't have `man` installed,
which behaves like this before the fix:
```
[root at upper tim]# zpool help import
couldn't run man program: No such file or directory[root at upper tim]#
```
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Tim Hatch <tim at timhatch.com>
Closes #18183
Allow rewrite skip cloned and snapshotted blocks
Rewrite of cloned and snapshotted blocks can allocate additional
space, that may be undesired. In some cases it may have sense
to still rewrite snapshotted blocks, expecting the snapshots to
rotate with time, freeing space. In other cases rewrite of cloned
blocks may be acceptable, despite persistent space usage increase.
For this reason add them as separate flags to `zfs rewrite`.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Rob Norris <robn at despairlabs.com>
Reviewed-by: Ameer Hamza <ahamza at ixsystems.com>
Signed-off-by: Alexander Motin <alexander.motin at TrueNAS.com>
Closes #18179
AUTHORS: add names of recent new contributors
"Welcome to my house! Enter freely. Go safely, and leave something of
the happiness you bring!"
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Closes #18189
ZTS: update the relevant mmp test cases
- mmp_concurrent_import: added test case to verify that concurrent
import correctness. The pool may only be imported once.
- mmp_exported_import: an activity check is now required for pools
which were cleanly exported if the system and pool hostids don't
match.
- mmp_inactive_import: an activity check is now required for any
pool which wasn't cleanly exported, even if the system and pool
hostids match.
- mmp_on_uberblocks: updated expected uberblocks to take in to account
the value MMP_INTERVAL_DEFAULT is set too.
- mmp_reset_interval: reduce the number of iterations from 10 to 3.
This is sufficient to verify functionality and significantly speeds
up the test.
[26 lines not shown]
zhack: add "action idle" subcommand
In order to reliably test the multihost protection we need two (or more)
systems attempting to import the pool at the same time. Historically, we've
used ztest running in userspace to simulate an active pool and attempted to
import the pool with the kernel modules. This works but ztest is a bit
unwieldy for this and if it crashes for unrelated reasons it can result
in false positives.
All we really need is the pool imported in userspace so the MMP thread is
active and writing out uberblocks. We can extend zhack which already knows
how to import the pool read/write and add an option to leave the pool open
and idle.
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Reviewed-by: Olaf Faaland <faaland1 at llnl.gov>
Reviewed-by: Akash B <akash-b at hpe.com>
zhack: add -G option to dump debug buffer
Add a -G option to zhack to dump the internal debug buffer on exit.
We were able to use the same code from zdb for this which was nice.
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Reviewed-by: Olaf Faaland <faaland1 at llnl.gov>
Reviewed-by: Akash B <akash-b at hpe.com>
mmp: claim sequence id before final import
As part of SPA_LOAD_IMPORT add an additional activity check to
detect simultaneous imports from different hosts. This check is
only required when the timing is such that there's no activity
for the the read-only tryimport check to detect. This extra
safety chceck operates as follows:
1. Repeats the following MMP check 10 times:
a. Write out an MMP uberblock with the best txg and a random
sequence id to all primary pool vdevs.
b. Verify a minimum number of good writes such that even if
the pool appears degraded on the remote host it will see
at least one of the updated MMP uberblocks.
c. Wait for the MMP interval this leaves a window for other
racing hosts to make similar modifications which can be
detected.
d. Call vdev_uberblock_load() to determine the best uberblock
to use, this should be the MMP uberblock just written.
[35 lines not shown]
mmp: add spa_load_name() for tryimport
Tryimport adds a unique prefix to the pool name to avoid name
collisions. This makes it awkward to log user-friendly info
during a tryimport. Add a spa_load_name() function which can
be used to report the unmodified pool name.
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Reviewed-by: Olaf Faaland <faaland1 at llnl.gov>
Reviewed-by: Akash B <akash-b at hpe.com>
mmp: move "Starting import" log message
Move the "Starting import" log message in to the import block so
it's matched with the "Fiinshed importing" debug message.
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Reviewed-by: Olaf Faaland <faaland1 at llnl.gov>
Reviewed-by: Akash B <akash-b at hpe.com>
mmp: further restrict mmp exported pool check
For a cleanly exported pools there exists a small window where
both systems may determine it's safe to import the pool and skip
the activity check. Only allow the check to be skipped when the
last imported hostid matches the systems hostid and the pool was
cleanly exported.
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Reviewed-by: Olaf Faaland <faaland1 at llnl.gov>
Reviewed-by: Akash B <akash-b at hpe.com>
Fix activating large_microzap on receive
This ensures that the in-memory state of the feature is recorded and
that `dsl_dataset_activate_feature` is not called when the feature
is already active.
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Austin Wise <AustinWise at gmail.com>
Closes #18143
Closes #18144
pyzfs: remove unimplemented libzfs_core functions from pyzfs
As per #9008, pyzfs implements and documents several functions that
would be very useful, but then try to call c functions in libzfs_core.
These functions do not exist in libzfs_core, and in the ~7 years of
ticket creation still do not exist in libzfs_core.
It seems unlikely that these functions will get implemented, though 2
years ago, ~5 years after that ticket lzc_get_props was implemented in
23a489a41167890cdd227366a5f950170df8cc9b which enabled get properties in
pyzfs. Sadly the first thing the pyzfs function for lzc_get_props does
is call _list, which cals lzc_list, which is not implmented. And the
functions to set or inherit properties are still missing.
Having these functions in pyzfs are misleading, footguns, and time
wasters when evaluating pyzfs.
Removing these functions from pyzfs means that _if_ these functions are
added in libzfs_core, then pyzfs will also need to re-implement these
[8 lines not shown]
DDT: Add locking for table ZAP destruction
Similar to BRT, DDT ZAP can be destroyed by sync context when it
becomes empty. Respectively similar to BRT introduce RW-lock to
protect open context methods from the destruction.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin at TrueNAS.com>
Closes #18115
spl: lift 64-bit math compat out to separate file
It's a lot of rarely-compiled code, so move it to the side to make other
code easier to read.
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Closes #18117
When receiving a stream with the large block flag, activate feature
ZFS send streams include a feature flag DMU_BACKUP_FEATURE_LARGE_BLOCKS
to indicate the presence of large blocks in the dataset. On the sending
side, this flag is included if the `-L` flag is passed to `zfs send`
and the feature is active in the dataset. On the receive side, the
stream is refused if the feature is active in the destination dataset
but the stream does not include the feature flag.
The problem is the feature is only activated when a large block is
born. If a large block has been born in the destination, but never
the source, the send can't work. This can arise when sending streams
back and forth between two datasets.
This commit fixes the problem by always activating the large blocks
feature when receiving a stream with the large block feature flag.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Signed-off-by: Austin Wise <AustinWise at gmail.com>
Closes #18105
cmd/zfs: clone: accept `-u` to not mount newly created datasets
Signed-off-by: Ivan Shapovalov <intelfx at intelfx.name>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Closes #18080
Remove the obsolete FreeBSD 14.2-RELEASE from CI
Sponsored by: ConnectWise
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Signed-off-by: Alan Somers <asomers at gmail.com>
Closes #18013
CI: Fix qemu-1-setup failure, remove debug stuff
- For whatever reason, the runner will now startup with either two 75GB
disks or one 150GB disk. Previously the runner was always booting
with two 75GB, but about a quarter of the time it now starts up
with a single 150GB disk. This caused qemu-1-setup.sh to fail
since it expected the two 75GB disks. This commit updates
qemu-1-setup.sh to work with either disk config.
- Remove the watchdog from qemu-1-setup.sh. It didn't turn out to be
useful.
- Remove the timestamps that zfs-qemu.yml added to the qemu-1-setup.sh
output. The timestamps were redundant, since you can already
download timestamped logs from the Github web interface.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs at mcmilk.de>
Signed-off-by: Tony Hutter <hutter2 at llnl.gov>
Closes #18166
spl: remove a _KERNEL check
This code is only compiled for the Linux kernel module, so that define
is always set.
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Closes #18117
icp: emit .note.GNU-stack section for all ELF targets
On FreeBSD, linking the zfs kernel module with binutils ld 2.44 shows
the following warning:
ld: warning: aesni-gcm-avx2-vaes.o: missing .note.GNU-stack section
implies executable stack
ld: NOTE: This behaviour is deprecated and will be removed in a
future version of the linker
Some of the `.S` files under `module/icp/asm-x86_64/modes` check whether
to emit the `.note.GNU-stack` section using:
#if defined(__linux__) && defined(__ELF__)
We could add `&& defined(__FreeBSD__)` to the test, but since all other
`.S` files in the OpenZFS tree use:
#ifdef __ELF__
[6 lines not shown]
Fix zfs_open() to skip zil_async_to_sync() for the snapshot
Fix zfs_open() to skip zil_async_to_sync() for the snapshot, as it won't
have any transactions. zfsvfs->z_log is NULL for the snapshot.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Signed-off-by: Jitendra Patidar <jitendra.patidar at nutanix.com>
Closes #18091
ZTS: add regression test for #17180
In #17180, we fixed an interesting bug that i believe i hit in one of my
pools, but as far as i can tell, there was no test for it.
this patch adds a regression test for #17180, minimised from my attempts
to reproduce the bug in a way that resembled the history of my pool.
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Adam Moss <c at yotes.com>
Signed-off-by: delan azabani <dazabani at igalia.com>
Closes #18109
kmem: don't add __GFP_COMP for KM_VMEM allocations
It hasn't been necessary since Linux 3.13
(torvalds/linux at a57a49887eb33), and since 6.19 the kernel warns if you
use it.
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Closes #18053
DDT: Add/use zap_lookup_length_uint64_by_dnode()
Unlike other ZAP consumers due to compression DDT does not know
how big entry it is reading from ZAP. Due to this it called
zap_length_uint64_by_dnode() and zap_lookup_uint64_by_dnode(),
each of which does full ZAP entry lookup.
Introduction of the combined ZAP method dramatically reduces the
CPU overhead and locks contention at DBUF layer.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin at TrueNAS.com>
Closes #18048
nvpair: chase FreeBSD xdrproc_t definition
As of FreeBSD 16, xdrproc_t will take exactly two arguments in both
kernel and userspace in line with the Linux kernel.
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Alan Somers <asomers at freebsd.org>
Signed-off-by: Brooks Davis <brooks at capabilitieslimited.co.uk>
Closes #18154
FreeBSD: unbreak compilation on i386
tests/zfs-tests/cmd/mmap_seek.c: use correct printf specifier
module/zfs/vdev.c: vdev_clear(): correctly cast argument to
atomic_add_64().
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Martin Matuska <mm at FreeBSD.org>
Closes #18096
Bypass snprintf() in quota checks if no quotas set
This improves synthetic 1 byte write speed by ~2.5%.
Reviewed-by: Ameer Hamza <ahamza at ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Signed-off-by: Alexander Motin <alexander.motin at TrueNAS.com>
Closes #18063
DDT: Fix compressed entry buffer size
The first byte of the entry after compression is used for algorithm
and byte order flag. We should decrement when calling compression/
decompression algorithm.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin at TrueNAS.com>
Closes #18055
DDT: Move logs searches out of the lock
Postponing entry removal from the DDT log in case of hit till later
single-threaded sync stage allows to make ddl_tree stable during
multi-threaded ZIO processing stage. It allows to drop the DDT lock
before the search instead of after, reducing the contention a lot.
Actually ddt_log_update_entry() was already handling the case of
entry present in the active log, so we only need to remove it from
flushing log, if the entry happen to be there.
My tests with parallel 4KB block writes show throughput increase
from 480MB/s (122K blocks/s) to 827MB/s (212K blocks/s), even
though still limited by the global DDT lock contention.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin at TrueNAS.com>
Closes #18044