ZFS on Linux/src c44a3ec module/zfs zvol.c, tests/runfiles linux.run

zvol: allow rename of in use ZVOL dataset

While ZFS allows renaming of in-use ZVOLs at the DSL level without issues,
the ZVOL layer does not correctly update the renamed dataset if the
device node is open (zv->zv_open_count > 0): trying to access the stale
dataset name, for instance during a zfs receive, will cause the
following failure:

VERIFY3(zv->zv_objset->os_dsl_dataset->ds_owner == zv) failed ((null) == ffff8800dbb6fc00)

PANIC at zvol.c:1255:zvol_resume()
Showing stack for process 1390
CPU: 0 PID: 1390 Comm: zfs Tainted: P           O  3.16.0-4-amd64 #1 Debian 3.16.51-3
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 0000000000000000 ffffffff8151ea00 ffffffffa0758a80 ffff88028aefba30
 ffffffffa0417219 ffff880037179220 ffffffff00000030 ffff88028aefba40
 ffff88028aefb9e0 2833594649524556 6f5f767a3e2d767a 6f3e2d7465736a62
Call Trace:
 [<0>] ? dump_stack+0x5d/0x78
 [<0>] ? spl_panic+0xc9/0x110 [spl]
 [<0>] ? mutex_lock+0xe/0x2a
 [<0>] ? zfs_refcount_remove_many+0x1ad/0x250 [zfs]
 [<0>] ? rrw_exit+0xc8/0x2e0 [zfs]
 [<0>] ? mutex_lock+0xe/0x2a
 [<0>] ? dmu_objset_from_ds+0x9a/0x250 [zfs]

    [20 lines not shown]

ZFS on Linux/src 0c637f3 module/zfs vdev_disk.c vdev.c

zpool reports 16E expandsize on disks with oddball number of sectors

The issue is caused by a small discrepancy in how userland creates the
partition layout and the kernel estimates available space:

 * zpool command: subtract 9M from the usable device size, then align
   to 1M boundary. 9M is the sum of 1M "start" partition alignment + 8M
   EFI "reserved" partition.

 * kernel module: subtract 10M from the device size. 10M is the sum of
   1M "start" partition alignment + 1M "end" partition alignment + 8M
   EFI "reserved" partition.

For devices where the number of sectors is not a multiple of the
alignment size the zpool command will create a partition layout which
reserves less than 1M after the 8M EFI "reserved" partition:

  Disk /dev/sda: 1024 MiB, 1073739776 bytes, 2097148 sectors
  Units: sectors of 1 * 512 = 512 bytes
  Sector size (logical/physical): 512 bytes / 512 bytes
  I/O size (minimum/optimal): 512 bytes / 512 bytes
  Disklabel type: gpt
  Disk identifier: 49811D40-16F4-4E41-84A9-387703950D7F

  Device       Start     End Sectors  Size Type

    [16 lines not shown]
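
The discrepancy between the two estimates can be reproduced with a few lines of arithmetic. The sketch below is a simplified model, not the actual zpool or kernel code: the function names are hypothetical, and it assumes that the "usable" device size excludes a 33-sector backup GPT at the end of the disk. It uses the 2097148-sector disk from the listing above next to an exactly 1 GiB disk.

```python
MIB = 1024 * 1024
SECTOR = 512
# Assumption: backup GPT header (1 sector) plus partition entries (32 sectors).
BACKUP_GPT_SECTORS = 33


def userland_partition_bytes(device_bytes):
    """Model of the zpool command: usable size minus 9M, aligned down to 1M."""
    usable = device_bytes - BACKUP_GPT_SECTORS * SECTOR
    return (usable - 9 * MIB) // MIB * MIB


def kernel_estimate_bytes(device_bytes):
    """Model of the kernel module: device size minus 10M, no alignment."""
    return device_bytes - 10 * MIB


aligned = 2097152 * SECTOR   # exactly 1 GiB
oddball = 2097148 * SECTOR   # the disk from the listing, 4 sectors short

for name, size in (("aligned", aligned), ("oddball", oddball)):
    # Difference between what userland created and what the kernel expects.
    print(name, userland_partition_bytes(size) - kernel_estimate_bytes(size))
```

In this model the two estimates agree for the aligned disk, while for the oddball disk the partition is 2048 bytes larger than the kernel's estimate. A negative "available space" stored in an unsigned 64-bit value would wrap to roughly 16E, which is presumably where the reported expandsize comes from.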

ZFS on Linux/src 8d9e51c include/sys zfs_context.h, module/zfs dnode.c

Fix dnode_hold_impl() soft lockup

Soft lockups could happen when multiple threads are trying
to get the zrl on the same dnode handle in order to allocate
and initialize the dnode marked as DN_SLOT_ALLOCATED.

Don't loop from the beginning when we can't get the zrl; otherwise
we would increase the zrl refcount and nobody could actually
lock it.

Reviewed by: Tom Caputi <tcaputi at datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Li Dongyang <dongyangli at ddn.com>
Closes #8433 

ZFS on Linux/src f8bb2a7 man/man8 zpool.8

Clarify zpool iostat statistics reporting

Document expected behavior for zpool iostat statistics reporting.

Reviewed by: Matt Ahrens <mahrens at delphix.com>
Reviewed-by: Alek Pinchuk <apinchuk at datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed by: Allan Jude <allanjude at freebsd.org>
Signed-off-by: Kash Pande <kash at tripleback.net>
Closes #2888 
Closes #8417 
Delta     File
+9 -4     man/man8/zpool.8
+9 -4     1 files

ZFS on Linux/src f23b024 man/man8 zpool.8

Fix '-T u|d' descriptions in zpool(8)

In

        -T u|d  Display a time stamp.  Specify -u for a printed
               representation of the internal representation of time.
               See time(2).  Specify -d for standard date format.
               See date(1).

'Specify u' and 'Specify d' should be used instead. `zpool list -T -u`
does not work.

Bring the descriptions in `zpool list` and `zpool status` in sync with
`zpool iostat`.

Reviewed by: Allan Jude <allanjude at freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Anatoly Borodin <anatoly.borodin at gmail.com>
Closes #8438 
Delta     File
+4 -4     man/man8/zpool.8
+4 -4     1 files

ZFS on Linux/src 9abbee4 module/zfs zvol.c

Don't enter zvol's rangelock for read bio with size 0

The SCST driver (SCSI target driver implementation) and possibly
others may issue read bios with a length of zero bytes. Although
this is unusual, such bios issued under certain conditions can cause
a kernel oops, due to how rangelock is implemented.

rangelock_add_reader() is not made to handle the overlap of two (or
more) ranges from read bios with the same offset when one of them has
a size of 0, even though they conceptually overlap. Allowing them to
enter rangelock results in a kernel oops caused by dereferencing an
invalid pointer, or in an assertion failure on AVL tree manipulation
when the kernel module is built with debugging enabled.

For example, this happens when a read bio whose (offset, size) is
(0, 0) enters rangelock, followed by another read bio with (0, 4096)
while the (0, 0) rangelock is still held and there are no pending
write bios. It can also happen in the reverse order, that is, (0, N)
followed by (0, 0) while (0, N) is still locked. More details are
given in #8379.

Kernel Oops on ->make_request_fn() of ZFS volume
https://github.com/zfsonlinux/zfs/issues/8379

Prevent this by returning bio with size 0 as success without entering

    [8 lines not shown]
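
The guard described above can be modelled in a few lines. This is a hypothetical sketch, not the rangelock implementation: `ranges_overlap` and `handle_read` are invented names, and the half-open overlap test only illustrates why a zero-length range is a degenerate case that ordinary interval logic does not treat as overlapping anything.

```python
def ranges_overlap(a_off, a_len, b_off, b_len):
    """Half-open interval test: True if the ranges share at least one byte."""
    return a_off < b_off + b_len and b_off < a_off + a_len


def handle_read(offset, size, enter_rangelock):
    """Complete zero-length reads immediately instead of taking the lock."""
    if size == 0:
        return "success"          # nothing to read, so nothing to lock
    enter_rangelock(offset, size)
    return "locked"


locked = []  # records every range that actually entered the (mock) lock


def record(offset, size):
    locked.append((offset, size))


print(handle_read(0, 0, record))
print(handle_read(0, 4096, record))
print(locked, ranges_overlap(0, 0, 0, 4096))
```

Only the (0, 4096) read reaches the mock lock; the (0, 0) read is completed before any range bookkeeping takes place.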
Delta     File
+10 -0    module/zfs/zvol.c
+10 -0    1 files

ZFS on Linux/src 1e427f2 rpm/generic zfs-dkms.spec.in

Add diffutils dependency for dkms build

The cmp and diff utilities are required at configure time.  Add
a dependency on diffutils to ensure they are installed.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Kash Pande <kash at tripleback.net>
Closes #5205 
Closes #8428 

ZFS on Linux/src 928e8ad include/sys metaslab_impl.h metaslab.h, module/zfs metaslab.c vdev.c

Introduce auxiliary metaslab histograms

This patch introduces 3 new histograms per metaslab. These
histograms track segments that have made it to the metaslab's
space map histogram (and are part of the space map) but have
not yet reached the ms_allocatable tree on loaded metaslabs
because these metaslabs are currently syncing and haven't
gone through metaslab_sync_done() yet.

The histograms help when we decide whether to load an unloaded
metaslab in order to allocate from it. When calculating the
weight of an unloaded metaslab traditionally, we look at the
highest bucket of its space map's histogram.  The problem is
that we are not guaranteed to be able to allocate that
segment when we load the metaslab because it may still be in
the freeing, freed, or defer trees. The new histograms are
used when we try to calculate an unloaded metaslab's weight
to deal with this issue by removing segments that would
not be in the allocatable tree at runtime. Note that this
method is not completely accurate, as
adjacent segments are not always consolidated in the space
map histogram of a metaslab.

In addition and to make things deterministic, we always reset
the weight of unloaded metaslabs based on their space map

    [12 lines not shown]

ZFS on Linux/src bb1be77 module/zfs dmu_objset.c, tests/runfiles linux.run

Prevent user accounting on readonly pool

Trying to mount a dataset from a readonly pool could inadvertently start
the user accounting upgrade task, leading to the following failure:

VERIFY3(tx->tx_threads == 2) failed (0 == 2)
PANIC at txg.c:680:txg_wait_synced()
Showing stack for process 2541
CPU: 2 PID: 2541 Comm: z_upgrade Tainted: P           O  3.16.0-4-amd64 #1 Debian 3.16.51-3
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
 [<0>] ? dump_stack+0x5d/0x78
 [<0>] ? spl_panic+0xc9/0x110 [spl]
 [<0>] ? dnode_next_offset+0x1d4/0x2c0 [zfs]
 [<0>] ? dmu_object_next+0x77/0x130 [zfs]
 [<0>] ? dnode_rele_and_unlock+0x4d/0x120 [zfs]
 [<0>] ? txg_wait_synced+0x91/0x220 [zfs]
 [<0>] ? dmu_objset_id_quota_upgrade_cb+0x10f/0x140 [zfs]
 [<0>] ? dmu_objset_upgrade_task_cb+0xe3/0x170 [zfs]
 [<0>] ? taskq_thread+0x2cc/0x5d0 [spl]
 [<0>] ? wake_up_state+0x10/0x10
 [<0>] ? taskq_thread_should_stop.part.3+0x70/0x70 [spl]
 [<0>] ? kthread+0xbd/0xe0
 [<0>] ? kthread_create_on_node+0x180/0x180

    [10 lines not shown]

ZFS on Linux/src 75d6b7d tests/zfs-tests/tests/functional/features/large_dnode setup.ksh large_dnode_007_neg.ksh

Add missing copyright notice to large_dnode tests

Missing copyright notices were noticed during the Illumos
RTI process. Add LLNS 2016 copyright based on original merge
date.

Reviewed-by: Giuseppe Di Natale <guss80 at gmail.com>
Reviewed-by: Alek Pinchuk <apinchuk at datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Ned Bass <bass6 at llnl.gov>
Closes #8435 

ZFS on Linux/src 790c880 lib/libzpool util.c

Fix zdb crash

We have to use umem_free() instead of free() for memory that was
allocated with umem_zalloc().

Reviewed-by: Olaf Faaland <faaland1 at llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Igor Kozhukhov <igor at dilos.org>
Closes #8402 

ZFS on Linux/src 435637d tests/zfs-tests/tests/functional/cli_root/zfs_set user_property_002_pos.ksh

ZTS: user_property_002_pos fails to destroy volume

During the cleanup function of this test, an attempt to destroy a volume
can fail because the volume is busy. This leaves the system with
unexpected datasets which in turn causes subsequent failures.

Reviewed-by: bunder2015 <omfgbunder at gmail.com>
Reviewed-by: Igor Kozhukhov <igor at dilos.org>
Reviewed-by: Giuseppe Di Natale <guss80 at gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: John Kennedy <john.kennedy at delphix.com>
Closes #8422 

ZFS on Linux/src 11f6127 man/man8 zfs.8

zfs mount man page should document legacy behaviour

Document legacy mount behavior.

Reviewed by: Allan Jude <allanjude at freebsd.org>
Reviewed by: Matt Ahrens <mahrens at delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: bunder2015 <omfgbunder at gmail.com>
Reviewed-by: loli10K <ezomori.nozomu at gmail.com>
Signed-off-by: Kash Pande <kash at tripleback.net>
Closes #2900
Closes #8414 
Delta     File
+11 -3    man/man8/zfs.8
+11 -3    1 files

ZFS on Linux/src f545b6a module/zfs zio.c

Delay injection can cause indefinitely hung zios

If we hit the (NSEC_TO_TICK(diff) == 0) condition in
zio_delay_interrupt, zio_interrupt is never called and the
zio does not progress.
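
The failure mode can be illustrated with a small model of the tick conversion. This is a sketch under stated assumptions rather than the zio code: `nsec_to_tick`, `delay_interrupt`, and the tick rate of 250 Hz are hypothetical, and firing the interrupt directly in the zero-tick branch is assumed to be what the fix does, since the source only says that the call was missing.

```python
NANOSEC = 1_000_000_000


def nsec_to_tick(nsec, hz):
    """Integer conversion from nanoseconds to scheduler ticks (truncates)."""
    return nsec // (NANOSEC // hz)


def delay_interrupt(diff_nsec, hz, schedule, interrupt):
    """Deliver the interrupt after diff_nsec without ever dropping it."""
    ticks = nsec_to_tick(diff_nsec, hz)
    if ticks == 0:
        interrupt()               # delay too short to schedule: fire right away
    else:
        schedule(ticks, interrupt)


events = []  # trace of what happened to each simulated zio


def schedule(ticks, callback):
    events.append(("scheduled", ticks))


def interrupt():
    events.append(("fired",))


delay_interrupt(1_000_000, 250, schedule, interrupt)    # 1 ms rounds to 0 ticks
delay_interrupt(10_000_000, 250, schedule, interrupt)   # 10 ms is 2 ticks
print(events)
```

Without the zero-tick branch, the first call would neither schedule nor fire anything, which matches the hang described in the commit message.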

Reviewed by: Matt Ahrens <mahrens at delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Signed-off-by: sara hartse <sara.hartse at delphix.com>
Closes #8404 
Delta     File
+1 -0     module/zfs/zio.c
+1 -0     1 files

ZFS on Linux/src a28c1a5 etc/systemd/system zfs-share.service.in

ZFS mounted NFSv3 shares fail lock reclaims

ZFS NFS shares mounted on a client with NFSv3 and with open 
locks will fail to reclaim those locks after a server reboot. 

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: George Wilson <george.wilson at delphix.com>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Signed-off-by: Don Brady <don.brady at delphix.com>
Closes #8398 

ZFS on Linux/src 07237a7 tests/zfs-tests/tests/functional/snapshot clone_001_pos.ksh

ZTS: clone_001_pos fails in cleanup on busy dataset

The "cleanup_all" function in this test calls "zfs destroy" which
fails approximately 30% of the time in our environment due to the
dataset being busy. Since the failure happens during cleanup, the
error is propagated to subsequent tests.

Tested by running the snapshot test group in a loop without seeing
any failures.

Reviewed-by: George Melikov <mail at gmelikov.ru>
Reviewed-by: Igor Kozhukhov <igor at dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: John Kennedy <john.kennedy at delphix.com>
Closes #8409 

ZFS on Linux/src 638dd5f man/man5 zfs-module-parameters.5, module/zfs zio.c

zio_deadman_impl() fix and enhancement

Add the zio_deadman_log_all tunable to print all zios in
zio_deadman_impl().  Also, in all cases, display the depth of the
zio relative to the original parent zio.  This is meant to be used by
developers to gain diagnostic information for hangs which don't involve
fully set-up zio trees or are otherwise stuck or hung in an early stage.

Reviewed-by: Olaf Faaland <faaland1 at llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu at gmail.com>
Signed-off-by: Tim Chase <tim at chase2k.com>
Closes #8362 

ZFS on Linux/src 9c5e88b cmd/zfs zfs_main.c, include libzfs.h

zfs should optionally send holds

Add -h switch to zfs send command to send dataset holds. If
holds are present in the stream, zfs receive will create them
on the target dataset, unless the zfs receive -h option is used
to skip receive of holds.

Reviewed-by: Alek Pinchuk <apinchuk at datto.com>
Reviewed-by: loli10K <ezomori.nozomu at gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed by: Paul Dagnelie <pcd at delphix.com>
Signed-off-by: Paul Zuchowski <pzuchowski at datto.com>
Closes #7513 

ZFS on Linux/src e73ab1b include/spl/sys rwlock.h

Linux 4.20 compat: Fix VERIFY(RW_READ_HELD(&hash->mh_contents))

The 4.20 kernel changed the meaning of the rw_semaphore.owner bits,
causing an assertion when loading the module under the 4.20 kernel.
This patch fixes the issue.

Reviewed-by: Chunwei Chen <tuxoko at gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Tony Hutter <hutter2 at llnl.gov>
Closes #8360 
Closes #8389 
Delta     File
+45 -10   include/spl/sys/rwlock.h
+45 -10   1 files

ZFS on Linux/src 2d76ab9 module/zfs zvol.c

Fix obsolete comment on rangelock

5d43cc9a59 renamed it to rangelock_enter().

Reviewed-by: Matt Ahrens <mahrens at delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro at osnexus.com>
Closes #8408 
Delta     File
+1 -1     module/zfs/zvol.c
+1 -1     1 files

ZFS on Linux/src cf89a4e cmd/zdb zdb.c

zdb: replace label_t to zdb_label_t for reduce collisions

When building on illumos-based platforms we can see a build issue
because label_t has been redefined.

To reduce build issues on other platforms we should rename
label_t to zdb_label_t.

Reviewed-by: loli10K <ezomori.nozomu at gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Signed-off-by: Igor Kozhukhov <igor at dilos.org>
Closes #8397 
Delta     File
+8 -8     cmd/zdb/zdb.c
+8 -8     1 files

ZFS on Linux/src 65282ee man/man5 zfs-module-parameters.5, module/zfs dmu.c

Freeing throttle should account for holes

The deletion throttle currently does not account for holes in a file.
This means that it can activate when it shouldn't.
To fix this, we switch the throttle to be based on the number of
L1 blocks we will have to dirty when freeing.

Reviewed by: Tom Caputi <tcaputi at datto.com>
Reviewed by: Matt Ahrens <mahrens at delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Alek Pinchuk <apinchuk at datto.com>
Closes #7725 
Closes #7888 

ZFS on Linux/src dcec0a1 include/sys dataset_kstats.h, man/man5 zfs-module-parameters.5

port async unlinked drain from illumos-nexenta

This patch is an async implementation of the existing sync
zfs_unlinked_drain() function. This function is called at mount time and
is responsible for freeing znodes that we didn't get to freeing before.
We don't have to hold mounting of the dataset until the unlinked list is
fully drained as is done now. Since we can process the unlinked set
asynchronously this results in a better user experience when mounting a
dataset with entries in the unlinked set.

Reviewed by: Jorgen Lundman <lundman at lundman.net>
Reviewed by: Tom Caputi <tcaputi at datto.com>
Reviewed by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Matt Ahrens <mahrens at delphix.com>
Reviewed by: Paul Dagnelie <pcd at delphix.com>
Signed-off-by: Alek Pinchuk <apinchuk at datto.com>
Closes #8142 

ZFS on Linux/src 425d323 include/sys metaslab_impl.h space_map.h, module/zfs metaslab.c space_map.c

Get rid of space_map_update() for ms_synced_length

Initially, metaslabs and space maps used to be the same thing
in ZFS. Later, we started differentiating them by referring
to the space map as the on-disk state of the metaslab, making
the metaslab a higher-level concept that is metadata that deals
with space accounting. Today we've managed to split that code
even further, with the space map being its own on-disk data
structure used in areas of ZFS besides metaslabs (e.g. the
vdev-wide space maps used for zpool checkpoint or vdev removal
features).

This patch refactors the space map code to further split the
space map code from the metaslab code. It does so by getting
rid of the idea that the space map can have a different in-core
and on-disk length (sm_length vs smp_length), which is something
that is only used by the metaslab code but that other consumers
of space maps just have to deal with. Instead, this patch
introduces changes that move the old in-core length of the
metaslab's space map to the metaslab structure itself (see
ms_synced_length field) while making the space map code only
care about the actual space map's length on-disk.

The result of this is that space map consumers no longer have
to deal with syncing two different lengths for the same

    [16 lines not shown]

ZFS on Linux/src d8d418f contrib/pyzfs/libzfs_core/test test_libzfs_core.py, lib/libzfs libzfs_sendrecv.c

ZVOLs should not be allowed to have children

zfs create, receive and rename can bypass this hierarchy rule. Update
both userland and kernel module to prevent this issue and use pyzfs
unit tests to exercise the ioctls directly.

Note: this commit slightly changes the zfs_ioc_create() ABI. This allows
us to differentiate a generic error (EINVAL) from the specific case where
we tried to create a dataset below a ZVOL (ZFS_ERR_WRONG_PARENT).

Reviewed-by: Paul Dagnelie <pcd at delphix.com>
Reviewed-by: Matt Ahrens <mahrens at delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Tom Caputi <tcaputi at datto.com>
Signed-off-by: loli10K <ezomori.nozomu at gmail.com>

ZFS on Linux/src 4417096 man/man8 zfs.8, module/zfs spa_misc.c

Pool allocation classes misplacing small file blocks

Due to an off-by-one condition in spa_preferred_class() we are picking
the "normal" allocation class instead of the "special" one for file
blocks with size equal to the special_small_blocks property value.

This change fixes the small code issue and updates the ZFS Test Suite
and the zfs(8) man page.
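
The off-by-one condition can be shown with a strict versus an inclusive comparison. The functions below are hypothetical stand-ins, not the body of spa_preferred_class(), which takes more factors into account than block size alone.

```python
def preferred_class_before(block_size, special_small_blocks):
    """Strict comparison: a block exactly at the threshold goes to 'normal'."""
    return "special" if block_size < special_small_blocks else "normal"


def preferred_class_after(block_size, special_small_blocks):
    """Inclusive comparison: a block exactly at the threshold goes to 'special'."""
    return "special" if block_size <= special_small_blocks else "normal"


threshold = 32 * 1024  # example special_small_blocks value
for size in (16 * 1024, 32 * 1024, 64 * 1024):
    print(size,
          preferred_class_before(size, threshold),
          preferred_class_after(size, threshold))
```

The two versions differ only for blocks whose size equals the property value, which is the case the commit message describes.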

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Don Brady <don.brady at delphix.com>
Signed-off-by: loli10K <ezomori.nozomu at gmail.com>
Closes #8351
Closes #8361

ZFS on Linux/src 0902c45 module/zfs arc.c

Fix ARC stats for embedded blkptrs

Re-factor arc_read() to better account for embedded data blkptrs.
Previously, reading the payload from an embedded blkptr would cause
arcstats such as demand_metadata_misses to be bumped when there was
actually no cache "miss" because the data are already available in
the blkptr.

The following test procedure was used to demonstrate the problem:

   zpool create tank ...
   zfs create -o compression=lz4 tank/fs
   echo blah > /tank/fs/blah
   stat /tank/fs/blah
   grep 'meta.*mis' /proc/spl/kstat/zfs/arcstats

and repeating the last two steps to watch the metadata miss counter
increment.  This can also be demonstrated via the zfs_arc_miss DTRACE4
probe in arc_read().

Reviewed-by: loli10K <ezomori.nozomu at gmail.com>
Reviewed-by: George Wilson <george.wilson at delphix.com>
Reviewed-by: Matt Ahrens <mahrens at delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: George Melikov <mail at gmelikov.ru>

    [2 lines not shown]
Delta     File
+32 -15   module/zfs/arc.c
+32 -15   1 files

ZFS on Linux/src 9634299 scripts zfs-tests.sh, tests/zfs-tests/include commands.cfg

OpenZFS 9185 - Enable testing over NFS in ZFS performance tests

This change makes additions to the ZFS test suite that allow the
performance tests to run over NFS. The test is run and performance data
is collected on the server side, while IO is generated on the NFS client.

This has been tested with Linux and illumos NFS clients.

Authored by: Ahmed Ghanem <ahmedg at delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel at delphix.com>
Reviewed by: John Kennedy <john.kennedy at delphix.com>
Reviewed by: Kevin Greene <kevin.greene at delphix.com>
Reviewed-by: Richard Elling <Richard.Elling at RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Ported-by: John Kennedy <john.kennedy at delphix.com>
Signed-off-by: John Kennedy <john.kennedy at delphix.com>

OpenZFS-issue: https://www.illumos.org/issues/9185
Closes #8367

ZFS on Linux/src 1a745ef man/man8 zstreamdump.8

zstreamdump: -d option is not documented in manpage

This change simply documents the missing -d (dump contents) option in
zstreamdump(8).

Reviewed-by: bunder2015 <omfgbunder at gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Igor Kozhukhov <igor at dilos.org>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Signed-off-by: loli10K <ezomori.nozomu at gmail.com>
Closes #8369 

ZFS on Linux/src bf6ca0a cmd/zed/zed.d zed-functions.sh statechange-led.sh, cmd/zpool/zpool.d smart

shellcheck pass

note: which is non-standard. Use builtin 'command -v' instead. [SC2230]
note: Use -n instead of ! -z. [SC2236]

Reviewed-by: George Melikov <mail at gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <guss80 at gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: bunder2015 <omfgbunder at gmail.com>
Closes #8367 

ZFS on Linux/src cca1412 tests/test-runner/bin test-runner.py zts-report.py

flake8 pass

F632 use ==/!= to compare str, bytes, and int literals
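
F632 is raised when `is` or `is not` is used against a literal. A minimal illustration of the preferred form follows; the helper name and the sample strings are made up for this example and are not taken from test-runner.py or zts-report.py.

```python
def same_text(a, b):
    """Compare by value with ==; `is` would compare object identity instead."""
    return a == b


# A string assembled at runtime has the same value as the literal, but it is
# not guaranteed to be the same object, so an identity check is unreliable.
built = "".join(["zfs", "-tests"])
print(same_text(built, "zfs-tests"))
```

Whether two equal strings or integers happen to be the same object depends on interpreter details such as interning, which is why value comparison is the safe choice.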

Reviewed-by: Håkan Johansson <f96hajo at chalmers.se>
Reviewed-by: Giuseppe Di Natale <guss80 at gmail.com>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: bunder2015 <omfgbunder at gmail.com>
Closes #8368 

ZFS on Linux/src 57dc41d cmd/zpool zpool_main.c

Fix zpool iostat -w header names

The zpool iostat latency histograms (-w) have column names
'sync_queue' and 'async_queue', which match neither the man page nor
the equivalent columns in average latency.  Change the column
names to be 'syncq_wait' and 'asyncq_wait' to be consistent.

Reviewed-by: Olaf Faaland <faaland1 at llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Tony Hutter <hutter2 at llnl.gov>
Closes #8338 

ZFS on Linux/src 6c926f4 module/zfs vdev.c vdev_removal.c

Simplify log vdev removal code

Get rid of the majority of the metaslab metadata when removing log
vdevs in spa_vdev_remove_log() with a call to metaslab_fini() instead
of duplicating a lot of that in vdev_remove_empty_log().

Reviewed-by: Matt Ahrens <mahrens at delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim at delphix.com>
Closes #8347 

ZFS on Linux/src 7558997 module/zfs arc.c vdev.c

vs_alloc can underflow in L2ARC vdevs

The current L2 ARC device code consistently uses psize to
increment vs_alloc but varies between psize and lsize when
decrementing it. The result of this behavior is that
vs_alloc can be decremented more than it is incremented
and underflow. This patch changes the code so asize is
used everywhere.

In addition, it ensures that vs_alloc gets incremented by
the L2 ARC device code as buffers are written and not at
the end of the l2arc_write_buffers() routine. The latter
(and old) way would temporarily underflow vs_alloc as
buffers that were just written, would be destroyed while
l2arc_write_buffers() was still looping.
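
The underflow caused by mismatched sizes can be modelled with an unsigned 64-bit counter. This is an illustrative sketch only: the `VdevStat` class, the helper, and the sample sizes are assumptions, and the real accounting lives in the L2ARC and vdev code.

```python
U64_MASK = (1 << 64) - 1


class VdevStat:
    """Toy stand-in for a vdev stat block with an unsigned 64-bit vs_alloc."""

    def __init__(self):
        self.vs_alloc = 0

    def adjust(self, delta):
        # Wrap like unsigned 64-bit arithmetic does in C.
        self.vs_alloc = (self.vs_alloc + delta) & U64_MASK


def write_then_evict(inc_size, dec_size):
    """Account one buffer on write with inc_size and on eviction with dec_size."""
    stat = VdevStat()
    stat.adjust(inc_size)
    stat.adjust(-dec_size)
    return stat.vs_alloc


psize, lsize, asize = 4096, 16384, 4096   # a compressed buffer (example values)
print(write_then_evict(psize, lsize))     # mismatched sizes wrap around
print(write_then_evict(asize, asize))     # one size for both directions
```

Using the same size for the increment and the decrement keeps the counter balanced, which is the property the patch restores by using asize throughout.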

Reviewed-by: Matt Ahrens <mahrens at delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim at delphix.com>
Closes #8298 

ZFS on Linux/src 2747f59 module/zfs zthr.c

Don't acquire zthr_request_lock in zthr_wakeup

Address a deadlock caused by simultaneous wakeup and cancel on a zthr
by removing the hold of zthr_request_lock from zthr_wakeup. This
allows zthr_wakeup to not block a thread that is in the process of
being cancelled.

Reviewed-by: Matt Ahrens <mahrens at delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim at delphix.com>
Signed-off-by: Sara Hartse <sara.hartse at delphix.com>
Closes #8333 
Delta     File
+36 -18   module/zfs/zthr.c
+36 -18   1 files

ZFS on Linux/src 21e7cf5 cmd/zdb zdb.c, man/man8 zdb.8

zdb -L should skip leak detection altogether

Currently the point of the -L option in zdb is to disable leak
tracing and the loading of space maps because they are expensive,
yet still do leak detection in terms of space. Unfortunately,
there is a scenario where this is a lie. If we are using zdb -L
on a pool where a vdev is being removed, zdb_claim_removing()
will open the metaslab space maps of that device.

This patch makes it so zdb -L skips leak detection altogether
and ensures that no space maps are loaded.

Reviewed-by: Matt Ahrens <mahrens at delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim at delphix.com>
Closes #8335 
Delta       File
+134 -121   cmd/zdb/zdb.c
+2 -2       man/man8/zdb.8
+136 -123   2 files

ZFS on Linux/src 26a8565 config kernel-bio_set_dev.m4, module/zfs vdev_disk.c

Linux 5.0 compat: Fix bio_set_dev()

The Linux 5.0 kernel updated the bio_set_dev() macro so it calls the
GPL-only bio_associate_blkg() symbol, thus inadvertently converting
the entire macro to GPL-only.  Provide a minimal version which always
assigns the request queue's root_blkg to the bio.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Closes #8287

ZFS on Linux/src 466f553 rpm/generic zfs.spec.in

Exclude test-runner.py from the rpmbuild shebang check

Exclude test-runner.py from the rpmbuild shebang check to allow it to
run under Python 2 and 3.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Tony Hutter <hutter2 at llnl.gov>
Closes #8331

ZFS on Linux/src ed158b1 module Makefile.in

Linux 5.0 compat: Fix SUBDIRs

SUBDIRs has been deprecated for a long time, and was finally removed in
the 5.0 kernel.  Use "M=" instead.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Tony Hutter <hutter2 at llnl.gov>
Closes #8257
Delta     File
+3 -3     module/Makefile.in
+3 -3     1 files

ZFS on Linux/src 0c59329 config kernel-fpu.m4, include/linux simd_x86.h

Linux 5.0 compat: Disable vector instructions on 5.0+ kernels

The 5.0 kernel no longer exports the functions we need to do vector
(SSE/SSE2/SSE3/AVX...) instructions.  Disable vector-based checksum
algorithms when building against those kernels.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Tony Hutter <hutter2 at llnl.gov>
Closes #8259

ZFS on Linux/src caacc6e cmd/ztest ztest.c

GCC 9.0: Fix ztest "directive argument is not a nul-terminated string"

GCC 9.0 is complaining because we're trying to print strings that
are defined like this:

.zo_pool = { 'z', 't', 'e', 's', 't', '\0' },

Fix them by making them actual strings.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Tony Hutter <hutter2 at llnl.gov>
Closes #8330
Delta     File
+2 -2     cmd/ztest/ztest.c
+2 -2     1 files

ZFS on Linux/src 0580549 module/zfs zfs_vfsops.c vdev_disk.c

Linux 5.0 compat: Convert MS_* macros to SB_*

In the 5.0 kernel, only the mount namespace code should use the MS_*
macros. Filesystems should use the SB_* ones.

https://patchwork.kernel.org/patch/10552493/

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Tony Hutter <hutter2 at llnl.gov>
Closes #8264

ZFS on Linux/src 031cea1 config kernel-totalram-pages-func.m4 kernel.m4, include/spl/sys vmsystm.h

Linux 5.0 compat: Use totalram_pages()

totalram_pages was converted to an atomic variable in 5.0:

https://patchwork.kernel.org/patch/10652795/

Its value should now be read through the totalram_pages() helper
function.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Tony Hutter <hutter2 at llnl.gov>
Closes #8263

ZFS on Linux/src 77e50c3 config kernel-access-ok-type.m4 kernel.m4, include/linux kmap_compat.h

Linux 5.0 compat: access_ok() drops 'type' parameter

access_ok no longer needs a 'type' parameter in the 5.0 kernel.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Tony Hutter <hutter2 at llnl.gov>
Closes #8261

ZFS on Linux/src 5cb46f6 config kernel-ktime_get_coarse_real_ts64.m4 kernel.m4, include/spl/sys time.h

Linux 4.18 compat: Use ktime_get_coarse_real_ts64()

Newer kernels remove current_kernel_time64().  Use
ktime_get_coarse_real_ts64() in its place.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Tony Hutter <hutter2 at llnl.gov>
Closes #8258

ZFS on Linux/src c853f38 man/man5 zfs-module-parameters.5, module/zfs vdev.c

Change target size of metaslabs from 256GB to 16GB

= Old behavior

For vdev sizes 100GB to 50TB we keep ~200 metaslabs per
vdev and the metaslab size grows from 512MB to 256GB.
For vdevs bigger than that we start increasing the
number of metaslabs until we hit the 128K limit.

= New Behavior

For vdev sizes 100GB to 3TB we keep ~200 metaslabs per
vdev and the metaslab size grows from 512MB to 16GB.
For vdevs bigger than that we start increasing the
number of metaslabs until we hit the 128K limit.
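
The sizing rule can be approximated with a small power-of-two model. This is a sketch under assumptions, not the vdev code: the function and constant names are hypothetical, metaslab sizes are assumed to be powers of two starting at 512MB, and the handling of vdevs smaller than 100GB is not modelled.

```python
GIB = 1 << 30
TIB = 1 << 40
DEFAULT_SHIFT = 29          # 512MB starting metaslab size
TARGET_COUNT = 200          # "~200 metaslabs per vdev"
COUNT_LIMIT = 128 * 1024    # the 128K limit on the number of metaslabs


def metaslab_layout(vdev_bytes, max_ms_bytes):
    """Return (metaslab_size, metaslab_count) for the simplified model."""
    shift = DEFAULT_SHIFT
    max_shift = max_ms_bytes.bit_length() - 1
    # Grow metaslabs until roughly TARGET_COUNT fit, or the size cap is hit.
    while (vdev_bytes >> shift) > TARGET_COUNT and shift < max_shift:
        shift += 1
    # Past the cap the count grows instead, until COUNT_LIMIT forces bigger ones.
    while (vdev_bytes >> shift) > COUNT_LIMIT:
        shift += 1
    return 1 << shift, vdev_bytes >> shift


for cap in (256 * GIB, 16 * GIB):   # old and new target sizes
    size, count = metaslab_layout(50 * TIB, cap)
    print(cap // GIB, size // GIB, count)
```

For a 50TB vdev this model yields 200 metaslabs of 256GB with the old cap and 3200 metaslabs of 16GB with the new one, in line with the ranges quoted above.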

= Reasoning

The old behavior makes metaslabs grow in size when
the vdev range is between 3TB (ms_size 16GB) and
32PB (ms_size 256GB). Even though keeping the number
of metaslabs is good in terms of potential number of
I/Os per TXG, these bigger metaslabs take longer
to be loaded and after they are loaded they can
take up a lot of memory because of their range trees.

    [9 lines not shown]

ZFS on Linux/src df72b8b cmd/zdb zdb.c, include/sys range_tree.h

Rename range_tree_verify to range_tree_verify_not_present

The range_tree_verify function looks for a segment in a
range tree and panics if the segment is present on the
tree. This patch gives the function a more descriptive
name.

Reviewed-by: Matt Ahrens <mahrens at delphix.com>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim at delphix.com>
Closes #8327 

ZFS on Linux/src 107dd2b module/zfs mmp.c

Use proper tag for spa config refcounts in mmp_write_uberblock()

This allows the spa config refcounts to use tracking in debug builds
without triggering the "No such hold %p on refcount" panic.

Reviewed-by: Olaf Faaland <faaland1 at llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Tim Chase <tim at chase2k.com>
Closes #8326 
Delta     File
+1 -1     module/zfs/mmp.c
+1 -1     1 files

ZFS on Linux/src 7646af2 cmd/zfs zfs_main.c

zfs userspace dumps core when used on ZVOLs

If you try to get the userspace, groupspace or projectspace on a ZVOL,
the generated error results in passing EINVAL to
zfs_standard_error_fmt() when we should return a specific error to
inform the user that those properties aren't available on volumes.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed by: Tom Caputi <tcaputi at datto.com>
Signed-off-by: loli10K <ezomori.nozomu at gmail.com>
Closes #8279 
Delta     File
+12 -2    cmd/zfs/zfs_main.c
+12 -2    1 files

ZFS on Linux/src 8fccfa8 cmd/zpool zpool_main.c, man/man8 zpool.8

zpool iostat should print headers when terminal fills

When `zpool iostat` fills the terminal the headers should be
printed again.  `zpool iostat -n` can be used to suppress this.

If the command is not attached to a tty, headers will not be
printed so as to not break existing scripts.
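
The decision described above can be expressed as a small predicate. This is a hypothetical sketch, not code from zpool_main.c: the function name and parameters are invented, and it models only the repetition of headers, not the initial header line.

```python
import shutil
import sys


def should_repeat_header(lines_since_header, terminal_rows, is_tty, suppress=False):
    """Decide whether the column headers should be printed again."""
    if not is_tty or suppress:
        return False              # keep piped output and scripts unchanged
    return lines_since_header >= terminal_rows


# Example wiring against the real terminal; the result depends on the
# environment, so nothing is asserted about it here.
rows = shutil.get_terminal_size().lines
print(should_repeat_header(rows, rows, sys.stdout.isatty()))
```

Passing `suppress=True` corresponds to the `-n` option mentioned in the commit message, and `is_tty=False` to output that is redirected to a file or pipe.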

Reviewed-by: Joshua M. Clulow <josh at sysmgr.org>
Reviewed-by: Giuseppe Di Natale <guss80 at gmail.com>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Signed-off-by: Damian Wojsław <damian at wojslaw.pl>
Closes #8235
Closes #8262