MFC r320683: Add naive benchmark for SSDs in ZFS SLOG role.
ZFS SLOGs have very specific access pattern with many cache flushes,
which none of benchmarks I know can simulate. Since SSD vendors rarely
specify cache flush time, this measurement can be useful to explain why
some ZFS pools are slower then expected. This test writes data chunks
of different size followed by cache flush, alike to what ZFS SLOG does,
and measures average time.
To illustrate, here is result for 6 years old SATA Intel 710 Series SSD:
Synchronous random writes:
0.5 kbytes: 138.3 usec/IO = 3.5 Mbytes/s
1 kbytes: 137.7 usec/IO = 7.1 Mbytes/s
2 kbytes: 151.1 usec/IO = 12.9 Mbytes/s
4 kbytes: 158.2 usec/IO = 24.7 Mbytes/s
8 kbytes: 175.6 usec/IO = 44.5 Mbytes/s
16 kbytes: 210.1 usec/IO = 74.4 Mbytes/s
32 kbytes: 274.2 usec/IO = 114.0 Mbytes/s
64 kbytes: 416.5 usec/IO = 150.1 Mbytes/s
128 kbytes: 776.6 usec/IO = 161.0 Mbytes/s
256 kbytes: 1503.1 usec/IO = 166.3 Mbytes/s
512 kbytes: 2968.7 usec/IO = 168.4 Mbytes/s
1024 kbytes: 5866.8 usec/IO = 170.5 Mbytes/s
2048 kbytes: 11696.6 usec/IO = 171.0 Mbytes/s
4096 kbytes: 23329.6 usec/IO = 171.5 Mbytes/s
8192 kbytes: 46779.5 usec/IO = 171.0 Mbytes/s
, and much newer and supposedly much faster NVMe Samsung 950 PRO SSD:
Synchronous random writes:
0.5 kbytes: 2092.9 usec/IO = 0.2 Mbytes/s
1 kbytes: 2013.1 usec/IO = 0.5 Mbytes/s
2 kbytes: 2014.8 usec/IO = 1.0 Mbytes/s
4 kbytes: 2090.7 usec/IO = 1.9 Mbytes/s
8 kbytes: 2044.5 usec/IO = 3.8 Mbytes/s
16 kbytes: 2084.8 usec/IO = 7.5 Mbytes/s
32 kbytes: 2137.1 usec/IO = 14.6 Mbytes/s
64 kbytes: 2173.4 usec/IO = 28.8 Mbytes/s
128 kbytes: 2923.9 usec/IO = 42.8 Mbytes/s
256 kbytes: 3085.3 usec/IO = 81.0 Mbytes/s
512 kbytes: 3112.2 usec/IO = 160.7 Mbytes/s
1024 kbytes: 2430.6 usec/IO = 411.4 Mbytes/s
2048 kbytes: 3788.9 usec/IO = 527.9 Mbytes/s
4096 kbytes: 6198.0 usec/IO = 645.4 Mbytes/s
8192 kbytes: 10764.9 usec/IO = 743.2 Mbytes/s
While the first one obviously has maximal throughput limitations, the
second one has so high cache flush latency (about 2 millisecond), that
it makes one almost useless in SLOG role, despite of its good throughput
numbers. Power loss protection is out of scope of this test, but I
suspect it can be related.
UnifiedSplitRaw