[MLIR][Python] Remove partial LLVM APIs in python bindings (3/n) (#178984)
This PR continues the work from #178290.
It cleans up multiple LLVM utilities in *.h files under
`mlir/Bindings/python`, along with the corresponding *.cpp files.
Improve caching for dbuf prefetches
To avoid read errors while a transaction is open, dmu_tx_check_ioerr()
is used to read everything required in advance. But there seems
to be a chance for the buffer to be evicted from the dbuf cache in
between, which results in immediate eviction from ARC and may
require an additional disk read later, in a place where error handling
is problematic.
To partially work around this, introduce a new flag DMU_IS_PREFETCH,
relayed to ARC as ARC_FLAG_PREFETCH | ARC_FLAG_PRESCIENT_PREFETCH,
making ARC delay eviction by at least several seconds, or until the
actual read inside the transaction, which will promote it to demand
access.
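
The eviction rule described above can be illustrated with a toy model.
This is only a sketch, not the ARC code: `ToyCache`, `Entry`, and
`MIN_PREFETCH_LIFETIME` are hypothetical names, and the 6-second value
is an assumption standing in for "at least several seconds".

```python
from dataclasses import dataclass

# Hypothetical minimum lifetime for a prefetched entry, in seconds.
MIN_PREFETCH_LIFETIME = 6.0


@dataclass
class Entry:
    inserted_at: float
    prefetch: bool


class ToyCache:
    """Toy model: prefetched entries resist eviction for a minimum time."""

    def __init__(self):
        self.entries = {}

    def prefetch(self, key, now):
        # Read ahead of the transaction; mark the entry as a prefetch.
        self.entries[key] = Entry(inserted_at=now, prefetch=True)

    def demand_read(self, key):
        # The actual read inside the transaction promotes the entry
        # to demand access, clearing the prefetch protection.
        entry = self.entries[key]
        entry.prefetch = False
        return entry

    def can_evict(self, key, now):
        entry = self.entries[key]
        if not entry.prefetch:
            return True
        return now - entry.inserted_at >= MIN_PREFETCH_LIFETIME
```

In this model a prefetched entry cannot be evicted until either the
minimum lifetime has passed or a demand read has consumed it.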
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin at TrueNAS.com>
Closes #18160
[clang] remove addrspace cast from CreateIRTemp (#179327)
The cast just added unnecessary work to the IR: the temporaries are only
used for loads and stores, so it only produced IR noise. Tests were
updated by the UTC script to remove the extra lines.
[RegAlloc] Change the computation of CSRCost (#177226)
This patch fixes https://github.com/llvm/llvm-project/issues/150737.
The original computed CSRCost is too small, so the optimization of
spilling instead of using CSR is rarely triggered.
Also, the original cost model is too difficult to understand and too
hard for backend developers and users to tune.
So this patch changes the CSRCost to be
CSRCost = TRI->getCSRFirstUseCost() * EntryFreq * Scale
TRI->getCSRFirstUseCost() is the raw cost of saving/restoring a CSR.
Usually we don't need to tune this number.
EntryFreq is the BlockFrequency of the entry block.
Scale is used to scale down the CSRCost, because we usually prefer a CSR
register instead of spilling if we have similar CSRCost and spill cost,
[8 lines not shown]
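
As a rough numeric illustration of the formula above (not the RegAlloc
implementation), the sketch below computes CSRCost and compares it with
a spill cost. The function names, the sample numbers, and the rule
"spill only when the spill cost is strictly lower" are assumptions made
for the example; the actual Scale value is not given in the visible part
of this message.

```python
def csr_cost(first_use_cost, entry_freq, scale):
    """CSRCost = getCSRFirstUseCost() * EntryFreq * Scale."""
    return first_use_cost * entry_freq * scale


def prefer_spill(spill_cost, first_use_cost, entry_freq, scale):
    # Assumed decision rule: spill only if it is strictly cheaper than
    # the (scaled-down) cost of the first use of a CSR.
    return spill_cost < csr_cost(first_use_cost, entry_freq, scale)


# Hypothetical numbers: raw save/restore cost 5, entry frequency 16.
unscaled = csr_cost(5, 16, 1.0)   # 80.0
scaled = csr_cost(5, 16, 0.5)     # 40.0
```

With these numbers a spill cost of 60 would win against the unscaled
cost but lose against the scaled one, which is the effect of scaling
CSRCost down: the CSR is preferred when the two costs are similar.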
[Evaluator] require invariant size to fully span the global (#179518)
Relying on the semantics of the type here is potentially a bit awkward,
since the full allocated space may be accessible to the user if desired,
since that space is defined to be a part of the type's sizeof
computation (e.g. for a `memcpy(gv, gv, sizeof(*gv))` or when making an
array of them). It also gets in the way of removing getAllocatedType
from AllocaInst entirely (they are converted to GlobalVariable
sometimes). It was originally added in
519561f418c77dcf46fd6d96d25d884fa07fd7da, though "correct size" is a
difficult thing to define.
The frontend (in clang) appears to always emit the full type size here,
so it seems this shouldn't be a visible change for clang users.
This is still a bit awkward though, since a global is defined to be any
size that is bigger than this unless it has a known initializer,
rendering the test still incomplete here against the IR semantics.
arc: remove unused l2df_size and l2df_type from l2arc_data_free_t
These fields became unused when ABD was introduced in a6255b7fc.
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Ameer Hamza <ahamza at ixsystems.com>
Closes #18093
cache_012_pos: disable compression to ensure L2ARC wrap
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Ameer Hamza <ahamza at ixsystems.com>
Closes #18093
ZTS: Add L2ARC DWPD and parallel writes tests
Add four new functional tests to validate L2ARC DWPD rate limiting and
parallel write features:
- l2arc_dwpd_ratelimit_pos: Verifies DWPD rate limiting with different
  values (0, 100, 1000, 10000) and ordering
- l2arc_dwpd_reimport_pos: Verifies DWPD rate limiting persists after
  pool export/import
- l2arc_multidev_scaling_pos: Verifies parallel write scaling ratio
  (dual devices achieve ~2× single device throughput)
- l2arc_multidev_throughput_pos: Verifies absolute parallel write
  throughput scales with device count (~32MB/s per device)
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Ameer Hamza <ahamza at ixsystems.com>
Closes #18093
man: Update L2ARC tunables for DWPD and parallel writes
Add l2arc_dwpd_limit, remove l2arc_write_boost, update related tunables.
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Ameer Hamza <ahamza at ixsystems.com>
Closes #18093
L2ARC: Implement DWPD-based rate limiting with adaptive feed intervals
Add DWPD (Drive Writes Per Day) rate limiting to control L2ARC write
speeds and protect SSD endurance. Write rate is constrained by the
minimum of l2arc_write_max and DWPD-calculated budget. Devices
accumulate unused write budget over 24-hour periods with automatic reset
and carry-over. Writes occur in controlled bursts (max 50MB) with
adaptive intervals to achieve target rates. Applies after initial device
fill.
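
The arithmetic behind the rate limit can be sketched as follows. This is
an illustration under assumptions, not the module code: the helper
names are hypothetical, DWPD is taken as a plain "full device writes per
day" factor, a value of 0 is assumed to mean "no DWPD limit", and 50MB
is read as 50 MiB. Budget carry-over between days is left out.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MAX_BURST = 50 * 1024 * 1024  # "max 50MB" burst, assumed to be MiB


def dwpd_rate(device_bytes, dwpd):
    """Bytes per second allowed by a DWPD budget spread over 24 hours."""
    return device_bytes * dwpd / SECONDS_PER_DAY


def effective_rate(write_max, device_bytes, dwpd):
    """Minimum of the write_max cap and the DWPD-derived rate."""
    if dwpd == 0:  # assumption: 0 disables the DWPD limit
        return write_max
    return min(write_max, dwpd_rate(device_bytes, dwpd))


def feed_interval(burst_bytes, rate):
    """Seconds to wait per burst so the average matches the target rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return min(burst_bytes, MAX_BURST) / rate
```

For example, a device of 86,400,000 bytes at 1 DWPD allows 1000 bytes
per second on average, so a 10,000-byte burst would be followed by a
10-second interval.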
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Ameer Hamza <ahamza at ixsystems.com>
Closes #18093
L2ARC: Implement per-device feed threads for parallel writes
Transform L2ARC from a single global feed thread to per-device threads,
enabling parallel writes to multiple L2ARC devices. Each device runs
its own feed thread independently, improving multi-device throughput.
Previously, a single thread served all devices sequentially; now each
device writes concurrently. Threads are created during device addition
and torn down on removal.
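
The thread lifecycle described above can be sketched in user space.
`FeedDevice` and `FeedManager` are hypothetical names, and the "write"
is just a counter; this only illustrates one thread per device, created
on addition and joined on removal.

```python
import threading


class FeedDevice:
    """One cache device with its own feed thread."""

    def __init__(self, name):
        self.name = name
        self.writes = 0
        self.first_write = threading.Event()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._feed, daemon=True)

    def _feed(self):
        # Each device feeds independently of all other devices.
        while True:
            self.writes += 1          # stand-in for one write burst
            self.first_write.set()
            if self._stop.wait(0.001):
                break

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def is_running(self):
        return self._thread.is_alive()


class FeedManager:
    def __init__(self):
        self.devices = {}

    def add_device(self, name):
        # Thread is created when the device is added.
        dev = FeedDevice(name)
        self.devices[name] = dev
        dev.start()
        return dev

    def remove_device(self, name):
        # Thread is torn down when the device is removed.
        dev = self.devices.pop(name)
        dev.stop()
        return dev
```

Removing one device stops only its own thread; the other devices keep
feeding.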
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Ameer Hamza <ahamza at ixsystems.com>
Closes #18093
L2ARC: Preserve L2HDR in arc_release() for in-flight writes
When arc_release() is called on a header with a single buffer and
L2_WRITING set, the L2HDR must be preserved for ABD cleanup (similar
to the arc_hdr_destroy() case). If we destroy the L2HDR here, later
arc_write() will allocate a new ABD and call arc_hdr_free_abd(),
which needs b_l2hdr.b_dev to properly defer ABD cleanup, causing
VERIFY(HDR_HAS_L2HDR(hdr)) to fail.
Allocate a new header for the buffer in the single_buf_l2writing
case (single buffer + L2_WRITING), leaving the original header with
L2HDR intact. The original header becomes an "orphan" (no buffers, no
b_pabd) but retains device association for ABD cleanup when
l2arc_write_done() completes.
The shared buffer case (HDR_SHARED_DATA) is excluded because L2ARC
makes its own transformed copy via l2arc_apply_transforms(), so the
original ABD is not used by the L2 write. The header can be safely
reused without allocating a new one.
[12 lines not shown]
L2ARC: Reorder header destruction for in-flight L2 writes
With multiple L2ARC devices, headers can be destroyed asynchronously
(e.g., during zpool sync) while L2_WRITING is set. The original code
destroyed L2HDR before L1HDR, causing ABDs to lose their device
association (b_l2hdr.b_dev) when arc_hdr_free_abd() is called.
This caused ABDs to be added to the global free-on-write list without
device information. When any L2ARC device completed its write and
attempted to free these orphaned ABDs, it would panic on
ASSERT(!list_link_active(&abd->abd_gang_link)) because the ABD was
still part of another device's vdev_queue I/O aggregation gang.
Fix by extending l2ad_mtx lock scope to cover L1HDR destruction and
reordering to destroy L1HDR before L2HDR when L2_WRITING is set. This
ensures arc_hdr_free_abd() can access b_l2hdr.b_dev to properly tag
ABDs with their device for deferred cleanup.
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
[3 lines not shown]
L2ARC: Implement persistent markers with consistent tail scanning
This commit introduces per-sublist persistent markers that eliminate
redundant tail scanning between L2ARC iterations, providing significant
CPU efficiency improvements. Markers are pre-allocated during device
initialization and properly cleaned up during device removal.
The implementation uses conditional behavior based on device capacity:
small devices (capacity < arc_c) retain original HEAD/TAIL scanning
based on ARC warmup state, while large devices (capacity >= arc_c)
use the persistent marker approach for optimal CPU efficiency.
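
A simplified picture of the two modes, assuming a sublist is a Python
list whose index 0 is the tail. `SublistScanner` is a hypothetical name,
and the small-device path here simply restarts from the tail rather than
modelling the warmup-dependent HEAD/TAIL choice.

```python
class SublistScanner:
    """Scan a sublist in batches, optionally resuming from a marker."""

    def __init__(self, device_capacity, arc_c):
        # Large devices (capacity >= arc_c) keep a persistent marker.
        self.use_marker = device_capacity >= arc_c
        self.marker = 0

    def next_batch(self, sublist, count):
        start = self.marker if self.use_marker else 0
        batch = sublist[start:start + count]
        if self.use_marker:
            # Remember where this pass stopped so the next pass does
            # not rescan the same entries from the tail.
            self.marker = start + len(batch)
        return batch
```

With a marker, two consecutive calls return consecutive slices; without
one, each call starts over from the tail.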
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Ameer Hamza <ahamza at ixsystems.com>
Closes #18093
L2ARC: Implement even-depth multi-sublist scanning
The introduction of ARC multilists made L2ARC writing quite random,
depending on whether it found something to write in a randomly selected
sublist. This created inconsistent write patterns and poor utilization
of available sublists leading to uneven cache population.
This commit replaces random selection with systematic scanning across
all sublists within each burst. Fair headroom distribution ensures
even-depth traversal across all sublists until the target write size
is reached. Round-robin processing with random starting points eliminates
sequential bias while maintaining predictable write behavior.
The systematic approach provides consistent L2ARC filling patterns
and better utilization of available ARC data across all sublists.
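
The scanning order can be sketched as below. This is an illustration
only: sublists are modelled as lists of buffer sizes, `even_depth_scan`
is a hypothetical name, and headroom accounting is reduced to "stop
once the target write size is reached".

```python
import random


def even_depth_scan(sublists, target_bytes, rng=None):
    """Pick buffers depth by depth, round-robin from a random sublist.

    Returns a list of (sublist_index, depth, size) tuples.
    """
    rng = rng or random.Random()
    n = len(sublists)
    if n == 0:
        return []
    start = rng.randrange(n)  # random starting point avoids bias
    max_depth = max(len(s) for s in sublists)
    picked, written, depth = [], 0, 0
    while written < target_bytes and depth < max_depth:
        for k in range(n):
            idx = (start + k) % n
            if depth < len(sublists[idx]):
                size = sublists[idx][depth]
                picked.append((idx, depth, size))
                written += size
                if written >= target_bytes:
                    break
        depth += 1
    return picked
```

Because every sublist is visited at depth d before any sublist is
visited at depth d + 1, no sublist is traversed more than one entry
deeper than another that still has entries available.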
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Ameer Hamza <ahamza at ixsystems.com>
Closes #18093
[RISCV] Add tied destination constraint to CustomSiFiveVMACC. (#179567)
As the name suggests, these are multiply-accumulate instructions and
thus they have 3 sources.
[Clang][WebAssembly] Fix WASM tables to allow `__funcref` function pointers (#178720)
Allows __funcref pointers to be used as the element type for WASM tables
in Clang (static, global, zero-length arrays of a reference type).
Modifies `QualType::isWebAssemblyFuncrefType` to correctly look at the
addrspace of the pointee, rather than the pointer type.
Related: #140933
[mlir][shard,mpi] Fixing lowering allgather shard->mpi->llvm (#178870)
`shard.allgather` concatenates along a specified gather-axis. However,
`mpi.allgather` always concatenates along the first dimension and there
is no MPI operation that allows gathering along an arbitrary axis.
Hence, if gather-axis!=0, we need to create a temporary buffer where we
gather along the first dimension and then copy from that buffer to the
final output along the specified gather-axis. This is not ideal by far.
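
The two-step approach can be illustrated on plain 2-D lists. This is a
sketch of the idea only, not the generated lowering: the function names
are hypothetical, every rank is assumed to hold a block of the same
shape, and only gather axes 0 and 1 are handled.

```python
def allgather_dim0(local_blocks):
    """Concatenate per-rank blocks along the first dimension."""
    temp = []
    for block in local_blocks:
        temp.extend(list(row) for row in block)
    return temp


def allgather_axis(local_blocks, gather_axis):
    """Gather along axis 0, then copy into place for another axis."""
    temp = allgather_dim0(local_blocks)  # temporary buffer
    if gather_axis == 0:
        return temp
    if gather_axis != 1:
        raise ValueError("sketch only handles 2-D blocks")
    nranks = len(local_blocks)
    rows = len(local_blocks[0])
    cols = len(local_blocks[0][0])
    out = [[None] * (nranks * cols) for _ in range(rows)]
    for rank in range(nranks):
        for r in range(rows):
            for c in range(cols):
                # Row r of this rank's block sits at rank*rows + r
                # in the temporary buffer.
                out[r][rank * cols + c] = temp[rank * rows + r][c]
    return out
```

The extra buffer and the copy loop are the cost of gathering along an
axis other than the first.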
Along the way, this also
- fixes the computation of memref size in mpitollvm
- adds a simple canonicalization pattern for comm_size for easier
  debugging
- adds more tests
[CodeGen] Remove unused first operand of SUBREG_TO_REG (#179690)
The first input operand of SUBREG_TO_REG was an immediate that most
targets set to 0. In practice it had no effect on codegen. Remove it.