databases/clickhouse-devel: New port
ClickHouse is an open-source column-oriented database management
system that allows generating analytical data reports in real time.
This is based on the stable series.
WWW: https://clickhouse.com/
aq(4): take F/W statistics off the iflib core lock (kick-and-read)
The once-per-second statistics refresh ran the whole F/W-mailbox
transaction under iflib's CTX (sx) lock: fw2x_get_stats toggled the MPI
STATISTICS control bit and busy-polled the state register for the
acknowledgement (up to ~25 ms) before downloading the counters, so a slow
F/W response blocked datapath reconfigure / ioctls for the duration.
The per-cast and error counters have no direct-register source -- the
reference Linux atlantic driver and our port both read them out of the
F/W mailbox, and the MSM registers the chip exposes are never used for the
periodic counters. So rather than poll, adopt the kick-and-read shape used
by the iflib peer with the same constraint (vmxnet3): consume the snapshot
the F/W produced for the *previous* request, then toggle the bit to
request the next one -- no wait. The F/W finished that previous refresh
~1 s ago, so the download needs no poll, and the toggle write stays
serialized against set_mode by the CTX lock exactly as before. This
removes the 25 ms poll (and the toggle_mpi_ctrl_and_wait_ helper) from
under the lock; only the fast 16-dword download remains.
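A minimal userland sketch of the kick-and-read ordering follows. It assumes a simplified F/W model: struct sketch_fw, sketch_fw_run(), sketch_update_stats(), SKETCH_CTRL_STATISTICS and the fake counter source are hypothetical stand-ins, not the driver's fw2x_get_stats. sketch_fw_run() plays the F/W servicing a request between two once-per-second ticks. The driver side only copies the previous snapshot and flips the bit, so it never waits on an acknowledgement.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_STATS_DWORDS    16    /* the "16-dword download" */
#define SKETCH_CTRL_STATISTICS 0x1u  /* hypothetical STATISTICS bit */

/* Hypothetical F/W model: a driver-written control word, the last control
 * value the F/W serviced, and the mailbox it publishes counters into. */
struct sketch_fw {
	uint32_t ctrl;
	uint32_t acked;
	uint32_t mailbox[SKETCH_STATS_DWORDS];
	uint32_t hw_counter;	/* fake live counter source */
};

/* Stand-in for the F/W running asynchronously between driver ticks:
 * a flipped STATISTICS bit means "publish a fresh snapshot". */
static void
sketch_fw_run(struct sketch_fw *fw)
{
	if (((fw->ctrl ^ fw->acked) & SKETCH_CTRL_STATISTICS) == 0)
		return;
	for (int i = 0; i < SKETCH_STATS_DWORDS; i++)
		fw->mailbox[i] = fw->hw_counter + (uint32_t)i;
	fw->acked = fw->ctrl;
}

/* Kick-and-read: consume the snapshot produced for the previous request,
 * then flip the bit to request the next one. Nothing here waits. */
static void
sketch_update_stats(struct sketch_fw *fw, uint32_t out[SKETCH_STATS_DWORDS])
{
	assert(fw != NULL && out != NULL);
	/* Read: the previous request was serviced about one tick ago. */
	memcpy(out, fw->mailbox, sizeof(fw->mailbox));
	/* Kick: ask for the next snapshot; it is collected on the next tick. */
	fw->ctrl ^= SKETCH_CTRL_STATISTICS;
}
```

The trade-off is that each refresh reports counters that are one tick (about 1 s) old; in this sketch the very first tick returns the mailbox's initial contents because no request has been serviced yet.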
[10 lines not shown]
aq(4): modernize and de-Linuxify the vendor driver
Dead-code removal, device_printf(9) logging, style(9) de-Linuxification,
const F/W-ops tables, and readability cleanups. No change for valid
traffic.
Dead code and logging:
- Remove the sub-gigabit TSO-masking block in the link-state ISR: it
cleared IFCAP_TSO from the static isc_capabilities record (read only at
attach / SIOCSIFCAP, never on the datapath), so it never gated TSO and
only corrupted the validation mask. The Atlantic has no sub-gigabit TSO
erratum.
- Tidy the RX buffer-size handling: drop the dead switch(MCLBYTES) in
aq_if_rx_queues_alloc, rename rx_max_frame_size -> rx_buf_size, and bound
the per-fragment length from the wb.pkt_len writeback (EBADMSG on
underflow or a final fragment longer than the RX buffer).
- Drop every __FreeBSD__/__FreeBSD_version branch (FreeBSD 14.0 baseline);
the pre-13 arms used pre-opaque-if_t APIs that have since been removed,
and one of them never built.
[31 lines not shown]
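The wb.pkt_len bound from the RX buffer-size item can be sketched as below. It assumes that pkt_len on the final (end-of-packet) descriptor is the total packet length and that every non-final fragment fills a whole RX buffer; sketch_rx_frag_len() and its parameters are hypothetical, not the driver's RX routine.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: length of one RX fragment.
 *   pkt_len     - total length reported in the final descriptor's wb.pkt_len
 *   consumed    - bytes already attributed to earlier fragments
 *   is_last     - nonzero for the end-of-packet descriptor
 *   rx_buf_size - size of each RX buffer
 * Returns 0 and sets *frag_len, or EBADMSG for an impossible writeback.
 */
static int
sketch_rx_frag_len(uint32_t pkt_len, uint32_t consumed, int is_last,
    uint32_t rx_buf_size, uint32_t *frag_len)
{
	uint32_t len;

	assert(frag_len != NULL);
	if (!is_last) {
		/* Non-final fragments fill a whole buffer. */
		*frag_len = rx_buf_size;
		return (0);
	}
	if (pkt_len < consumed)		/* underflow: total < bytes seen */
		return (EBADMSG);
	len = pkt_len - consumed;
	if (len > rx_buf_size)		/* tail cannot exceed one buffer */
		return (EBADMSG);
	*frag_len = len;
	return (0);
}
```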
aq(4): add a runtime dev.aq.N.debug trace control
The trace_* family (trace/trace_error/trace_warn/trace_detail, used in the
F/W and init/config paths) was gated behind the compile-time
AQ_CFG_DEBUG_LVL, which is 0, so the dbg_level_/dbg_categories_ runtime
variables were dead and tracing could only be enabled by recompiling.
Decouple trace_base_ from AQ_CFG_DEBUG_LVL so it is always compiled and
gated purely at runtime on dbg_level_/dbg_categories_, make those two
variables writable (no longer const, default level 0 = off), and expose
them as dev.aq.N.debug (verbosity) and dev.aq.N.debug_categories
(subsystem mask) sysctls.
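A userland sketch of the runtime gate, assuming a higher dbg_level_ means more verbosity and dbg_categories_ is a bitmask. The two globals reuse the names from the text, but their types, sketch_trace_base() and the SKETCH_DBG_CAT_* bits are assumptions. In the kernel the globals would presumably be attached to each device's sysctl tree (for example with SYSCTL_ADD_INT/SYSCTL_ADD_UINT), so every dev.aq.N node points at the same storage.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical category bits; the driver's real mask values are not shown. */
#define SKETCH_DBG_CAT_FW 0x1u
#define SKETCH_DBG_CAT_HW 0x2u

/* Writable globals, default level 0 = off.  In the driver these back the
 * dev.aq.N.debug and dev.aq.N.debug_categories sysctls. */
static int          dbg_level_ = 0;
static unsigned int dbg_categories_ = 0;

/* Always compiled; emits only if the message's level is within the
 * configured verbosity and its category is enabled.  Returns 1 if the
 * message was printed, 0 if it was filtered out. */
static int
sketch_trace_base(int level, unsigned int cat, const char *fmt, ...)
{
	va_list ap;

	assert(fmt != NULL);
	if (level > dbg_level_ || (cat & dbg_categories_) == 0)
		return (0);
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	return (1);
}
```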
The datapath-heavy AQ_DBG_ENTER/PRINT/DUMP macros and the trace_aq_*_descr
descriptor dumps stay behind AQ_CFG_DEBUG_LVL (still 0), so the per-packet
paths are untouched -- trace_* is only used off the datapath. The two
variables are global (the trace macros reference them directly), so the
per-device sysctls share one backing store, which is fine for a debug
knob.
[8 lines not shown]
aq(4): enable jumbo frames, software LRO, and suspend/resume
- Configure the RX buffer size from the interface MTU and enable jumbo
frames up to 9000 bytes, replacing the fixed standard-frame setup.
- Advertise IFCAP_LRO so iflib coalesces received TCP segments with its
software tcp_lro(9), like every other in-tree iflib driver
(ix/igc/em/vmxnet3); aq does no hardware LRO. iflib builds the
per-RX-queue LRO context unconditionally, so the capability bit is all
that is required; enabled by default via isc_capenable, toggle at
runtime with ifconfig.
- Add suspend/shutdown/resume handlers, replacing the unimplemented-
function placeholders. aq_if_shutdown/aq_if_suspend stop the interface
and deinitialize the hardware; aq_if_resume re-resets the F/W, re-reads
the mailbox address and re-selects fw_ops via aq_hw_mpi_create() before
iflib re-inits, because the runtime init path (aq_hw_init) reuses the
cached mailbox/fw_ops and a D3 power cycle can clear them. iflib calls
IFDI_RESUME unconditionally, so this also covers resuming while the
[4 lines not shown]
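Deriving the RX buffer size from the MTU (the first item above) can be sketched as follows. The sketch assumes iflib-style sizing, meaning a 2 KiB cluster when the whole frame fits and page-sized clusters otherwise, with the frame split across several descriptors. It also assumes a 4 KiB page. The SKETCH_* constants mirror ETHER_HDR_LEN, ETHER_VLAN_ENCAP_LEN, ETHER_CRC_LEN, MCLBYTES and MJUMPAGESIZE so the code builds in userland, and sketch_rx_buf_config() is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userland mirrors of the FreeBSD constants (4 KiB page assumed). */
#define SKETCH_ETHER_HDR_LEN        14
#define SKETCH_ETHER_VLAN_ENCAP_LEN 4
#define SKETCH_ETHER_CRC_LEN        4
#define SKETCH_MCLBYTES             2048
#define SKETCH_MJUMPAGESIZE         4096
#define SKETCH_MAX_MTU              9000	/* jumbo limit from the text */

/*
 * Hypothetical helper: pick the RX buffer size for an MTU and report how
 * many buffers (descriptors) the largest frame can span.
 */
static int
sketch_rx_buf_config(uint32_t mtu, uint32_t *buf_size, uint32_t *max_frags)
{
	uint32_t frame;

	assert(buf_size != NULL && max_frags != NULL);
	if (mtu > SKETCH_MAX_MTU)
		return (EINVAL);
	frame = mtu + SKETCH_ETHER_HDR_LEN + SKETCH_ETHER_VLAN_ENCAP_LEN +
	    SKETCH_ETHER_CRC_LEN;
	/* One 2 KiB cluster if the frame fits, else page-sized clusters. */
	*buf_size = (frame <= SKETCH_MCLBYTES) ? SKETCH_MCLBYTES :
	    SKETCH_MJUMPAGESIZE;
	/* Round up to whole buffers. */
	*max_frags = (frame + *buf_size - 1) / *buf_size;
	return (0);
}
```

With an MTU of 9000 the frame is 9022 bytes (Ethernet header, one VLAN tag and CRC included), so it spans three 4 KiB buffers and takes the multi-fragment RX path.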
aq(4): interrupt model and queue-count correctness
Rework the MSI-X and queue-count handling to use the standard iflib
interrupt model and to keep every ring serviced.
- Cap isc_n{tx,rx}qsets_max at the RSS indirection-table size
(HW_ATL_RSS_INDIRECTION_QUEUES_MAX, 8) instead of HW_ATL_B0_RINGS_MAX.
RSS only steers RX traffic to eight rings, so on hosts with more CPUs
the surplus TX rings never make progress: iflib flowid-steers TCP
flows across every TX ring, and a flow landing on a surplus ring has
its segments queued but never transmitted, hanging the connection.
- Add a TX-specific ifdi_tx_queue_intr_enable that reads
tx_rings[txqid]->msix. It was wired to the RX handler, which indexes
rx_rings[] with the qid; safe only while tx_rings_count ==
rx_rings_count, otherwise the lookup walks past rx_rings[] and feeds a
garbage msix value into the IRQ mask register.
- Fix three MSI-X / admin-IRQ bugs: the TX softirq was attached to
[14 lines not shown]
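The first two items can be sketched with a hypothetical, trimmed-down softc (struct sketch_softc, sketch_nqsets(), sketch_tx_msix() and sketch_rx_msix() are illustrative names, not the driver's). The queue-set count never exceeds the eight RSS targets, assuming one queue set per CPU up to that cap, and the TX interrupt-enable path indexes tx_rings[] instead of reusing the RX lookup. The assert() bounds checks mark exactly where the old shared lookup could walk past rx_rings[].

```c
#include <assert.h>

#define HW_ATL_RSS_INDIRECTION_QUEUES_MAX 8	/* value given in the text */

/* Hypothetical, trimmed-down ring and softc. */
struct sketch_ring {
	int msix;		/* MSI-X vector serving this ring */
};

struct sketch_softc {
	struct sketch_ring *tx_rings[HW_ATL_RSS_INDIRECTION_QUEUES_MAX];
	struct sketch_ring *rx_rings[HW_ATL_RSS_INDIRECTION_QUEUES_MAX];
	int tx_rings_count;
	int rx_rings_count;
};

/* One queue set per CPU, but never more than RSS can steer traffic to. */
static int
sketch_nqsets(int ncpus)
{
	return (ncpus < HW_ATL_RSS_INDIRECTION_QUEUES_MAX ?
	    ncpus : HW_ATL_RSS_INDIRECTION_QUEUES_MAX);
}

/* TX interrupt enable indexes tx_rings[]; reusing the RX lookup is only
 * safe while the two ring counts happen to match. */
static int
sketch_tx_msix(const struct sketch_softc *sc, int txqid)
{
	assert(txqid >= 0 && txqid < sc->tx_rings_count);
	return (sc->tx_rings[txqid]->msix);
}

static int
sketch_rx_msix(const struct sketch_softc *sc, int rxqid)
{
	assert(rxqid >= 0 && rxqid < sc->rx_rings_count);
	return (sc->rx_rings[rxqid]->msix);
}
```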
aq(4): adopt native FreeBSD errno convention
Convert the driver's internal error-handling chain from the Linux
negative-errno convention to FreeBSD positive errno everywhere.
- All `return (-EXXX)` become `return (EXXX)`, `int err = -EXXX` loses
the sign, and `if (err < 0)` checks become `if (err != 0)` across
aq_fw.c, aq_fw1x.c, aq_fw2x.c and aq_hw.c.
- mac_soft_reset_flb_ returns ETIMEDOUT/0 instead of a bool so it
matches its RBL sibling.
- The ETIME and EOK aliases in aq_common.h are removed; all sites use
ETIMEDOUT and 0 directly, and the `rc = -rc` sign flips in
aq_if_attach_pre are dropped.
Turn AQ_HW_WAIT_FOR into a statement expression evaluating to 0 on
success or ETIMEDOUT on timeout, assigned explicitly at all seven call
sites, instead of silently assigning ETIMEDOUT to a variable named err
in the caller scope. A statement expression rather than an inline
function because every call must re-evaluate its condition each
[24 lines not shown]
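AQ_HW_WAIT_FOR as a value-returning statement expression can be sketched as follows. It uses the GNU statement-expression extension, which GCC and Clang accept and which __extension__ keeps quiet under -pedantic. A no-op sketch_delay_us() stands in for the kernel's DELAY(9). The driver's actual macro parameters are not shown here, so the (cond, us, max) signature is an assumption.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for DELAY(9); a no-op so the sketch runs in
 * userland. */
static void
sketch_delay_us(unsigned int us)
{
	(void)us;
}

/*
 * Poll `cond` up to `max` times, delaying `us` between attempts.
 * Evaluates to 0 once `cond` is true, or ETIMEDOUT if it never was.
 * A macro (GNU statement expression) rather than an inline function so
 * that `cond` is re-evaluated on every iteration.
 */
#define AQ_HW_WAIT_FOR(cond, us, max) __extension__ ({			\
	int _aq_err = ETIMEDOUT;					\
	for (unsigned int _aq_i = 0; _aq_i < (unsigned int)(max);	\
	    _aq_i++) {							\
		if (cond) {						\
			_aq_err = 0;					\
			break;						\
		}							\
		sketch_delay_us(us);					\
	}								\
	_aq_err;							\
})
```

Because cond is a macro argument, an expression such as `++n >= 3` is re-evaluated on every pass, and each call site assigns the result explicitly, e.g. `err = AQ_HW_WAIT_FOR(...)`.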
aq(4): Fix RSS indirection table OOB write and queue distribution
Two related fixes to `aq(4)`'s RSS indirection table handling:
1. Fix an out-of-bounds stack write in `aq_hw_rss_set()`. RSS table entries are 3 bits (8 queues max), but with more than 8 RX rings `rss_table[]` holds larger values; the 32-bit write then spills one `uint16_t` past `bitary[]` and corrupts the stack, so the NIC never links or the kernel panics. Mask each value to 3 bits and pack 16 bits at a time to keep the write in bounds.
2. Build the indirection table in `aq_if_attach_post()` with a modulo over `min(rx_rings_count, HW_ATL_RSS_INDIRECTION_QUEUES_MAX)` instead of `i & (rx_rings_count - 1)`, which assumed a power-of-two ring count.
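Both fixes can be sketched in userland, assuming a 64-entry table of 3-bit entries (192 bits, exactly twelve uint16_t words). sketch_rss_build(), sketch_rss_pack() and sketch_rss_get() are hypothetical helpers, not the driver's aq_hw_rss_set(). Each entry is masked to 3 bits and written with 16-bit stores only, and an entry that straddles a word boundary is split, so nothing is written past the last word.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_RSS_ENTRIES 64	/* assumed table size */
#define SKETCH_RSS_BITS    3	/* 8 queues max */
#define SKETCH_RSS_WORDS   (SKETCH_RSS_ENTRIES * SKETCH_RSS_BITS / 16)
#define SKETCH_RSS_QMAX    8	/* HW_ATL_RSS_INDIRECTION_QUEUES_MAX */

/* Fix 2: spread entries over min(rx_rings_count, 8) queues with a modulo,
 * so a non-power-of-two ring count still reaches every queue. */
static void
sketch_rss_build(uint32_t rss_table[SKETCH_RSS_ENTRIES], int rx_rings_count)
{
	int n = rx_rings_count < SKETCH_RSS_QMAX ? rx_rings_count :
	    SKETCH_RSS_QMAX;

	assert(n > 0);
	for (int i = 0; i < SKETCH_RSS_ENTRIES; i++)
		rss_table[i] = (uint32_t)(i % n);
}

/* Fix 1: mask each entry to 3 bits and use only 16-bit stores, splitting
 * an entry that straddles two words, so no write lands past bitary[]. */
static void
sketch_rss_pack(const uint32_t rss_table[SKETCH_RSS_ENTRIES],
    uint16_t bitary[SKETCH_RSS_WORDS])
{
	memset(bitary, 0, SKETCH_RSS_WORDS * sizeof(uint16_t));
	for (int i = 0; i < SKETCH_RSS_ENTRIES; i++) {
		uint32_t val = rss_table[i] & 0x7u;
		unsigned int bit = (unsigned int)i * SKETCH_RSS_BITS;
		unsigned int word = bit / 16, shift = bit % 16;

		bitary[word] |= (uint16_t)(val << shift);
		if (shift > 16 - SKETCH_RSS_BITS)	/* spills into next word */
			bitary[word + 1] |= (uint16_t)(val >> (16 - shift));
	}
}

/* Read entry i back out of the packed array. */
static unsigned int
sketch_rss_get(const uint16_t bitary[SKETCH_RSS_WORDS], int i)
{
	unsigned int bit = (unsigned int)i * SKETCH_RSS_BITS;
	unsigned int word = bit / 16, shift = bit % 16;
	uint32_t val = (uint32_t)bitary[word] >> shift;

	if (shift > 16 - SKETCH_RSS_BITS)
		val |= (uint32_t)bitary[word + 1] << (16 - shift);
	return (val & 0x7u);
}
```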
Reviewed by: adrian
Differential Revision: https://reviews.freebsd.org/D57240
aq(4): RX/TX and HW-path correctness and hardening
Independent correctness fixes, plus robustness against a non-responding
device, malformed descriptor writeback, and torn MMIO reads, and the move
to the FreeBSD bus_space(9) register abstraction.
Correctness:
- aq_hw_ver_match returned true if any of major/minor/build was >=
expected; compare lexicographically so e.g. 2.0.1 is correctly seen as
older than 2.1.0.
- The VLAN hardware-filter iteration used the vlan tag directly as the
bitstring index; use vlan_tag + 1 so the active-VLAN bookkeeping lines
up with the table.
- aq_initmedia only registered IFM_AUTO in full-duplex/pause variants, so
a bare "ifconfig aq0 media autoselect" matched no entry and returned
ENXIO. Add the bare IFM_ETHER|IFM_AUTO entry, matching ix/em/igc/ixv.
- Convert the per-ring diagnostic counters to counter(9): per-CPU,
tear-free, no atomics on the increment path, fixing a data race and a
32-bit torn read against the locklessly-read sysctls. Drop three
[30 lines not shown]
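The lexicographic firmware-version check from the first correctness item can be sketched as below. struct sketch_fw_ver and sketch_ver_at_least() are hypothetical, and the driver's version record may be laid out differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical version record; the driver's layout may differ. */
struct sketch_fw_ver {
	unsigned int major;
	unsigned int minor;
	unsigned int build;
};

/* True if `have` is the same as or newer than `want`, comparing major,
 * then minor, then build (lexicographic order). */
static bool
sketch_ver_at_least(const struct sketch_fw_ver *have,
    const struct sketch_fw_ver *want)
{
	assert(have != NULL && want != NULL);
	if (have->major != want->major)
		return (have->major > want->major);
	if (have->minor != want->minor)
		return (have->minor > want->minor);
	return (have->build >= want->build);
}
```

Under the old any-field rule, 2.0.1 passed a 2.1.0 requirement as soon as the equal major numbers compared >=; here the minor field decides and 2.0.1 is rejected.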