Restructure alert plugin to the lean typesafe layout
## Problem
The alert plugin passed mypy but didn't follow the typesafe convention the other converted plugins use: the eponymous `alert` service lived in `alert.py` as an 1100-line `Service` with all logic, models, and helpers inline, `__init__.py` was empty, and the two sibling services each combined their service class and service part in one off-convention file.
## Solution
- Move `AlertService` into a lean `__init__.py` that exposes only the endpoint stubs; each delegates to plain functions in `lifecycle.py` / `runtime.py` / `oneshot.py` / `queries.py` that take `(context, state)`.
- Lift all mutable runtime state into a dedicated `AlertState` object (`state.py`) built once in `__init__`. Concurrency is intentionally unchanged — it still relies on the asyncio event loop plus the existing `process_alerts` job lock, with no new lock introduced.
- Pull the standalone pieces into `state.py`, `alert_classes.py`, and `serialize.py`, and split the siblings into `alertservice.py` / `alertservice_crud.py` and `alertclasses.py` / `alertclasses_config.py` to match the `*_crud.py` / `*_config.py` convention.
- Move the inline `AlertOneshotDelete` models into the API package, and update `main.py` imports plus the setup-ordering key (`alert.alert` -> `alert`, since `setup()` now lives in the package `__init__`).
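
The delegation pattern described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the function names, the `AlertState` fields, and the `context` object are hypothetical stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class AlertState:
    """Hypothetical stand-in for the mutable runtime state kept in state.py."""
    alerts: list = field(default_factory=list)


# Plain functions, standing in for what modules such as queries.py or
# oneshot.py might hold. Each one takes (context, state) first.
def list_alerts(context, state):
    return list(state.alerts)


def create_oneshot(context, state, klass, args):
    state.alerts.append({"klass": klass, "args": args})


class AlertService:
    """Lean service: endpoint stubs only, with no logic inline."""

    def __init__(self, context):
        self.context = context
        self.state = AlertState()  # built once, shared by every endpoint

    def list(self):
        return list_alerts(self.context, self.state)

    def oneshot_create(self, klass, args):
        return create_oneshot(self.context, self.state, klass, args)
```

Keeping the logic in plain functions means each one can be type-checked and tested without instantiating the service.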
Remove stale ARC graph names from reporting API
## Problem
`reporting.get_data` accepted three graph names — `arcrate`, `arcactualrate`, `arcresult` — whose backing plugin classes were deleted during the ZFS netdata plugin rewrite. The Pydantic `Literal` and the in-memory `__graphs` dict drifted out of sync, so passing any of them crashed `netdata_get_data` with an uncaught `KeyError`.
## Solution
Removed the dead names from `GraphIdentifier.name`'s `Literal` and docstring in both `v26_0_0/reporting.py` and `v27_0_0/reporting.py`. Added a `ReportingNetdataGetDataArgs.from_previous` on each so legacy WS clients walking the adapter chain get the dead entries silently filtered instead of a hard rejection at the final v27 boundary. Hardened the dispatch site in `plugins/reporting/graphs.py` to raise `CallError(ENOENT)` for any unknown name — mirroring what `netdata_graph` already does — so future schema/implementation drift surfaces as a clean RPC error rather than an unhandled exception.
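
A rough sketch of the two defensive layers, under stated assumptions: `CallError` here is a local stand-in rather than the middleware class, the `GRAPHS` registry and its entries are made up, and `drop_dead_graphs` only approximates what a `from_previous` hook might do.

```python
import errno

# Graph names whose backing classes no longer exist (from the description above).
DEAD_GRAPHS = {"arcrate", "arcactualrate", "arcresult"}


class CallError(Exception):
    """Local stand-in for the middleware's CallError; not the real class."""

    def __init__(self, errmsg, errno_=errno.EFAULT):
        super().__init__(errmsg)
        self.errmsg = errmsg
        self.errno = errno_


# Hypothetical registry; names and values are illustrative only.
GRAPHS = {"cpu": "cpu-graph", "memory": "memory-graph"}


def drop_dead_graphs(graphs):
    """Silently filter dead entries out of a legacy request."""
    return [g for g in graphs if g["name"] not in DEAD_GRAPHS]


def lookup_graph(name):
    """Turn an unknown name into a clean RPC error instead of a KeyError."""
    try:
        return GRAPHS[name]
    except KeyError:
        raise CallError(f"Graph {name!r} not found", errno.ENOENT) from None
```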
Process BACKUP in vrrp rapid-succession branch
When VrrpEventThread saw a second rapid event after waiting
rapid_event_settle_time, it dropped the latest queued event and
logged a warning. On boot-time keepalived flaps where the
MASTER->BACKUP gap floors below max_wait, that drop swallowed
the only BACKUP signal middleware was going to see, so
vrrp_backup never ran.
Fire the hook for BACKUP (skipping if vrrp_backup already ran
this process lifetime, tracked via a new LAST_EVENT_TYPE
attribute on FailoverEventsService); keep the drop+warn for
MASTER, since acting on an unsettled MASTER would kick off
fenced + zpool import.
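
The branch decision can be sketched as below. The function name and
the boolean argument are hypothetical; how LAST_EVENT_TYPE encodes
"vrrp_backup already ran" is not modelled here.

```python
def rapid_event_action(event_type, backup_already_ran):
    """Decide what to do with the latest queued event in the rapid branch.

    Returns "fire", "skip" or "drop" (drop means: discard and log a warning).
    """
    if event_type == "BACKUP":
        # BACKUP is safe to act on, but only once per process lifetime.
        return "skip" if backup_already_ran else "fire"
    # MASTER (and anything else) is still dropped: acting on an unsettled
    # MASTER would start fenced and a zpool import.
    return "drop"
```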
Wait for bdev to actually disappear after delete
spdk_bdev_unregister's cb_fn (which sends the RPC reply) fires before
spdk_bdev_close(desc) in spdk_bdev_unregister_by_name, so bdev_aio_delete
can return while the underlying close(fd) on the zvol is still pending.
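
A generic poll-until-gone loop of the kind this implies, as a sketch
only: `exists` is a hypothetical callable that reports whether the bdev
is still visible, and the timeout and interval defaults are arbitrary.

```python
import time


def wait_until_gone(exists, timeout=5.0, interval=0.1):
    """Poll exists() until it returns False or the timeout elapses.

    Returns True if the object disappeared, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while exists():
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```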
NAS-140333 / 26.0.0-RC.1 / Treat terminal TNC finalize errors as failure (by sonicaj) (#19041)
This commit wires the registration finalization loop through the shared
classifier from truenas_connect_utils, so a 400 with a non-retryable
error code now sets REGISTRATION_FINALIZATION_FAILED and exits instead
of polling fruitlessly until the 45-minute claim token expires. Network
errors, 5xx, 408, 429 and the pending-registration "not found" case
still retry with the existing backoff. Adds unit tests for the
classifier.
Original PR: https://github.com/truenas/middleware/pull/19036
Co-authored-by: Waqar Ahmed <waqarahmedjoyia at live.com>
NAS-140333 / 27.0.0-BETA.1 / Treat terminal TNC finalize errors as failure (#19036)
This commit wires the registration finalization loop through the shared
classifier from truenas_connect_utils, so a 400 with a non-retryable
error code now sets REGISTRATION_FINALIZATION_FAILED and exits instead
of polling fruitlessly until the 45-minute claim token expires. Network
errors, 5xx, 408, 429 and the pending-registration "not found" case
still retry with the existing backoff. Adds unit tests for the
classifier.
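
The retry-versus-terminal split described above could look roughly like
this. It is a sketch, not the classifier in truenas_connect_utils: the
function name, its parameters, and the retryable-code set are all
hypothetical.

```python
# Hypothetical set of error codes on a 400 that are still worth retrying.
RETRYABLE_400_CODES = frozenset({"example_retryable_code"})


def should_retry(status=None, error_code=None, *, network_error=False,
                 pending_not_found=False):
    """Return True to keep polling, False to fail finalization now."""
    if network_error or pending_not_found:
        return True
    if status is None:
        return True  # no response at all: treated like a network error
    if status in (408, 429) or 500 <= status <= 599:
        return True
    if status == 400 and error_code not in RETRYABLE_400_CODES:
        return False  # terminal: stop instead of waiting out the claim token
    return True  # assumption: everything else keeps the old retry behaviour
```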
NAS-140739 / 27.0.0-BETA.1 / Rewrite CPU temperature reading via sysfs hwmon (#18922)
## Problem
The legacy CPU temperature pipeline (`utils/cpu.py` +
`utils/sensors.py`) consumed `libsensors` via a 481-line ctypes wrapper
and attributed temperatures to logical CPUs using ordinal arithmetic and
string-sort heuristics rather than kernel topology. Several real defects
followed.
**1.5× over-reported aggregate on AMD APUs (Ryzen 7 5825U class).** The
orchestrator wrote each broadcast temperature to `cpuN` and then
mirrored it to its HT sibling via `ht_map`. On AMD APUs the kernel
enumerates SMT siblings *consecutively* (cpu0/cpu1, cpu2/cpu3, …), so
`ht_map = {cpu0: cpu1, cpu2: cpu3, …}`. The mirror loop then wrote each
even-indexed primary twice (once as primary, once as mirror) and each
odd-indexed primary once (mirror lookup misses), producing `total = 12T
/ len = 8 = 1.5T` for any uniform input T, plus only 8 of 16 dashboard
bars (cpu8..cpu15 were never populated).
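
The arithmetic can be reproduced with a small simulation. This is a
reconstruction from the description above, not the legacy code: it
assumes the running total was incremented on every write while the
divisor was the number of distinct `cpuN` keys.

```python
def legacy_average(core_temps, ht_map):
    """Mimic the old write-then-mirror loop for one batch of readings."""
    readings = {}
    total = 0.0
    for index, temp in enumerate(core_temps):
        primary = f"cpu{index}"
        readings[primary] = temp
        total += temp
        sibling = ht_map.get(primary)
        if sibling is not None:  # only even-numbered cpus have an entry
            readings[sibling] = temp
            total += temp
    return readings, total / len(readings)


# Consecutive SMT enumeration: cpu0/cpu1, cpu2/cpu3, ...
ht_map = {f"cpu{i}": f"cpu{i + 1}" for i in range(0, 16, 2)}
readings, average = legacy_average([50.0] * 8, ht_map)
```

With eight uniform readings of 50.0 this makes twelve writes into eight
keys (`cpu0` to `cpu7`), so the average comes out at 75.0.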
[129 lines not shown]
NAS-141113 / 26.0.0-RC.1 / Expose all_sed property on zpool.query (by sonicaj) (#19040)
This PR surfaces the all_sed flag on zpool.query entries
so callers can identify pools made up entirely of Self-Encrypting Drives
without going through pool.query.
Original PR: https://github.com/truenas/middleware/pull/19035
---------
Co-authored-by: Waqar Ahmed <waqarahmedjoyia at live.com>
Co-authored-by: caleb <yocalebo at gmail.com>
NAS-141113 / 27.0.0-BETA.1 / Expose all_sed property on zpool.query (#19035)
This PR surfaces the all_sed flag on zpool.query entries
so callers can identify pools made up entirely of Self-Encrypting Drives
without going through pool.query.
---------
Co-authored-by: caleb <yocalebo at gmail.com>
NAS-141114 / 27.0.0-BETA.1 / `zpool.query`: remove `stripe` topology key, fold single-disk vdevs into `data` (by creatorcary) (#19039)
## Summary
Removes the `stripe` key from the `zpool.query` topology output and
folds single-disk (childless) top-level vdevs into the existing `data`
key, so all storage vdevs are reported in one place.
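
A before-and-after sketch of the bucketing, for illustration only. The
function names and the vdev dict fields (`name`, `children`) are
hypothetical and do not mirror `query_impl._format_topology()`.

```python
def format_storage_old(vdevs):
    """Old behaviour: childless top-level vdevs went to a separate key."""
    out = {"data": [], "stripe": []}
    for vdev in vdevs:
        out["data" if vdev.get("children") else "stripe"].append(vdev)
    return out


def format_storage_new(vdevs):
    """New behaviour: every top-level storage vdev is reported under data."""
    return {"data": list(vdevs)}


vdevs = [
    {"name": "mirror-0", "children": [{"name": "sda"}, {"name": "sdb"}]},
    {"name": "sdc", "children": []},
]
```

Here `format_storage_new(vdevs)` returns both vdevs under `data`, in
their original order, and has no `stripe` key.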
## Background
`query_impl._format_topology()` was splitting top-level storage vdevs
into two buckets:
- `data` — vdevs that have children (mirror, raidz, etc.)
- `stripe` — single-disk vdevs with no children
This split was a middleware-only invention. ZFS itself has no separate
"stripe" vdev type — a lone disk is simply a top-level data vdev. The
distinction added nothing the caller couldn't derive from
[16 lines not shown]
NAS-141114 / 26.0.0-RC.1 / `zpool.query`: remove `stripe` topology key, fold single-disk vdevs into `data` (#18993)
## Summary
Removes the `stripe` key from the `zpool.query` topology output and
folds single-disk (childless) top-level vdevs into the existing `data`
key, so all storage vdevs are reported in one place.
## Background
`query_impl._format_topology()` was splitting top-level storage vdevs
into two buckets:
- `data` — vdevs that have children (mirror, raidz, etc.)
- `stripe` — single-disk vdevs with no children
This split was a middleware-only invention. ZFS itself has no separate
"stripe" vdev type — a lone disk is simply a top-level data vdev. The
distinction added nothing the caller couldn't derive from
[9 lines not shown]