Merge tag 'net-7.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter and batman-adv.
Current release - new code bugs:
- netfilter: cthelper: cap to maximum number of expectations per master
Previous releases - regressions:
- netpoll: fix a use-after-free on shutdown path
- tcp: restore RCU grace period in tcp_ao_destroy_sock
- ipv6: fix NULL deref in fib6_walk_continue() on multi-batch dump
- batman-adv: dat: ensure accessible eth_hdr proto field
[46 lines not shown]
Merge tag 'mfd-fixes-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull MFD fix from Lee Jones:
- Add MFD mailing list to MAINTAINERS
* tag 'mfd-fixes-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
MAINTAINERS: Add a mailing list entry to MFD
Merge tag 'batadv-net-pullrequest-20260630' of https://git.open-mesh.org/batadv
Simon Wunderlich says:
====================
Here are some batman-adv bugfixes, all by Sven Eckelmann:
- fix pointers after potential skb reallocs (5 patches)
- dat: ensure accessible eth_hdr proto field
* tag 'batadv-net-pullrequest-20260630' of https://git.open-mesh.org/batadv:
batman-adv: dat: ensure accessible eth_hdr proto field
batman-adv: bla: reacquire gw address after skb realloc
batman-adv: dat: acquire ARP hw source only after skb realloc
batman-adv: gw: acquire ethernet header only after skb realloc
batman-adv: access unicast_ttvn skb->data only after skb realloc
batman-adv: retrieve ethhdr after potential skb realloc on RX
====================
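The five "fix pointers after potential skb reallocs" patches apply the same rule: a pointer into skb data must be re-read after any call that may reallocate the buffer. A minimal userspace sketch of that rule follows; struct pkt, struct hdr, pkt_grow() and proto_after_grow() are hypothetical, and plain realloc() stands in for the kernel's skb reallocation helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an skb: a heap buffer that may move. */
struct pkt {
	uint8_t *data;
	size_t len;
};

/* Hypothetical header stored at the start of the buffer. */
struct hdr {
	uint16_t proto;
};

/* Grow the buffer; like an skb realloc, this may move p->data. */
static int pkt_grow(struct pkt *p, size_t extra)
{
	uint8_t *n = realloc(p->data, p->len + extra);

	if (!n)
		return -1;
	memset(n + p->len, 0, extra);
	p->data = n;
	p->len += extra;
	return 0;
}

/* Returns the header's proto field after growing, or 0 on failure. */
static uint16_t proto_after_grow(struct pkt *p)
{
	const struct hdr *h;

	/*
	 * Wrong: setting h = (const struct hdr *)p->data here and using h
	 * after pkt_grow(), which may have freed the old buffer.
	 */
	if (pkt_grow(p, 64) < 0)
		return 0;

	/* Right: re-derive the header pointer from the current data. */
	h = (const struct hdr *)p->data;
	return h->proto;
}
```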
[3 lines not shown]
MAINTAINERS: Add a mailing list entry to MFD
This is to be included by all contributors and will be leaned on for
Sashiko's "reply to author" support.
Signed-off-by: Lee Jones <lee at kernel.org>
net/mlx5: HWS, fix matcher leak on resize target setup failure
hws_bwc_matcher_move() allocates a replacement matcher before setting it
as the resize target. If mlx5hws_matcher_resize_set_target() fails, the
replacement matcher is not attached anywhere and is leaked.
Fix the leak by destroying the replacement matcher before returning from
the resize-target failure path.
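The fix follows the usual error-path rule: an object that was allocated but never handed to an owner has to be destroyed by the code that allocated it. A hedged sketch with hypothetical names (matcher_create(), set_resize_target(), matcher_move()); it illustrates the pattern and does not reproduce the mlx5 HWS API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical matcher object. */
struct matcher {
	struct matcher *resize_target;
};

/* Number of matchers currently allocated; a leak shows up here. */
static int live_matchers;

static struct matcher *matcher_create(void)
{
	struct matcher *m = calloc(1, sizeof(*m));

	if (m)
		live_matchers++;
	return m;
}

static void matcher_destroy(struct matcher *m)
{
	free(m);
	live_matchers--;
}

/* Hypothetical setter that can fail, like a device command. */
static int set_resize_target(struct matcher *old, struct matcher *new_m,
			     bool fail)
{
	if (fail)
		return -1;
	old->resize_target = new_m;
	return 0;
}

static int matcher_move(struct matcher *old, bool fail)
{
	struct matcher *new_m = matcher_create();
	int ret;

	if (!new_m)
		return -1;

	ret = set_resize_target(old, new_m, fail);
	if (ret) {
		/* new_m is not attached anywhere yet: free it here. */
		matcher_destroy(new_m);
		return ret;
	}
	return 0;
}
```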
The bug was first flagged by an experimental analysis tool we are
developing for kernel memory-management bugs while analyzing
v6.13-rc1. The tool is still under development and is not yet publicly
available. Manual inspection confirms that the bug is still
present in v7.1.1.
An x86_64 allyesconfig build showed no new warnings. As we do not have an
mlx5 HWS-capable device to test with, no runtime testing could be
performed.
[7 lines not shown]
Merge tag 'bootconfig-fixes-v7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull bootconfig fix from Masami Hiramatsu:
- bootconfig: Fix NULL-pointer arithmetic
Fix undefined pointer arithmetic in xbc_snprint_cmdline() when
probing the buffer length with NULL and size 0. Track the written
length as a size_t instead to prevent build-time UBSan/FORTIFY_SOURCE
failures.
* tag 'bootconfig-fixes-v7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
bootconfig: fix NULL-pointer arithmetic in xbc_snprint_cmdline()
Merge tag 'nf-26-06-30' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Florian Westphal says:
====================
netfilter: updates for net
The following patchset contains Netfilter fixes for *net*.
Due to bug volume the plan is to make a second *net* pull request
this Friday.
1) Zero nf_conntrack_expect at allocation to prevent uninitialized data
leaks to userspace. Add missing exp->dir initialization.
2) Prevent out-of-bounds writes in nft_set_pipapo caused by inconsistent
clones during allocation failures. Fail operations if the clone enters an
error state. This was a day-0 bug.
3) Fix use-after-free race between ipset dump and array resizing. Protect
[48 lines not shown]
net/sched: hhf: clear heavy-hitter state on reset
HHF reset does not clear the classifier state used to identify heavy
hitters. Packets after reset can therefore be scheduled using flow
history from before the reset.
The reset operation should return the qdisc to an empty state.
Clear the heavy-hitter classifier tables when HHF is reset.
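A minimal sketch of what returning the qdisc "to an empty state" means when classifier history sits next to the queue. The structure, field names and table size are hypothetical; HHF's real heavy-hitter state is more elaborate than a single per-bucket byte counter:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HH_BUCKETS 16 /* hypothetical table size */

/* Hypothetical qdisc state: a queue length plus classifier history. */
struct hh_sched {
	unsigned int qlen;
	uint32_t bucket_bytes[HH_BUCKETS]; /* bytes seen per hashed flow */
};

static void hh_account(struct hh_sched *q, uint32_t flow_hash, uint32_t len)
{
	q->qlen++;
	q->bucket_bytes[flow_hash % HH_BUCKETS] += len;
}

static int hh_is_heavy(const struct hh_sched *q, uint32_t flow_hash,
		       uint32_t threshold)
{
	return q->bucket_bytes[flow_hash % HH_BUCKETS] >= threshold;
}

/* Reset must clear the classifier history, not only the queue. */
static void hh_reset(struct hh_sched *q)
{
	q->qlen = 0;
	memset(q->bucket_bytes, 0, sizeof(q->bucket_bytes));
}
```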
Fixes: 10239edf86f1 ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc")
Assisted-by: Codex:gpt-5.5-cyber-preview
Signed-off-by: Samuel Moelius <sam.moelius at trailofbits.com>
Signed-off-by: David S. Miller <davem at davemloft.net>
net/sched: dualpi2: clear stale classification on filter miss
DualPI2 leaves previous classification state attached to an skb when
filter classification returns no match. The enqueue path can then act
on stale state from an earlier classification attempt.
A filter miss should fall back to the default class without reusing old
per-packet classification data.
Initialize the classification result to CLASSIC before running the
classifier. Explicit L4S, priority, and successful filter
classification can still override that default.
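The default-first pattern, as a hedged sketch. The enum values, struct pkt_cb and the filter callbacks are hypothetical, and the L4S and priority overrides are left out; only the idea of writing CLASSIC before running the classifier comes from the patch:

```c
#include <assert.h>
#include <stdbool.h>

enum pkt_class { CLASSIC, L4S, PRIO }; /* hypothetical class ids */

/* Hypothetical per-packet state that can outlive one enqueue attempt. */
struct pkt_cb {
	enum pkt_class cls;
};

/* Hypothetical filter: returns true and sets *out on a match. */
typedef bool (*filter_fn)(const struct pkt_cb *cb, enum pkt_class *out);

static bool filter_miss(const struct pkt_cb *cb, enum pkt_class *out)
{
	(void)cb;
	(void)out;
	return false;
}

static bool filter_prio(const struct pkt_cb *cb, enum pkt_class *out)
{
	(void)cb;
	*out = PRIO;
	return true;
}

static void classify(struct pkt_cb *cb, filter_fn filter)
{
	enum pkt_class res;

	/* Default first: a filter miss must not reuse stale state. */
	cb->cls = CLASSIC;

	if (filter(cb, &res))
		cb->cls = res; /* a successful match overrides the default */
}
```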
Fixes: 8f9516daedd6 ("sched: Add enqueue/dequeue of dualpi2 qdisc")
Assisted-by: Codex:gpt-5.5-cyber-preview
Signed-off-by: Samuel Moelius <sam.moelius at trailofbits.com>
Signed-off-by: David S. Miller <davem at davemloft.net>
Merge tag 'probes-fixes-v7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes fixes from Masami Hiramatsu:
"fprobe fixes and spelling typos:
- Fix NULL pointer dereference in fprobe_fgraph_entry(). Prevent
general protection faults by checking shadow-stack reservation
bounds. Skip fprobes registered mid-flight that were not counted
during sizing.
eprobe: fix string pointer extraction
- Correct the casting of string pointers read from the ringbuffer to
prevent truncation of base event pointer variables when
dereferencing FILTER_PTR_STRING fields.
tracing/probes: clean up argument parsing and BTF helper logic
- Make the $ prefix mandatory for comm access: Require the $ prefix
[28 lines not shown]
net/sched: act_bpf: use rcu_dereference_bh() to read the filter
tcf_bpf_act() can run from the tc egress path, which holds only
rcu_read_lock_bh(), but reads prog->filter with rcu_dereference() and
trips lockdep:
WARNING: suspicious RCU usage
net/sched/act_bpf.c:47 suspicious rcu_dereference_check() usage!
1 lock held by syz.2.1588/12756:
#0: (rcu_read_lock_bh){....}-{1:3}, at: __dev_queue_xmit net/core/dev.c:4792
tcf_bpf_act+0x6ae/0x940 net/sched/act_bpf.c:47
tcf_classify+0x6e4/0x1080 net/sched/cls_api.c:1860
sch_handle_egress net/core/dev.c:4545 [inline]
__dev_queue_xmit+0x2185/0x2c00 net/core/dev.c:4808
packet_sendmsg+0x3dfa/0x5120 net/packet/af_packet.c:3114
The other tc actions and cls_bpf already use rcu_dereference_bh() here.
Do the same.
[5 lines not shown]
selftests: drv-net: tso: don't touch dangerous feature bits
query_nic_features() detects which offloads depend on tx-gso-partial
by enabling everything, turning tx-gso-partial off, and seeing which
active features drop out. Enabling all hw features is dangerous:
we may end up enabling rx-fcs and loopback for example. For the
ice driver we end up getting into problems with feature dependencies
so the cleanup isn't successful either, and the test exits with
rx-fcs and loopback enabled.
Scope the feature probing just to segmentation bits.
Fixes: 266b835e5e84 ("selftests: drv-net: tso: enable test cases based on hw_features")
Reviewed-by: Pavan Chebbi <pavan.chebbi at broadcom.com>
Reviewed-by: Daniel Zahka <daniel.zahka at gmail.com>
Link: https://patch.msgid.link/20260629233923.2151144-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba at kernel.org>
cxgb4: Fix decode strings dump for T6 adapters
Depending on the value of chip_version, the correct decode set is selected.
However, the if-else block that follows matches on the T4 encoding type and
reassigns the selection. As a result, support for t6_decode is lost and
t4_decode and t5_decode are reinitialized.
The component history shows that this if-else block, previously used for
the same purpose, and the execution order were left untouched by the
change referenced in the Fixes: tag.
Given that execution order, the overwrite, and with it the loss of T6
support, always takes place.
Delete the if-else block.
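The bug class, reduced to a sketch: a selection made by chip version is clobbered by a later branch. The table names, contents and enum values are hypothetical placeholders, and select_decode_buggy() is one shape consistent with the description above rather than the actual cxgb4 code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-generation decode tables. */
static const char *const t4_decode[] = { "t4-state" };
static const char *const t5_decode[] = { "t5-state" };
static const char *const t6_decode[] = { "t6-state" };

enum chip { CHIP_T4 = 4, CHIP_T5 = 5, CHIP_T6 = 6 };

/* Fixed shape: one selection, nothing overwrites it afterwards. */
static const char *const *select_decode(enum chip chip_version)
{
	switch (chip_version) {
	case CHIP_T4:
		return t4_decode;
	case CHIP_T5:
		return t5_decode;
	default:
		return t6_decode;
	}
}

/* Buggy shape: a second if-else reassigns the table chosen above. */
static const char *const *select_decode_buggy(enum chip chip_version)
{
	const char *const *sge = select_decode(chip_version);

	if (chip_version == CHIP_T4)
		sge = t4_decode;
	else
		sge = t5_decode; /* T6 ends up here and loses t6_decode */
	return sge;
}
```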
Fixes: 6df397539cb0 ("cxgb4: Update correct encoding of SGE Ingress DMA States for T6 adapter")
Signed-off-by: Gleb Markov <markov.gi at npc-ksb.ru>
Reviewed-by: Potnuri Bharat Teja <bharat at chelsio.com>
Link: https://patch.msgid.link/20260629130856.1168-1-markov.gi@npc-ksb.ru
Signed-off-by: Jakub Kicinski <kuba at kernel.org>
virtio_net: disable cb when NAPI is busy-polled
When busy-poll is active, napi_schedule_prep() returns false in
virtqueue_napi_schedule(), so virtqueue_disable_cb() is skipped.
The device may keep firing irqs until virtqueue_napi_complete() is reached.
Under load (received == budget), it will lead to a large number
of spurious interrupts.
Fix it by disabling the callback at the virtnet_poll() entry.
This keeps the callback off while we poll and it is re-enabled by
virtqueue_napi_complete() when going idle.
Fixes: ceef438d613f ("virtio_net: remove custom busy_poll")
Acked-by: Michael S. Tsirkin <mst at redhat.com>
Signed-off-by: Longjun Tang <tanglongjun at kylinos.cn>
Link: https://patch.msgid.link/20260629024230.37325-1-lange_tang@163.com
Signed-off-by: Jakub Kicinski <kuba at kernel.org>
sctp: fix addr_wq_timer race in sctp_free_addr_wq()
sctp_free_addr_wq() previously removed addr_wq_timer using timer_delete()
while holding addr_wq_lock. However, timer_delete() does not guarantee that
a currently running timer handler has completed.
This allows a race with sctp_addr_wq_timeout_handler(), where the handler
may still run after addr_waitq has been freed, acquire addr_wq_lock, and
access freed memory, leading to a use-after-free.
Fix this by calling timer_shutdown_sync() before taking addr_wq_lock. This
guarantees that any in-flight timer handler has finished and prevents the
timer from being re-armed during teardown, making subsequent cleanup safe.
Fixes: 4db67e808640 ("sctp: Make the address lists per network namespace")
Reported-by: Sashiko <sashiko-bot at kernel.org>
Signed-off-by: Xin Long <lucien.xin at gmail.com>
Link: https://patch.msgid.link/5dc95f295bdb5c3f60e880dd9aa5112dc5c071cc.1782757874.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba at kernel.org>
selftests: net: bump default cmd() timeout to 20 seconds
We have always used 5 sec as the default command timeout. But soon after
it was introduced, David effectively made us ignore the timeout
(it was passed to process.communicate() as the wrong argument).
Gal recently fixed that, but it turns out that 5 sec is not enough
for a lot of tests and setups, so the fix caused regressions.
In particular running reconfig commands (e.g. XDP attach) on mlx5
with 32 rings and 9k MTU, on a heavily-debug-enabled kernel takes
more than 5 sec. The XDP installation command will time out after
5 sec, but since the sleeps in the kernel are non-interruptible
the command finishes anyway, leaving the XDP program attached,
but with a non-zero exit code. defer()ed cleanups are not installed,
breaking the environment for subsequent tests.
Since "install XDP" is a pretty normal command, a "point fix"
does not seem appropriate. 32 rings is a fairly reasonable
config, too, so we should just increase the timeout to 20 sec.
[10 lines not shown]
bootconfig: fix NULL-pointer arithmetic in xbc_snprint_cmdline()
xbc_snprint_cmdline() is meant to be called twice: first with
buf=NULL, size=0 to probe the rendered length, then with a real
buffer to fill it (the standard snprintf() two-pass pattern). The
probe call makes the function compute "buf + size" (NULL + 0) and,
on every iteration, advance "buf += ret" from that NULL base and
pass the result back into snprintf().
Pointer arithmetic on a NULL pointer is undefined behavior. It is
harmless in the in-kernel callers today, but the follow-up patches
run this same code in the userspace tools/bootconfig parser at kernel
build time, where host UBSan / FORTIFY_SOURCE abort the build.
Track a running written length (size_t) instead of mutating @buf, and
only form "buf + len" when @buf is non-NULL. snprintf(NULL, 0, ...)
is itself well defined and returns the would-be length, so the
two-pass "probe then fill" usage returns identical byte counts.
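The two-pass "probe then fill" usage with a running size_t length can be sketched as follows. struct kv and render_cmdline() are hypothetical stand-ins, not the xbc_snprint_cmdline() implementation; the point is that buf + len is only formed when buf is non-NULL:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical key/value pair to render as "key=val ". */
struct kv {
	const char *key;
	const char *val;
};

/*
 * Call with buf=NULL, size=0 to get the length, then with a real buffer.
 * @len is a running count; buf + len is formed only when buf is non-NULL
 * and len < size, so no arithmetic is ever done on a NULL pointer.
 */
static size_t render_cmdline(char *buf, size_t size,
			     const struct kv *kvs, size_t n)
{
	size_t len = 0;

	for (size_t i = 0; i < n; i++) {
		char *dst = NULL;
		size_t room = 0;
		int ret;

		if (buf && len < size) {
			dst = buf + len;
			room = size - len;
		}
		/* snprintf(NULL, 0, ...) is well defined: it only counts. */
		ret = snprintf(dst, room, "%s=%s ", kvs[i].key, kvs[i].val);
		if (ret < 0)
			return 0;
		len += (size_t)ret;
	}
	return len;
}
```

A caller would first call render_cmdline(NULL, 0, ...) to learn the length, allocate that many bytes plus one for the terminating NUL, and then call it again with the real buffer.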
[6 lines not shown]
tracing/probes: Make the $ prefix mandatory for comm access
Since $comm and $COMM are not event fields but special fetcharg
variables for accessing current->comm, they should not be accessed
without the '$' prefix, even with a typecast.
Link: https://lore.kernel.org/all/178231209724.732967.12049805699091810641.stgit@devnote2/
Fixes: 69efd863a785 ("tracing/eprobes: Allow use of BTF names to dereference pointers")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat at kernel.org>
tracing/probes: Fix double addition of offset for @+FOFFSET
Commit 533059281ee5 ("tracing: probeevent: Introduce new argument
fetching code") wrongly uses the @offset local variable during parsing,
so the offset value is added twice when dereferencing.
Reset @offset after setting it in FETCH_OP_FOFFS.
Link: https://lore.kernel.org/all/178217905962.643090.1978577464942171332.stgit@devnote2/
Fixes: 533059281ee5 ("tracing: probeevent: Introduce new argument fetching code")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat at kernel.org>
Cc: stable at vger.kernel.org
tracing/fprobe: Fix NULL pointer dereference in fprobe_fgraph_entry()
fprobe_fgraph_entry() sizes a shadow-stack reservation in one walk of
the per-ip fprobe list and fills it in a second walk, both under
rcu_read_lock() only. An fprobe registered on an already-live ip can
become visible between the two walks, so the fill walk processes an
exit_handler that the sizing walk did not count, and 'used' runs past
reserved_words. If the sizing walk counted nothing, fgraph_data is NULL
and the first write_fprobe_header() faults:
Oops: general protection fault, probably for non-canonical address ...
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
RIP: 0010:fprobe_fgraph_entry+0xa38/0xf10 kernel/trace/fprobe.c:167
Call Trace:
<TASK>
function_graph_enter_regs+0x44c/0xa10 kernel/trace/fgraph.c:677
ftrace_graph_func+0xc5/0x140 arch/x86/kernel/ftrace.c:671
__kernel_text_address+0x9/0x40 kernel/extable.c:78
arch_stack_walk+0x117/0x170 arch/x86/kernel/stacktrace.c:26
[13 lines not shown]
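The sizing/fill race in fprobe_fgraph_entry() can be modelled in userspace with a list that grows between the two passes. In this hedged sketch the static probe array, WORDS_PER_ENTRY and the per-entry counted flag are hypothetical (the kernel walks an RCU-protected list and reserves shadow-stack words); it shows the two checks named in the fix: skip entries the sizing pass did not count, and never write past the reservation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_PROBES 8
#define WORDS_PER_ENTRY 2 /* hypothetical header + data words per probe */

struct probe {
	bool has_exit_handler;
	bool counted; /* set during the sizing pass */
};

static struct probe probes[MAX_PROBES];
static size_t nprobes;

/* Returns the number of entries written; never writes past reserved. */
static size_t size_and_fill(void (*between_passes)(void))
{
	size_t reserved = 0, used = 0, written = 0;
	unsigned long *data = NULL;

	/* Pass 1: size the reservation and remember who was counted. */
	for (size_t i = 0; i < nprobes; i++) {
		probes[i].counted = probes[i].has_exit_handler;
		if (probes[i].counted)
			reserved += WORDS_PER_ENTRY;
	}
	if (reserved)
		data = calloc(reserved, sizeof(*data));

	if (between_passes)
		between_passes(); /* e.g. a concurrent registration */

	/* Pass 2: fill, skipping uncounted probes and checking bounds. */
	for (size_t i = 0; i < nprobes; i++) {
		if (!probes[i].has_exit_handler || !probes[i].counted)
			continue;
		if (!data || used + WORDS_PER_ENTRY > reserved)
			break;
		data[used] = i;     /* hypothetical "header" word */
		data[used + 1] = 0; /* hypothetical "data" word */
		used += WORDS_PER_ENTRY;
		written++;
	}
	free(data);
	return written;
}

/* Simulates an fprobe registered between the two passes. */
static void register_late_probe(void)
{
	if (nprobes < MAX_PROBES) {
		probes[nprobes].has_exit_handler = true;
		probes[nprobes].counted = false;
		nprobes++;
	}
}
```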
tracing: eprobe: read the complete FILTER_PTR_STRING pointer
For a char * element in an event, the FILTER_PTR_STRING filter type is
used. When the event occurs, a pointer is stored in the ringbuffer.
If an eprobe references such a char * element of a "base event", the
stored pointer is truncated when it's read from the ringbuffer.
$ cd /sys/kernel/tracing
$ echo 'e rcu.rcu_utilization $s:x64 $s:string' > dynamic_events
$ echo 1 > tracing_on
$ echo 1 > events/eprobes/enable
$ sleep 1
$ echo 0 > events/eprobes/enable
$ cat trace
<idle>-0 ...: (rcu.rcu_utilization) arg1=0x4f arg2=(fault)
<idle>-0 ...: (rcu.rcu_utilization) arg1=0x2 arg2=(fault)
The problem is in get_event_field
[17 lines not shown]
bridge: stp: Fix a potential use-after-free when deleting a bridge
The three STP timers are not supposed to be armed while the bridge is
administratively down. They are synchronously deactivated when the
bridge is put administratively down and the various call sites check for
'IFF_UP' before arming them.
This check is missing from br_topology_change_detection() and it is
possible to engineer a situation in which the topology change timer is
armed while the bridge is administratively down, resulting in a
use-after-free [1] when the bridge is deleted.
Fix by adding the missing check and, for good measure, synchronously
shut down the three timers when the bridge is deleted.
[1]
ODEBUG: free active (active state 0) object: ffff88811662b9b0 object type: timer_list hint: br_topology_change_timer_expired (net/bridge/br_stp_timer.c:120)
WARNING: lib/debugobjects.c:629 at debug_print_object+0x1bc/0x450, CPU#9: ip/359
[8 lines not shown]
net/sched: sch_teql: Introduce slaves_lock to avoid race condition and UAF
The teql master->slaves singly linked list is not protected against
multiple writes. It can be modified concurrently from teql_master_xmit(),
teql_dequeue(), teql_init() and teql_destroy() without holding any list
lock or RCU protection.
zdi-disclosures at trendmicro.com has demonstrated that the qdisc is freed
after an RCU grace period, but teql_master_xmit() running on another
CPU can still hold a stale pointer into the list, resulting in a
slab-use-after-free:
BUG: KASAN: slab-use-after-free in teql_master_xmit+0xf0f/0x16b0
Read of size 8 at addr ffff888013fb0440 by task poc/332
Freed 512-byte region [ffff888013fb0400, ffff888013fb0600) (kmalloc-512)
The fix?
Add a per-master slaves_lock spinlock that serializes all mutations of
master->slaves and the NEXT_SLAVE() links in teql_destroy() and
[22 lines not shown]
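A hedged sketch of the locking idea: every mutation of the master's slave ring goes through one per-master lock. The ring is modelled as a circular singly linked list, a C11 atomic_flag spinlock stands in for the kernel's spinlock_t, and all names are hypothetical. Protecting readers on the transmit path against a concurrent free is not modelled; only the writers are shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct slave {
	struct slave *next; /* circular: the last slave points to the first */
	int id;
};

struct master {
	struct slave *slaves;    /* any slave in the ring, or NULL */
	atomic_flag slaves_lock; /* hypothetical stand-in for spinlock_t */
};

static void master_init(struct master *m)
{
	m->slaves = NULL;
	atomic_flag_clear(&m->slaves_lock);
}

static void slaves_lock_acquire(struct master *m)
{
	while (atomic_flag_test_and_set_explicit(&m->slaves_lock,
						 memory_order_acquire))
		; /* spin */
}

static void slaves_lock_release(struct master *m)
{
	atomic_flag_clear_explicit(&m->slaves_lock, memory_order_release);
}

static void slave_add(struct master *m, struct slave *s)
{
	slaves_lock_acquire(m);
	if (m->slaves) {
		s->next = m->slaves->next;
		m->slaves->next = s;
	} else {
		s->next = s;
		m->slaves = s;
	}
	slaves_lock_release(m);
}

static void slave_del(struct master *m, struct slave *s)
{
	struct slave *prev;

	slaves_lock_acquire(m);
	prev = m->slaves;
	if (prev) {
		/* Find the predecessor of s in the ring. */
		do {
			if (prev->next == s)
				break;
			prev = prev->next;
		} while (prev != m->slaves);

		if (prev->next == s) {
			if (s->next == s) {
				m->slaves = NULL; /* s was the only slave */
			} else {
				prev->next = s->next;
				if (m->slaves == s)
					m->slaves = s->next;
			}
			s->next = NULL;
		}
	}
	slaves_lock_release(m);
}
```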
net: gianfar: dispose irq mappings on probe failure and device removal
irq_of_parse_and_map() creates irqdomain mappings that should be
balanced with irq_dispose_mapping(). The driver never called
irq_dispose_mapping(), leaking mappings on probe failure and
device removal.
Fix by adding irq_dispose_mapping() in free_gfar_dev() and
expanding its loop from priv->num_grps to MAXGROUPS so the
error path also catches partially-initialized groups. All
irqinfo pointers are pre-initialized to NULL in gfar_of_init(),
making the NULL-guarded walk in free_gfar_dev() safe for every
scenario.
gfar_parse_group() itself is left as a simple parse function
with no resource management; cleanup is centralized in the
caller's error path.
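The cleanup strategy above (pre-initialize every slot to NULL, then walk all MAXGROUPS slots with a NULL guard) as a hedged userspace sketch. struct priv, map_irq(), dispose_irq() and the MAXGROUPS value are hypothetical stand-ins for the driver's irqinfo handling and irq_of_parse_and_map()/irq_dispose_mapping():

```c
#include <assert.h>
#include <stdlib.h>

#define MAXGROUPS 2 /* hypothetical; the driver has its own value */

struct irqinfo {
	int irq;
};

struct priv {
	struct irqinfo *irqinfo[MAXGROUPS];
	int num_grps; /* groups fully initialized so far */
};

static int live_mappings;

/* Hypothetical mapping helpers; 42 is an arbitrary irq number. */
static int map_irq(void) { live_mappings++; return 42; }
static void dispose_irq(int irq) { (void)irq; live_mappings--; }

/* Walk every slot, not just num_grps, so partial groups are freed too. */
static void free_priv(struct priv *p)
{
	for (int i = 0; i < MAXGROUPS; i++) {
		if (!p->irqinfo[i])
			continue;
		dispose_irq(p->irqinfo[i]->irq);
		free(p->irqinfo[i]);
		p->irqinfo[i] = NULL;
	}
}

/* fail_at simulates an error while setting up group fail_at (-1: none). */
static int init_priv(struct priv *p, int fail_at)
{
	for (int i = 0; i < MAXGROUPS; i++)
		p->irqinfo[i] = NULL; /* pre-initialize before any parsing */
	p->num_grps = 0;

	for (int i = 0; i < MAXGROUPS; i++) {
		p->irqinfo[i] = calloc(1, sizeof(*p->irqinfo[i]));
		if (!p->irqinfo[i])
			goto err;
		p->irqinfo[i]->irq = map_irq();
		if (i == fail_at)
			goto err; /* mapped, but the group is not counted yet */
		p->num_grps++;
	}
	return 0;
err:
	free_priv(p);
	return -1;
}
```

A walk bounded by num_grps would miss group 1 in the failing case, because that group is mapped but not yet counted.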
Assisted-by: opencode:big-pickle
[4 lines not shown]
net: lan743x: Initialize eth_syslock spinlock before use
lan743x_hardware_init() calls pci11x1x_strap_get_status() during the
PCI11x1x probe sequence. That helper acquires the Ethernet subsystem
hardware lock via lan743x_hs_syslock_acquire(), which relies on
adapter->eth_syslock_spinlock to serialize access.
The spinlock is currently initialized only after the strap status is
read. With CONFIG_DEBUG_SPINLOCK enabled, taking the zero-filled but
uninitialized spinlock can trip the spinlock debug check.
Fix by initializing adapter->eth_syslock_spinlock before reading the
strap status so the probe path never attempts to lock an uninitialized
spinlock.
Fixes: 46b777ad9a8c ("net: lan743x: Add support to SGMII 1G and 2.5G")
Cc: stable at vger.kernel.org # v6.0+
Signed-off-by: Andrea Righi <arighi at nvidia.com>
Reviewed-by: David Thompson <davthompson at nvidia.com>
[3 lines not shown]
net: libwx: fix VMDQ mask for 1-queue mode
In wx_set_vmdq_queues(), the VMDQ mask was not set for the devices not
supporting WX_FLAG_MULTI_64_FUNC, i.e., NGBE devices. A mask of 0 causes
__ALIGN_MASK(1, ~vmdq->mask) to return 0, which incorrectly sets
q_per_pool to 0 in wx_write_qde().
Set the VMDQ 1-queue mask to 0x7F so that __ALIGN_MASK(1, ~0x7F)
correctly evaluates to 1.
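The arithmetic can be checked directly. ALIGN_MASK below uses the same expansion as the kernel's __ALIGN_MASK(x, mask), ((x) + (mask)) & ~(mask), under a non-reserved name; q_per_pool() is a hypothetical helper, and the 32-bit unsigned mask type is an assumption that keeps integer promotion out of the picture:

```c
#include <assert.h>
#include <stdint.h>

/* Same expansion as the kernel's __ALIGN_MASK(x, mask). */
#define ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask))

/* Hypothetical helper: queues per pool derived from a VMDQ mask. */
static uint32_t q_per_pool(uint32_t vmdq_mask)
{
	return ALIGN_MASK(1u, ~vmdq_mask);
}
```

With a zero mask, 1 + 0xFFFFFFFF wraps to 0 and the final "& 0" clears everything anyway; with 0x7F, (1 + 0xFFFFFF80) & 0x7F keeps only the low bit, giving 1.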
Fixes: c52d4b898901 ("net: libwx: Redesign flow when sriov is enabled")
Signed-off-by: Jiawen Wu <jiawenwu at trustnetic.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba at intel.com>
Link: https://patch.msgid.link/161F704D2C983E2C+20260626092530.551028-1-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni at redhat.com>
net: airoha: fix max receive size configuration
Set the GDM maximum receive size to AIROHA_MAX_RX_SIZE unconditionally
during hardware initialization instead of updating it according to the
configured MTU. This avoids dropping incoming frames that exceed the
current MTU but could still be processed by the networking stack, which
is able to fragment the reply on the TX side (e.g. ICMP echo requests).
Move the per-port MTU configuration to the PPE egress path where it
belongs, and set the TX frame size by running airoha_ppe_set_xmit_frame_size()
to dynamically track the maximum MTU across running interfaces sharing
the same PPE instance.
Fix the PPE MTU register addressing to pack two port entries per
register word and add WAN_MTU0 configuration for non-LAN GDM devices.
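Tracking "the maximum MTU across running interfaces sharing the same PPE instance" reduces to a max over the ports that are up. A hedged sketch with a hypothetical struct port; it is not airoha_ppe_set_xmit_frame_size() and leaves out register programming and the two-entries-per-word packing:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of one interface attached to a PPE instance. */
struct port {
	bool running;
	unsigned int mtu;
};

/* Largest MTU among running ports; 0 if none are running. */
static unsigned int ppe_max_mtu(const struct port *ports, size_t n)
{
	unsigned int max = 0;

	for (size_t i = 0; i < n; i++)
		if (ports[i].running && ports[i].mtu > max)
			max = ports[i].mtu;
	return max;
}
```

Recomputing the maximum whenever a port goes up, goes down or changes MTU keeps the shared frame size large enough for every running interface.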
Fixes: 54d989d58d2a ("net: airoha: Move min/max packet len configuration in airoha_dev_open()")
Tested-by: Madhur Agrawal <madhur.agrawal at airoha.com>
Signed-off-by: Lorenzo Bianconi <lorenzo at kernel.org>
Link: https://patch.msgid.link/20260625-airoha-fix-rx-max-len-v1-1-45b9b827358d@kernel.org
Signed-off-by: Paolo Abeni <pabeni at redhat.com>