Merge tag 's390-7.1-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fix from Alexander Gordeev:
- s390 selects GENERIC_LOCKBREAK when PREEMPT is enabled to tackle an
old compile error that no longer exists. Since recently PREEMPT is
always enabled, this LOCKBREAK config causes massive performance
regressions.
Remove GENERIC_LOCKBREAK from s390 Kconfig to fix the degradation.
* tag 's390-7.1-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: Remove GENERIC_LOCKBREAK Kconfig option
Merge tag 'net-7.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from IPsec and netfilter.
This is relatively small, mostly because we are a bit behind our PW
queue. I'm not aware of any pending regression.
Current release - regressions:
- netfilter: nf_tables_offload: drop device refcount on error
Previous releases - regressions:
- core: add pskb_may_pull() to skb_gro_receive_list()
- xfrm: iptfs: preserve shared-frag marker in iptfs_consume_frags()
- ipv6: fix a potential NPD in cleanup_prefix_route()
[49 lines not shown]
Merge tag 'pmdomain-v7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm
Pull pmdomain fixes from Ulf Hansson:
- imx: Fix OF node refcount
- ti: Fix wakeup configuration for parent devices of wakeup sources
* tag 'pmdomain-v7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
pmdomain: imx: fix OF node refcount
pmdomain: ti_sci: add wakeup constraint to parent devices of wakeup source
Merge tag 'gpio-fixes-for-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix NULL pointer dereference in gpio-mvebu
- fix runtime PM leak in remove path in gpio-zynq
- reject invalid module params in gpio-mockup
- fix generic IRQ chip leak in remove parh in gpio-rockchip
- fix resource leaks in GPIO chip cleanup path on hog failure
- fix a regression in how GPIO hogging code handles multiple GPIO chips
reusing the same OF node
* tag 'gpio-fixes-for-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpiolib: handle gpio-hogs only once
[5 lines not shown]
octeontx2-af: fix IP fragment flag corruption on custom KPU profile load
npc_cn20k_apply_custom_kpu() overwrites KPU profile entries with custom
firmware values and then calls npc_cn20k_update_action_entries_n_flags()
over all entries. Since the same function already ran during default
profile initialisation, entries not overridden by the custom firmware
get their flags translated twice, corrupting the CN20K-specific values.
Fix this by extracting the per-entry translation into a helper
npc_cn20k_translate_action_flags() and calling it as each custom entry
is loaded, removing the redundant batch call at the end.
Fixes: ef992a0f12e8 ("octeontx2-af: npc: cn20k: MKEX profile support")
Cc: Suman Ghosh <sumang at marvell.com>
Signed-off-by: Kiran Kumar K <kirankumark at marvell.com>
Signed-off-by: Nitin Shetty J <nshettyj at marvell.com>
Reviewed-by: Simon Horman <horms at kernel.org>
Link: https://patch.msgid.link/20260608095455.1499203-1-nshettyj@marvell.com
Signed-off-by: Paolo Abeni <pabeni at redhat.com>
Merge tag 'nf-26-06-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Revalidate bridge ports, add missing NULL checks to fetch the bridge
device by the port. From Florian Westphal.
2) Fix netdevice refcount leak in the error path of nft_fwd hardware
offload function, also from Florian.
3) Unregister helper expectfn callback on conntrack helper module
removal, otherwise dangling pointer remains in place,
from Weiming Shi.
[26 lines not shown]
Merge tag 'ipsec-2026-06-10' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2026-06-10
1) xfrm: iptfs: preserve shared-frag marker in iptfs_consume_frags()
Propagate SKBFL_SHARED_FRAG when paged fragments are moved between
skbs so ESP can decide whether in-place crypto is safe.
2) xfrm: iptfs: fix use-after-free on first_skb in __input_process_payload
Replace the unlocked read of xtfs->ra_newskb with a local flag so a
concurrent reassembly can no longer free first_skb between
spin_unlock and the post-loop check.
3) xfrm: policy: fix use-after-free on inexact bin in xfrm_policy_bysel_ctx()
Prune the inexact bin under xfrm_policy_lock so a concurrent
xfrm_hash_rebuild() can no longer free it before xfrm_policy_kill()
[32 lines not shown]
ipv6: Fix a potential NPD in cleanup_prefix_route()
addrconf_get_prefix_route() can return the fib6_null_entry sentinel
entry which has a NULL fib6_table pointer. Therefore, before setting the
route's expiration time, check that we are not working with this entry,
as otherwise a NPD will be triggered [1].
Note that the other callers of addrconf_get_prefix_route() are not
susceptible to this bug:
1. addrconf_prefix_rcv(): Requests a route with the 'RTF_ADDRCONF |
RTF_PREFIX_RT' flags which are not set on fib6_null_entry.
2. modify_prefix_route(): Fixed by commit a747e02430df ("ipv6: avoid
possible NULL deref in modify_prefix_route()").
3. __ipv6_ifa_notify(): Calls ip6_del_rt() which specifically checks for
fib6_null_entry and returns an error.
[30 lines not shown]
Merge branch 'net-txgbe-fix-module-identification'
Jiawen Wu says:
====================
net: txgbe: fix module identification
For AML devices, there are some issues where the wrong module
indentified then configure PHY failed.
The module info buffers should be initialized to 0 before the firmware
returns information. And DECLARE_PHY_INTERFACE_MASK() does not guarantee
zeroed contents, so explicitly clear the temporary interface masks before
setting supported interfaces.
Rework txgbe_identify_module() to validate module identifiers through
explicit type checks instead of relying on transceiver_type heuristics.
When using the SFP module, transceiver_type could be a random value,
because it was read from an invalid register.
[4 lines not shown]
net: txgbe: initialize PHY interface to 0
DECLARE_PHY_INTERFACE_MASK() does not guarantee zeroed contents. Add a
new macro DECLARE_PHY_INTERFACE_MASK_ZERO(), make the stack variable to
be zeroed before setting supported interfaces.
Fixes: 57d39faed4c9 ("net: txgbe: improve functions of AML 40G devices")
Signed-off-by: Jiawen Wu <jiawenwu at trustnetic.com>
Link: https://patch.msgid.link/20260608070842.36504-4-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni at redhat.com>
net: txgbe: initialize module info buffer
The module info buffer should be initialized to 0 before the firmware
returns information. Otherwise, there is a risk that the buffer field
not filled by the firmware is random value.
Fixes: 343929799ace ("net: txgbe: Support to handle GPIO IRQs for AML devices")
Signed-off-by: Jiawen Wu <jiawenwu at trustnetic.com>
Link: https://patch.msgid.link/20260608070842.36504-2-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni at redhat.com>
net: txgbe: distinguish module types by checking identifier
Rework txgbe_identify_module() to validate module identifiers through
explicit type checks instead of relying on transceiver_type heuristics.
When using the SFP module, transceiver_type could be a random value,
because it was read from an invalid register.
Fixes: 57d39faed4c9 ("net: txgbe: improve functions of AML 40G devices")
Signed-off-by: Jiawen Wu <jiawenwu at trustnetic.com>
Link: https://patch.msgid.link/20260608070842.36504-3-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni at redhat.com>
Merge branch 'net-mvpp2-fix-xdp-rx-buffer-handling'
Til Kaiser says:
====================
net: mvpp2: fix XDP RX buffer handling
This is v5 of the earlier XDP_PASS fix. The XDP_PASS change is
retained, and the series also fixes related RX/XDP buffer handling
issues found during review.
Tested with tools/testing/selftests/drivers/net/xdp.py on mvpp2
hardware.
====================
Link: https://patch.msgid.link/20260607134943.21996-1-mail@tk154.de
Signed-off-by: Paolo Abeni <pabeni at redhat.com>
net: mvpp2: sync RX data at the hardware packet offset
mvpp2 programs the RX queue packet offset, so hardware writes received
data at dma_addr + MVPP2_SKB_HEADROOM. The current CPU sync starts at
dma_addr and only covers rx_bytes + MVPP2_MH_SIZE bytes, which syncs the
unused headroom and misses the same number of bytes at the packet tail.
On non-coherent DMA systems this can leave the CPU reading stale cache
contents for the end of the received frame.
Use dma_sync_single_range_for_cpu() with MVPP2_SKB_HEADROOM as the range
offset so the sync covers the Marvell header and packet data actually
written by hardware.
Fixes: e1921168bbd4 ("mvpp2: sync only the received frame")
Signed-off-by: Til Kaiser <mail at tk154.de>
Link: https://patch.msgid.link/20260607134943.21996-2-mail@tk154.de
Signed-off-by: Paolo Abeni <pabeni at redhat.com>
net: mvpp2: limit XDP frame size to the RX buffer
mvpp2 has short and long BM pools, and short pool buffers can be smaller
than PAGE_SIZE. The XDP path nevertheless initializes every xdp_buff with
PAGE_SIZE as frame size.
XDP helpers use frame_sz to validate tail growth and to derive the hard
end of the data area. Advertising PAGE_SIZE for short buffers can let
bpf_xdp_adjust_tail() grow a packet past the real allocation, corrupting
memory or later tripping skb tailroom checks.
Initialize the XDP buffer with bm_pool->frag_size so XDP tailroom matches
the actual buffer backing the packet.
Fixes: 07dd0a7aae7f ("mvpp2: add basic XDP support")
Signed-off-by: Til Kaiser <mail at tk154.de>
Link: https://patch.msgid.link/20260607134943.21996-3-mail@tk154.de
Signed-off-by: Paolo Abeni <pabeni at redhat.com>
net: mvpp2: refill RX buffers before XDP or skb use
The RX error path returns the current descriptor buffer to the hardware
BM pool. That is only valid while the driver still owns the buffer.
mvpp2_rx_refill() can fail after the current buffer has been handed to
XDP or attached to an skb. In those cases mvpp2_run_xdp() may have
recycled, redirected, or queued the page for XDP_TX, and an skb free also
retires the data buffer. Returning such a buffer to BM lets hardware DMA
into memory that is no longer owned by the RX ring.
Refill the BM pool before handing the current buffer to XDP or to the
skb. If the allocation fails there, drop the packet and return the
still-owned current buffer to BM, preserving the pool depth. Once the
refill succeeds, later local drops retire/free the current buffer instead
of returning it to BM.
Fixes: 07dd0a7aae7f ("mvpp2: add basic XDP support")
Fixes: d6526926de73 ("net: mvpp2: fix memory leak in mvpp2_rx")
[3 lines not shown]
net: mvpp2: build skb from XDP-adjusted data on XDP_PASS
When an XDP program uses bpf_xdp_adjust_head() or bpf_xdp_adjust_tail()
and then returns XDP_PASS, mvpp2 still builds the skb from fixed offsets
derived from the original RX descriptor. Packet geometry changes made by
the XDP program are therefore discarded before the skb reaches the stack.
Update rx_offset and rx_bytes from xdp.data and xdp.data_end for
XDP_PASS. This makes skb_reserve() and skb_put() reflect the packet seen
by XDP, and makes RX byte accounting for XDP_PASS follow the length of the
skb passed to the network stack.
Keep a separate rx_sync_size for page-pool recycling on skb allocation
failure, which must stay tied to the received buffer range.
Non-PASS verdicts continue to account the descriptor length because no skb
is passed up in those cases.
Fixes: 07dd0a7aae7f ("mvpp2: add basic XDP support")
[3 lines not shown]
Merge tag 'pm-7.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These address some remaining fallout after introducing dynamic EPP
support in the amd-pstate driver during the current development cycle:
- Restore allowing writing EPP of 0 when in performance mode in the
amd-pstate driver which was unnecessarily disallowed by one of the
recent updates (Mario Limonciello)
- Remove stale documentation of the epp_cached field in struct
amd_cpudata that has been dropped recently (Zhan Xusheng)"
* tag 'pm-7.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq/amd-pstate: Fix setting EPP in performance mode
cpufreq/amd-pstate: drop stale @epp_cached kdoc
netfilter: nft_meta_bridge: fix stale stack leak via IIFHWADDR register
NFT_META_BRI_IIFHWADDR declares its destination register with
len = ETH_ALEN (6 bytes), which the register-init tracking rounds up to
two 32-bit registers (8 bytes). nft_meta_bridge_get_eval() then does
memcpy(dest, br_dev->dev_addr, ETH_ALEN), writing only 6 bytes and
leaving the upper 2 bytes of the second register as uninitialised
nft_do_chain() stack. A downstream load of that register span leaks
those stale bytes to userspace.
Zero the second register before the memcpy so the full declared span is
written.
Fixes: cbd2257dc96e ("netfilter: nft_meta_bridge: introduce NFT_META_BRI_IIFHWADDR support")
Cc: stable at vger.kernel.org
Signed-off-by: Davide Ornaghi <d.ornaghi97 at gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo at netfilter.org>
netfilter: nft_fib: fix stale stack leak via the OIFNAME register
For NFT_FIB_RESULT_OIFNAME the destination register is declared with
len = IFNAMSIZ (four 32-bit registers), but on the lookup-fail,
RTN_LOCAL and oif-mismatch paths nft_fib{4,6}_eval() only writes one
register via "*dest = 0". The remaining three registers are left as
whatever was on the stack in nft_do_chain()'s struct nft_regs, and a
downstream expression that loads the register span can leak that
uninitialised kernel stack to userspace.
The NFTA_FIB_F_PRESENT existence check has the same shape: it is only
meaningful for NFT_FIB_RESULT_OIF, yet it was accepted for any result type
while the eval stores a single byte via nft_reg_store8(), leaving the rest
of the declared span stale.
Fix both:
- replace the bare "*dest = 0" in the eval with nft_fib_store_result(),
which strscpy_pad()s the whole IFNAMSIZ for OIFNAME (and is already
[11 lines not shown]
netfilter: nft_exthdr: fix register tracking for F_PRESENT flag
nft_exthdr_init() passes user-controlled priv->len to
nft_parse_register_store(), which marks that many bytes in the
register bitmap as initialized. However, when NFT_EXTHDR_F_PRESENT
is set, the eval paths write only 1 byte (nft_reg_store8) or
4 bytes (*dest = 0 on TCP/DCCP error path). When len > 4,
registers beyond the first are never written, retaining
uninitialized stack data from nft_regs.
Bail out if userspace requests too much data when F_PRESENT is set.
Reported-by: Ji'an Zhou <eilaimemedsnaimel at gmail.com>
Fixes: c078ca3b0c5b ("netfilter: nft_exthdr: Add support for existence check")
Signed-off-by: Florian Westphal <fw at strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo at netfilter.org>
netfilter: nf_log: validate MAC header was set before dumping it
The fallback path of dump_mac_header() guards the MAC header access
only with "skb->mac_header != skb->network_header", without checking
skb_mac_header_was_set(). When the MAC header is unset, mac_header is
0xffff, so the test passes and skb_mac_header(skb) returns
skb->head + 0xffff, ~64 KiB past the buffer; the loop then reads
dev->hard_header_len bytes out of bounds into the kernel log.
This is reachable via the netdev logger: nf_log_unknown_packet() calls
dump_mac_header() unconditionally, and an skb sent through AF_PACKET
with PACKET_QDISC_BYPASS reaches the egress hook with mac_header still
unset (__dev_queue_xmit(), which would reset it, is bypassed).
Add the skb_mac_header_was_set() check the ARPHRD_ETHER path already
uses, and replace the open-coded MAC header length test with
skb_mac_header_len(). Only skbs with an unset MAC header are affected;
valid ones are dumped as before.
[21 lines not shown]
netfilter: x_tables: avoid leaking percpu counter pointers
The native and compat get-entries paths copy the fixed rule entry header
from the kernelized rule blob to userspace before overwriting the entry's
counter fields with a sanitized counter snapshot.
On SMP kernels, entry->counters.pcnt contains the percpu allocation
address used by x_tables rule counters. A caller can provide a userspace
buffer that faults during the initial fixed-header copy after pcnt has
been copied but before the later sanitized counter copy runs. The syscall
then returns -EFAULT while leaving the raw percpu pointer in userspace.
Copy only the fixed entry prefix before counters from the kernelized rule
blob, then copy the sanitized counter snapshot into the counter field.
Apply this ordering to the IPv4, IPv6, and ARP native and compat
get-entries implementations so a fault cannot expose the internal percpu
counter pointer.
Fixes: 71ae0dff02d7 ("netfilter: xtables: use percpu rule counters")
[2 lines not shown]
netfilter: nf_conntrack: destroy stale expectfn expectations on unregister
NAT helpers such as nf_nat_h323 store a raw pointer to module text in
exp->expectfn (e.g. ip_nat_q931_expect). nf_ct_helper_expectfn_unregister()
only unlinks the callback descriptor and never walks the expectation table,
so an expectation pending at module removal survives with a dangling
exp->expectfn into freed module text.
When the expected connection arrives, init_conntrack() invokes
exp->expectfn(), now a stale pointer into the unloaded module. Reproduced
on a KASAN build by loading the H.323 helpers, creating a Q.931
expectation, unloading nf_nat_h323, then connecting to the expected port:
Oops: int3: 0000 [#1] SMP KASAN NOPTI
RIP: 0010:0xffffffffa06102d1
init_conntrack.isra.0 (net/netfilter/nf_conntrack_core.c:1862)
nf_conntrack_in (net/netfilter/nf_conntrack_core.c:2049)
ipv4_conntrack_local (net/netfilter/nf_conntrack_proto.c:223)
nf_hook_slow (net/netfilter/core.c:619)
[24 lines not shown]
netfilter: nf_tables_offload: drop device refcount on error
Reported by sashiko:
If nft_flow_action_entry_next() returns NULL, dev reference leaks.
Fixes: c6f85577584b ("netfilter: nf_tables_offload: add nft_flow_action_entry_next() and use it")
Reported-by: Juri Lelli <juri.lelli at redhat.com>
Signed-off-by: Florian Westphal <fw at strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo at netfilter.org>
netfilter: revalidate bridge ports
ebt_redirect_tg() dereferences br_port_get_rcu() return without a
NULL check, causing a kernel panic when the bridge port has been
removed between the original hook invocation and an NFQUEUE
reinject.
A mere NULL check isn't sufficient, however. As sashiko review
points out userspace can not only remove the port from the bridge,
it could also place the device in a different virtual device, e.g.
macvlan.
If this happens, we must drop the packet, there is no way for us to
reinject it into the bridge path.
Switch to _upper API, we don't need the bridge port structure.
Also, this fix keeps another bug intact:
Both nfnetlink_log and nfnetlink_queue use CONFIG_BRIDGE_NETFILTER
[11 lines not shown]
rds: mark snapshot pages dirty in rds_info_getsockopt()
rds_info_getsockopt() pins the destination user pages with FOLL_WRITE and
the RDS_INFO_* producers memcpy the snapshot into them through
kmap_atomic(). Because that copy goes through the kernel direct map, the
dirty bit on the user PTE is never set, so unpin_user_pages() releases the
pages without marking them dirty. A file-backed destination page can then
be reclaimed without writeback, silently discarding the copied data.
Use unpin_user_pages_dirty_lock() with make_dirty=true so the modified
pages are marked dirty before they are unpinned.
Fixes: a8c879a7ee98 ("RDS: Info and stats")
Signed-off-by: Breno Leitao <leitao at debian.org>
Reviewed-by: Allison Henderson <achender at kernel.org>
Link: https://patch.msgid.link/20260608-rds_fix-v1-1-006c88543408@debian.org
Signed-off-by: Jakub Kicinski <kuba at kernel.org>
ip6_vti: fix incorrect tunnel matching in vti6_tnl_lookup()
In vti6_tnl_lookup(), when an exact match for a tunnel fails,
the code falls back to searching for wildcard tunnels:
- Tunnels matching the packet's local address, with any remote address
wildcard remote).
- Tunnels matching the packet's remote address, with any local address
(wildcard local).
However, vti6 stores all these different types of tunnels in the same
hash table (ip6n->tnls_r_l) prone to hash collisions.
The bug is that the fallback search loops in vti6_tnl_lookup() were
missing checks to ensure that the candidate tunnel actually has
a wildcard address.
Fixes: fbe68ee87522 ("vti6: Add a lookup method for tunnels with wildcard endpoints.")
[5 lines not shown]
Merge tag 'riscv-for-linux-7.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Paul Walmsley:
- Fix the implementation of the CFI branch landing pad control prctl()s
to return -EINVAL if unknown control bits are set, rather than
silently ignoring the request; and add a kselftest for this case
- Fix unaligned access performance testing to happen earlier in boot,
which fixes a performance regression in the lib/checksum code
- Fix a binfmt_elf warning when dumping core (due to missing
.core_note_name for CFI registers)
* tag 'riscv-for-linux-7.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: cfi: reject unknown flags in PR_SET_CFI
riscv: Fix fast_unaligned_access_speed_key not getting initialized
riscv/ptrace: Use USER_REGSET_NOTE_TYPE for REGSET_CFI
namespace: restrict OPEN_TREE_NAMESPACE/FSMOUNT_NAMESPACE to directories
open_tree(..., OPEN_TREE_NAMESPACE) and
fsmount(..., FSMOUNT_NAMESPACE, ...) currently work on non-directories,
like regular files. That's bad for two reasons:
- It ends up mounting a regular file over the inherited namespace root,
which is a directory; mounting a non-directory over a directory is
normally explicitly forbidden, see for example do_move_mount()
- It causes setns() on the new namespace to set the cwd to a regular
file, which the rest of VFS does not expect
Fix it by restricting create_new_namespace() (which is used by both of
these flags) to directories.
Leave the behavior for OPEN_TREE_CLONE as-is, that seems unproblematic.
Fixes: 9b8a0ba68246 ("mount: add OPEN_TREE_NAMESPACE")
[6 lines not shown]