Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"s390:
- Fix S390_USER_OPEREXEC so it can now be enabled regardless of other
unrelated capabilities
- Fix handling of the _PAGE_UNUSED pte bit that could lead to guest
memory corruption in some scenarios
- A bunch of misc gmap fixes (locking, behaviour under memory
pressure)
- Fix CMMA dirty tracking
x86:
- Tidy up some WARN_ON() and BUG_ON(), replacing them with
[33 lines not shown]
Merge tag 'block-7.2-20260625' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:
- blk-cgroup locking rework and fixes:
- fix a use-after-free in __blkcg_rstat_flush()
- defer freeing policy data until after an RCU grace period
- defer the blkcg css_put until the blkg is unlinked from
the queue
- unwind the queue_lock nesting under RCU / blkcg->lock
across the lookup, create, associate and destroy paths
- NVMe fixes via Keith:
- Fix a crash and memory leak during invalid cdev teardown,
and related cdev cleanups (Maurizio, John)
- nvmet fixes: handle TCP_CLOSING in the tcp state_change
handler, reject short AUTH_RECEIVE buffers, handle inline
data with a nonzero offset in rdma, fix an sq refcount leak,
and allocate ana_state with the port (Maurizio, Michael,
[54 lines not shown]
Merge tag 'io_uring-7.2-20260625' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring fixes from Jens Axboe:
- Fix a file reference leak in the nop opcode when used with
IOSQE_FIXED_FILE
- Preserve the SQ array entries when resizing the ring via the register
path
- Preserve the partial result for an iopoll request rather than
overwriting it
- Don't audit log IORING_OP_RECV_ZC
- Bound io_pin_pages() by the page array byte size in the memmap path
- Follow-up cleanup to the task_work mpscq conversion, getting rid of
the now-unnecessary tw_pending tracking for the !DEFER_TASKRUN path
[11 lines not shown]
Merge tag 'gpio-fixes-for-v7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix locking context with shared GPIOs in gpio-tegra
- fix IRQ domain leak in error path in gpio-davinci
- fix returning a potentially uninitialized integer in
gpiochip_set_multiple()
- use raw spinlock in gpio-eic-sprd and gpio-sch to address locking
context issues
- bail out of probe() if registering the GPIO chip fails in gpio-mlxbf3
- fix varible type for storing the "ngpios" property in gpio-pisosr
- fix out-of-bounds pin access in GPIO ACPI
[22 lines not shown]
Merge tag 'pwrseq-fixes-for-v7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull power sequencing fixes from Bartosz Golaszewski:
- fix an ABBA deadlock in pwrseq unregister path
- fix a use-after-free bug in pwrseq core
- sort PCI device IDs in ascending order in pwrseq-pcie-m2
* tag 'pwrseq-fixes-for-v7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
power: sequencing: fix ABBA deadlock in pwrseq_device_unregister()
power: sequencing: pcie-m2: Sort PCI device IDs in ascending order
pwrseq: core: fix use-after-free in pwrseq_debugfs_seq_next()
Merge tag 'docs-7.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/docs/linux
Pull more documentation updates from Jonathan Corbet:
"A handful of late-arriving docs fixes, along with one document update
that fell through the cracks before"
* tag 'docs-7.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/docs/linux:
docs: tools: Fix typo 'ackward' to 'awkward' in unittest.rst
kdoc: xforms: ignore special static/inline macros
kdoc: xforms_lists: handle DECLARE_PER_CPU() in kernel-doc
MAINTAINERS: Fix regex for kdoc
docs: kgdb: Fix path of driver options
Documentation: tracing: fix typo in events documentation
Docs/driver-api/uio-howto: document mmap_prepare callback
docs/mm: clarify that we are not looking for LLM generated content
kernel-doc: xforms: support __SYSFS_FUNCTION_ALTERNATIVE()
Merge tag 'kbuild-7.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux
Pull more Kbuild updates from Nathan Chancellor:
- Link host programs with ld.lld when $(LLVM) is set to match user's
expectations that LLVM will be used exclusively during the build
process
- Fix modpost warnings from static variable name promotion that can
happen more aggressively with the recently merged distributed ThinLTO
support
- Add an optional warning for user-supplied Kconfig values that changed
after processing, such as out of range values or options that have
incorrect / missing dependencies
* tag 'kbuild-7.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
kconfig: add optional warnings for changed input values
modpost: Ignore Clang LTO suffixes in symbol matching
kbuild: Use ld.lld for linking host programs when LLVM is set
Merge tag 'for-linus-7.2-1' of https://github.com/cminyard/linux-ipmi
Pull ipmi updates from Corey Minyard:
"Lots of little tweaks.
Nothing huge, the biggest issue was a possible refcount underflow that
could cause a memory leak in some situations. Otherwise, fixing
formatting and style things and some docs typos"
* tag 'for-linus-7.2-1' of https://github.com/cminyard/linux-ipmi:
docs: ipmi: Fix path of the "hotmod" module parameter
ipmi: Drop unused assignment of platform_device_id driver data
ipmi: si: Use platform_get_irq_optional() to retrieve interrupt
ipmi: fix refcount leak in i_ipmi_request()
ipmi:ssif: Drop unused assignment of platform_device_id driver data
ipmi: Fix user refcount underflow in event delivery
ipmi: Use named initializers for struct i2c_device_id
ipmi: Use LIST_HEAD() to initialize on stack list head
ipmi:kcs: Reduce the number of retries
Merge tag 'rust-7.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
Pull rust addendum from Miguel Ojeda:
"A second, tiny pull request later in the merge window with a small
patch to simplify cross-tree development:
'kernel' crate:
- 'prelude' module: add 'zerocopy{,_derive}::IntoBytes'.
This will simplify using 'zerocopy' in several trees next cycle"
* tag 'rust-7.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
rust: prelude: add `zerocopy{,_derive}::IntoBytes`
Merge tag 'rust-fixes-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
Pull rust fixes from Miguel Ojeda:
"Toolchain and infrastructure:
- Work around a 'rustc' bug by setting the 'frame-pointer' LLVM
module flag under 'CONFIG_FRAME_POINTER'.
The upcoming Rust 1.98.0 is fixed.
- Doctests: fix incorrect replacement pattern.
'kernel' crate:
- Mark 'Debug' impl as '#[inline]'"
* tag 'rust-fixes-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
rust: Kbuild: set frame-pointer llvm module flag for CONFIG_FRAME_POINTER
rust: doctest: fix incorrect pattern in replacement
rust: bitfield: mark `Debug` impl as `#[inline]`
Merge tag 'pci-v7.2-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Remove MPS/MRRS Kconfig settings (CONFIG_PCIE_BUS_*) that worked
around a WiFi device defect; use a quirk or boot-time
"pci=pcie_bus_tune_*" kernel parameter instead (Bjorn Helgaas)
- Always lift 2.5GT/s restriction in PCIe failed link retraining to
avoid clamping a link to 2.5GT/s after hot-plug changes the device
(Maciej W. Rozycki)
- Request bus reassignment when not probe-only to fix an enumeration
regression on Marvell CN106XX and possibly other DT-based systems
(Ratheesh Kannoth)
- Fix procfs race between pci_proc_init() and pci_bus_add_device()
that resulted in 'proc_dir_entry ... already registered' warnings
[358 lines not shown]
Merge tag 'irq-msi-2026-06-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull MSI irq fix from Ingo Molnar:
- Revert a change that added a bad iounmap(NULL) call
to the MSI IRQ support code (Yuanhe Shu)
* tag 'irq-msi-2026-06-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "PCI/MSI: Unmap MSI-X region on error"
Merge tag 'sparc-for-7.2-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/alarsson/linux-sparc
Pull sparc updates from Andreas Larsson:
- Align sparc to other archs by providing ucontext.h wrapper
- Fix buffer underflow in led driver
- Export mcount for clang and disable compat when using lld for linking
- API choice improvement for sysfs code for vio
- Fix build warnings and notification of missing prototype
- Remove dead code and dead configs
* tag 'sparc-for-7.2-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/alarsson/linux-sparc:
sparc: Remove remaining defconfig references to the pktcdvd driver
sparc: led: avoid trimming a newline from empty writes
[9 lines not shown]
Merge tag 'apparmor-pr-2026-06-22' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor
Pull apparmor updates from John Johansen:
"Another round of bug fixing and some code cleanups, there are no new
features. The biggest thing to note is Georgia is being added to help
co-maintain apparmor.
Cleanups:
- replace get_zeroed_page() with kzalloc()
- remove unnecessary goto and associated label
- change fn_label_build() to return err on failure instead of NULL or
err
- free rawdata as soon as possible
- use explicit instead of implicit flex array in rawdata_f_data
- use __label_make_stale in __aa_proxy_redirect
- return correct error by propagate -ENOMEM correctly in unpack_table
- aa_label_alloc use aa_label_free on alloc failure
- add a conditional version of get_newest_label
[45 lines not shown]
Merge tag 'input-for-v7.2-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
- A new driver for Wacom W9000-series penabled touchscreens
- Updates to STM FTS driver adding support for reset line and preparing
the driver for STMFTS5 support
- Updates to RMI4 and IMS PCU drivers hardening the code
- Support for half-duplex mode restored in ADS7846 driver
- Updates to driver's device_id tables to use named initializers
- Removal of no longer used PCAP keys and touchscreen drivers (support
for the ezx series of phones was removed in 2022)
- Removal of xilinx_ps2 driver which is no longer used either
[28 lines not shown]
Merge tag 'ntfs3_for_7.2' of https://github.com/Paragon-Software-Group/linux-ntfs3
Pull ntfs3 updates from Konstantin Komarov:
"Added:
- depth limit to indx_find_buffer() to prevent stack overflow
- validate split-point offset in indx_insert_into_buffer()
- bounds check to run_get_highest_vcn()
- fileattr_get() and fileattr_set() support
- zero stale pagecache beyond valid data length
- handle delayed allocation overlap in run lookup
- validate lcns_follow in log_replay() conversion
- cap RESTART_TABLE free-chain walker at rt->used
- resize log->one_page_buf when adopting on-disk page size
- reject direct userspace writes to reserved $LX* xattrs
Fixed:
- out-of-bounds read in decompress_lznt()
- avoid -Wmaybe-uninitialized warnings
- hold ni_lock across readdir metadata walk
[45 lines not shown]
block: handle REQ_OP_ZONE_APPEND in __bio_integrity_action
Otherwise zone append commands will miss their integrity data. While
this works "fine" for auto-PI, it break file system PI and non-PI
metadata.
With this XFS on ZNS namespace with non-PI metadata and 512 byte sectors
with PI work, while PI 4k sector formats with PI work only when Caleb's
"block: fix integrity offset/length conversions" is applied as well.
Note that unlike regular writes, zone append does need remapping as
partitions are not supported on zoned block devices.
Fixes: df3c485e0e60 ("block: switch on bio operation in bio_integrity_prep")
Signed-off-by: Christoph Hellwig <hch at lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen at oracle.com>
Link: https://patch.msgid.link/20260624080014.1998650-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe at kernel.dk>
block: fix GFP_ flags confusion in bio_integrity_alloc_buf
bio_integrity_alloc_buf usage of GFP_ flags is messed up. For one it
mixes GFP_NOFS and GFP_NOIO for neighbouring allocations, but it also
makes the allocations fail more often than needed. That code was copied
from bio_alloc_bioset which needs to do that so that it can punt to the
rescuer workqueue, but none of that is needed for the integrity
allocations that either sits in the file system or at the very bottom
of the I/O stack. Failing early means we'll do a fully waiting
allocation from the mempool ->alloc callback which is usually much
larger than required.
Fix this by passing a gfp_t so that the file system path can pass
GFP_NOFS and the auto-integrity code can pass GFP_NOIO, and don't
modify the allocation type except for disabling warnings.
Fixes: ec7f31b2a2d3 ("block: make bio auto-integrity deadlock safe")
Signed-off-by: Christoph Hellwig <hch at lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen at oracle.com>
[2 lines not shown]
block, bfq: don't grab queue_lock to initialize bfq
The request_queue is frozen and quiesced while the elevator init_sched()
method runs, so queue_lock is not needed for BFQ cgroup initialization.
Signed-off-by: Yu Kuai <yukuai at fygo.io>
Link: https://patch.msgid.link/1965073ea20f33114a8d903816b986e483b9bb34.1780621988.git.yukuai@fygo.io
Signed-off-by: Jens Axboe <axboe at kernel.dk>
mm/page_io: don't nest queue_lock under rcu in bio_associate_blkg_from_page()
Take a css reference under RCU, drop RCU, and then associate the bio with
the blkg. This avoids nesting queue_lock under RCU and prepares to protect
blkcg with blkcg_mutex instead of queue_lock.
Use css_tryget() instead of css_tryget_online() so swap writeback for
pages charged to a dying memcg still passes the dying css to
bio_associate_blkg_from_css(). That preserves the existing closest-live
ancestor fallback instead of charging those bios to the root blkg.
Signed-off-by: Yu Kuai <yukuai at fygo.io>
Link: https://patch.msgid.link/c910d2c39d3ec97f67de68af636a52394342d55f.1780621988.git.yukuai@fygo.io
Signed-off-by: Jens Axboe <axboe at kernel.dk>
blk-cgroup: don't nest queue_lock under rcu in bio_associate_blkg()
If a bio is already associated with a blkg, the blkcg is already pinned
until the bio is done, so there is no need for RCU protection. Otherwise,
protect blkcg_css() with RCU independently. Prepare to protect blkcg with
blkcg_mutex instead of queue_lock.
Signed-off-by: Yu Kuai <yukuai at fygo.io>
Link: https://patch.msgid.link/8496fa234b21d4b31b7f068766906d0bffcac8e6.1780621988.git.yukuai@fygo.io
Signed-off-by: Jens Axboe <axboe at kernel.dk>
blk-cgroup: don't nest queue_lock under blkcg->lock in blkcg_destroy_blkgs()
The correct lock order is q->queue_lock before blkcg->lock, and in order
to prevent deadlock from blkcg_destroy_blkgs(), trylock is used for
q->queue_lock while blkcg->lock is already held, this is hacky.
Refactor blkcg_destroy_blkgs() to hold blkcg->lock only long enough to
get the first blkg and then release it. Then take q->queue_lock and
blkcg->lock in the correct order to destroy the blkg. This is a very cold
path, so the extra lock/unlock cycles are acceptable.
Also prepare to convert protecting blkcg with blkcg_mutex instead of
queue_lock.
Signed-off-by: Yu Kuai <yukuai at fygo.io>
Link: https://patch.msgid.link/00b03cf74a9937cb4d6dd67a189ddc00a3de0451.1780621988.git.yukuai@fygo.io
Signed-off-by: Jens Axboe <axboe at kernel.dk>
blk-cgroup: don't nest queue_lock under rcu in blkg_lookup_create()
Change this in two steps:
1) hold rcu lock and do blkg_lookup() from fast path;
2) hold queue_lock directly from slow path, and don't nest it under rcu
lock;
Prepare to convert protecting blkcg with blkcg_mutex instead of
queue_lock.
Signed-off-by: Yu Kuai <yukuai at fygo.io>
Link: https://patch.msgid.link/93f33cc9e5a39dddb78dcd934d0c1d04b564fb00.1780621988.git.yukuai@fygo.io
Signed-off-by: Jens Axboe <axboe at kernel.dk>
blk-cgroup: don't nest queue_lock under rcu in blkcg_print_blkgs()
With previous modification to delay freeing policy data after an RCU grace
period, prfill() can run under RCU instead of taking queue_lock. However,
policy teardown can still clear blkg->pd[plid] after blkcg_print_blkgs()
observes the policy enabled bit.
Load policy data once with READ_ONCE() and skip the blkg if teardown
already cleared it. Do the same in recursive stat walks for descendant
blkgs. Remove the stale BFQ debug queue_lock assertion because
blkcg_print_blkgs() no longer calls prfill() with queue_lock held. This
also lets ioc_qos_prfill() and ioc_cost_model_prfill() use IRQ-safe
ioc->lock locking without re-enabling IRQs while queue_lock is still held.
Signed-off-by: Yu Kuai <yukuai at fygo.io>
Link: https://patch.msgid.link/db7633d5e263dd1c2bf9b901762545a84b7d714e.1780621988.git.yukuai@fygo.io
Signed-off-by: Jens Axboe <axboe at kernel.dk>
blk-cgroup: delay freeing policy data after rcu grace period
Currently blkcg_print_blkgs() must hold RCU to iterate blkgs from a
blkcg, and prfill() must hold queue_lock to prevent policy data from
being freed by policy deactivation. As a consequence, queue_lock has to
be nested under RCU from blkcg_print_blkgs().
Delay freeing policy data until after an RCU grace period so prfill() can
be protected by RCU alone.
Signed-off-by: Yu Kuai <yukuai at fygo.io>
Link: https://patch.msgid.link/e20e5d984b41a026d61851966bed35eb094c4bff.1780621988.git.yukuai@fygo.io
Signed-off-by: Jens Axboe <axboe at kernel.dk>
blk-cgroup: protect iterating blkgs with blkcg->lock in blkcg_print_stat()
blkcg_print_one_stat() will be called for each blkg:
- access blkg->iostat, which is freed from rcu callback
blkg_free_workfn();
- access policy data from pd_stat_fn(), which is freed from
pd_free_fn(), while pd_free_fn() can be called by removing blkcg or
deactivating policy;
Take blkcg->lock while iterating so the blkgs stay online and both
blkg->iostat and policy data for activated policies stay valid. Use
irq-safe locking because blkcg->lock can be nested under q->queue_lock,
which is used from IRQ completion paths.
Prepare to convert protecting blkgs from request_queue with mutex.
Signed-off-by: Yu Kuai <yukuai at fygo.io>
Link: https://patch.msgid.link/05799877e720dcd300e2ddd4625e8e162959d7cc.1780621988.git.yukuai@fygo.io
Signed-off-by: Jens Axboe <axboe at kernel.dk>
x86/apic: KVM: Use cpu_physical_id() to get APIC ID of running vCPU for AVIC
Use cpu_physical_id() instead of default_cpu_present_to_apicid() when
getting the APIC ID of the pCPU on which a vCPU is running/loaded, as the
kernel has gone way off the rails if a vCPU is loaded on a pCPU that has
been physically removed from the system. Even if the impossible were to
happen, the absolutely worst case scenario is that hardware will ring the
AIVC doorbell on the wrong pCPU, i.e. a severely broken system will
experience mild performance issues.
Kill off KVM's superfluous kvm_cpu_get_apicid() wrapper along with the
for-KVM export of default_cpu_present_to_apicid(), as they existed purely
for the wonky AVIC usage.
Cc: Kai Huang <kai.huang at intel.com>
Cc: Yosry Ahmed <yosry at kernel.org>
Signed-off-by: Sean Christopherson <seanjc at google.com>
Acked-by: Naveen N Rao (AMD) <naveen at kernel.org>
Reviewed-by: Kai Huang <kai.huang at intel.com>
[3 lines not shown]
KVM: x86/mmu: Expose number of shadow MMU shadow pages as a stat
Turn arch.n_used_mmu_pages into a stat, mmu_shadow_pages, as the number of
live shadow pages is arguably _the_ most critical datapoint when it comes
to analyzing the shadow MMU. Before the TDP MMU came along, i.e. when the
shadow MMU was the only MMU, explicitly tracking the number of shadow pages
wasn't as interesting, because the same information could more or less be
gleaned from the pages_{1g,2m,4k} stats. But with the TDP MMU, where the
shadow MMU is only used for nested TDP, it becomes extremely difficult, if
not impossible, to determine which SPTEs are coming from the TDP MMU, and
which are coming from the shadow MMU.
E.g. when triaging/debugging shadow MMU performance issues due to "too many
shadow pages", being able to observe that 99%+ of all shadow pages are
unsync is critical to being able to deduce that KVM is effectively leaking
shadow pages.
Signed-off-by: Sean Christopherson <seanjc at google.com>
Message-ID: <20260612133727.411902-1-seanjc at google.com>
Signed-off-by: Paolo Bonzini <pbonzini at redhat.com>
Merge tag 'kvm-s390-next-7.2-2' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
* Fix S390_USER_OPEREXEC so it can now be enabled regardless of other
unrelated capabilities
* Fix handling of the _PAGE_UNUSED pte bit that could lead to guest
memory corruption in some scenarios
* A bunch of misc gmap fixes (locking, behaviour under memory pressure)
* Fix CMMA dirty tracking