Linux/linux c2ee9f5tools/testing/selftests/kvm Makefile

KVM: selftests: Fix build on non-x86 architectures

Commit 9a400068a158 ("KVM: selftests: x86: Avoid using SSE/AVX
instructions") unconditionally added -march=x86-64-v2 to the CFLAGS used
to build the KVM selftests which does not work on non-x86 architectures:

  cc1: error: unknown value ‘x86-64-v2’ for ‘-march’

Fix this by making the addition of this x86 specific command line flag
conditional on building for x86.

Fixes: 9a400068a158 ("KVM: selftests: x86: Avoid using SSE/AVX instructions")
Signed-off-by: Mark Brown <broonie at kernel.org>
Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
DeltaFile
+3-1tools/testing/selftests/kvm/Makefile
+3-11 files

Linux/linux a360f31net/9p client.c

9p: fix slab cache name creation for real

This was attempted by using the dev_name in the slab cache name, but as
Omar Sandoval pointed out, that can be an arbitrary string, e.g. something
like "/dev/root".  Which in turn trips verify_dirent_name(), which fails
if a filename contains a slash.

So just make it use a sequence counter, and make it an atomic_t to avoid
any possible races or locking issues.
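
A minimal userspace sketch of the idea, assuming C11 atomics in place of the kernel's atomic_t: the name is derived from a counter rather than from a device name, so it is unique per call and can never contain a slash. The names cache_seq and make_cache_name and the "9p-fcall-cache-" prefix are illustrative assumptions, not taken from the patch.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical stand-in for the sequence counter described above. */
static atomic_int cache_seq;

/*
 * Build a cache name from a counter instead of a device name, so the
 * result is unique per call and never contains a '/'.
 * Returns the number of characters snprintf() would have written.
 */
static int make_cache_name(char *buf, size_t len)
{
	/* atomic_fetch_add() returns the old value; +1 yields 1, 2, 3, ... */
	int seq = atomic_fetch_add(&cache_seq, 1) + 1;

	return snprintf(buf, len, "9p-fcall-cache-%d", seq);
}
```

Two concurrent callers each get a distinct number without taking a lock, which is the property the commit message asks for.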

Reported-and-tested-by: Omar Sandoval <osandov at fb.com>
Link: https://lore.kernel.org/all/ZxafcO8KWMlXaeWE@telecaster.dhcp.thefacebook.com/
Fixes: 79efebae4afc ("9p: Avoid creating multiple slab caches with the same name")
Acked-by: Vlastimil Babka <vbabka at suse.cz>
Cc: Dominique Martinet <asmadeus at codewreck.org>
Cc: Thorsten Leemhuis <regressions at leemhuis.info>
Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
DeltaFile
+3-1net/9p/client.c
+3-11 files

Linux/linux d129377Documentation/virt/kvm api.rst, arch/arm64/kvm sys_regs.c nested.c

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM64:

   - Fix the guest view of the ID registers, making the relevant fields
     writable from userspace (affecting ID_AA64DFR0_EL1 and
     ID_AA64PFR1_EL1)

   - Correctly expose S1PIE to guests, fixing a regression introduced in
     6.12-rc1 with the S1POE support

   - Fix the recycling of stage-2 shadow MMUs by tracking the context
     (are we allowed to block or not) as well as the recycling state

   - Address a couple of issues with the vgic when userspace
     misconfigures the emulation, resulting in various splats. Headaches
     courtesy of our Syzkaller friends


    [55 lines not shown]
DeltaFile
+70-7arch/arm64/kvm/sys_regs.c
+46-7arch/arm64/kvm/nested.c
+29-23arch/arm64/kvm/hyp/nvhe/hyp-init.S
+35-6arch/arm64/kvm/vgic/vgic-init.c
+17-10arch/x86/kvm/mmu/mmu.c
+9-7Documentation/virt/kvm/api.rst
+206-6019 files not shown
+277-10325 files

Linux/linux c1bc09dkernel/trace trace_uprobe.c

Merge tag 'probes-fixes-v6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull uprobe fix from Masami Hiramatsu:

 - uprobe: avoid out-of-bounds memory access of fetching args

   Uprobe trace events can cause out-of-bounds memory access when
   fetching user-space data which is bigger than one page, because it
   does not check the local CPU buffer size when reading the data. The
   fix checks the size of the data read and truncates it to the local
   CPU buffer size.

* tag 'probes-fixes-v6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  uprobe: avoid out-of-bounds memory access of fetching args
DeltaFile
+6-3kernel/trace/trace_uprobe.c
+6-31 files

Linux/linux 7166c32fs namespace.c, fs/afs rxrpc.c

Merge tag 'vfs-6.12-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:
 "afs:
   - Fix a lock recursion in afs_wake_up_async_call() on ->notify_lock

 netfs:
   - Drop the references to a folio immediately after the folio has been
     extracted to prevent races with future I/O collection

   - Fix a documentation build error

   - Downgrade the i_rwsem for buffered writes to fix a cifs reported
     performance regression when switching to netfslib

  vfs:
   - Explicitly return -E2BIG from openat2() if the specified size is
     unexpectedly large. This aligns openat2() with other extensible
     struct based system calls

    [25 lines not shown]
DeltaFile
+59-24fs/afs/rxrpc.c
+14-33fs/netfs/buffered_read.c
+6-3fs/ocfs2/file.c
+4-2fs/nilfs2/page.c
+3-1fs/namespace.c
+2-1fs/netfs/locking.c
+88-646 files not shown
+95-6712 files

Linux/linux a777c32lib/crypto/mpi mpi-mul.c

Merge tag 'v6.12-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
 "Fix a regression in mpi that broke RSA"

* tag 'v6.12-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: lib/mpi - Fix an "Uninitialized scalar variable" issue
DeltaFile
+1-1lib/crypto/mpi/mpi-mul.c
+1-11 files

Linux/linux 373b933kernel/trace trace_uprobe.c

uprobe: avoid out-of-bounds memory access of fetching args

Uprobe needs to fetch args into a percpu buffer, and then copy them to
the ring buffer to avoid non-atomic context problems.

User-space strings and arrays can sometimes be very large, but the percpu
buffer is only one page in size. store_trace_args() does not check
whether the data exceeds a single page, which causes an out-of-bounds
memory access.
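
The general shape of such a fix can be sketched in plain C as a length clamp before the copy. This is only an illustration under assumptions: BUF_SIZE and store_clamped are hypothetical names, and the real change lives in kernel/trace/trace_uprobe.c and is not reproduced here.

```c
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 4096 /* assumed: one page, matching the text above */

/*
 * Copy at most BUF_SIZE bytes of fetched data into the fixed buffer.
 * Returns the number of bytes actually stored.
 */
static size_t store_clamped(char dst[BUF_SIZE], const char *src, size_t len)
{
	if (len > BUF_SIZE)
		len = BUF_SIZE; /* truncate instead of writing past the buffer */
	memcpy(dst, src, len);
	return len;
}
```

Data larger than the buffer is truncated rather than rejected, which matches the "cut it down" behaviour described in the pull message.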

It can be reproduced with the following steps:
1. Build the kernel with CONFIG_KASAN enabled
2. Save the following program as test.c

```
#include <stdio.h>
#include <stdlib.h>
#include <string.h>


    [115 lines not shown]
DeltaFile
+6-3kernel/trace/trace_uprobe.c
+6-31 files

Linux/linux 42f7652. Makefile

Linux 6.12-rc4
DeltaFile
+1-1Makefile
+1-11 files

Linux/linux d7f513adrivers/bluetooth btusb.c, net/bluetooth iso.c af_bluetooth.c

Merge tag 'for-net-2024-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Pull bluetooth fixes from Luiz Augusto Von Dentz:

 - ISO: Fix multiple init when debugfs is disabled

 - Call iso_exit() on module unload

 - Remove debugfs directory on module init failure

 - btusb: Fix not being able to reconnect after suspend

 - btusb: Fix regression with fake CSR controllers 0a12:0001

 - bnep: fix wild-memory-access in proto_unregister

Note: normally the bluetooth fixes go through the networking tree, but
this missed the weekly merge, and two of the commits fix regressions
that have caused a fair amount of noise and have now hit stable too:

    [13 lines not shown]
DeltaFile
+9-18drivers/bluetooth/btusb.c
+1-5net/bluetooth/iso.c
+3-0net/bluetooth/af_bluetooth.c
+1-2net/bluetooth/bnep/core.c
+14-254 files

Linux/linux dd4f503drivers/pinctrl pinctrl-ocelot.c pinctrl-aw9523.c, drivers/pinctrl/intel pinctrl-intel-platform.c

Merge tag 'pinctrl-v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
 "Mostly error path fixes, but one pretty serious interrupt problem in
  the Ocelot driver as well:

   - Fix two error paths and a missing semicolon in the Intel driver

   - Add a missing ACPI ID for the Intel Panther Lake

   - Check return value of devm_kasprintf() in the Apple and STM32
     drivers

   - Add a missing mutex_destroy() in the aw9523 driver

   - Fix a double free in cv1800_pctrl_dt_node_to_map() in the Sophgo
     driver

   - Fix a double free in ma35_pinctrl_dt_node_to_map_func() in the

    [14 lines not shown]
DeltaFile
+7-2drivers/pinctrl/stm32/pinctrl-stm32.c
+4-4drivers/pinctrl/pinctrl-ocelot.c
+4-2drivers/pinctrl/pinctrl-aw9523.c
+2-3drivers/pinctrl/intel/pinctrl-intel-platform.c
+3-0drivers/pinctrl/pinctrl-apple-gpio.c
+1-1drivers/pinctrl/nuvoton/pinctrl-ma35.c
+21-122 files not shown
+23-138 files

Linux/linux c552282. MAINTAINERS, Documentation/devicetree/bindings/iio/dac adi,ad5686.yaml

Merge tag 'char-misc-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull misc driver fixes from Greg KH:
 "Here are a number of small char/misc/iio driver fixes for 6.12-rc4:

   - loads of small iio driver fixes for reported problems

   - parport driver out-of-bounds fix

   - Kconfig description and MAINTAINERS file updates

  All of these, except for the Kconfig and MAINTAINERS file updates have
  been in linux-next all week. Those other two are just documentation
  changes and will have no runtime issues and were merged on Friday"

* tag 'char-misc-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (39 commits)
  misc: rtsx: list supported models in Kconfig help
  MAINTAINERS: Remove some entries due to various compliance requirements.
  misc: microchip: pci1xxxx: add support for NVMEM_DEVID_AUTO for OTP device

    [18 lines not shown]
DeltaFile
+0-177MAINTAINERS
+17-36Documentation/devicetree/bindings/iio/dac/adi,ad5686.yaml
+17-15drivers/iio/frequency/Kconfig
+11-12drivers/iio/imu/bmi323/bmi323_core.c
+11-11drivers/parport/procfs.c
+9-8drivers/iio/dac/ltc2664.c
+65-25917 files not shown
+116-26623 files

Linux/linux c01ac4bdrivers/tty n_gsm.c, drivers/tty/serial qcom_geni_serial.c imx.c

Merge tag 'tty-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver fixes from Greg KH:
 "Here are some small tty and serial driver fixes for 6.12-rc4:

   - qcom-geni serial driver fixes, wow what a mess of a UART chip that
     thing is...

   - vt infoleak fix for odd font sizes

   - imx serial driver bugfix

   - yet-another n_gsm ldisc bugfix, slowly chipping down the issues in
     that piece of code

  All of these have been in linux-next for over a week with no reported
  issues"

* tag 'tty-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:

    [12 lines not shown]
DeltaFile
+48-55drivers/tty/serial/qcom_geni_serial.c
+15-0drivers/tty/serial/imx.c
+1-1include/linux/soc/qcom/geni-se.h
+2-0drivers/tty/n_gsm.c
+1-1drivers/tty/vt/vt.c
+67-575 files

Linux/linux b68c189. MAINTAINERS, drivers/usb/dwc3 core.c gadget.c

Merge tag 'usb-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB driver fixes from Greg KH:
 "Here are some small USB driver fixes and new device ids for 6.12-rc4:

   - xhci driver fixes for a number of reported issues

   - new usb-serial driver ids

   - dwc3 driver fixes for reported problems.

   - usb gadget driver fixes for reported problems

   - typec driver fixes

   - MAINTAINER file updates

  All of these have been in linux-next this week with no reported issues"


    [16 lines not shown]
DeltaFile
+30-38drivers/usb/host/xhci-ring.c
+50-5drivers/usb/host/xhci-dbgtty.c
+15-5drivers/usb/gadget/udc/dummy_hcd.c
+19-0drivers/usb/dwc3/core.c
+11-0MAINTAINERS
+6-4drivers/usb/dwc3/gadget.c
+131-528 files not shown
+151-5814 files

Linux/linux db87114arch/x86/entry entry_32.S, arch/x86/include/asm nospec-branch.h

Merge tag 'x86_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Explicitly disable the TSC deadline timer when going idle to address
   some CPU errata in that area

 - Do not apply the Zenbleed fix on anything else except AMD Zen2 on the
   late microcode loading path

 - Clear CPU buffers later in the NMI exit path on 32-bit to avoid
   register clearing while they still contain sensitive data, for the
   RFDS mitigation

 - Do not clobber EFLAGS.ZF with VERW on the opportunistic SYSRET exit
   path on 32-bit

 - Fix parsing issues of memory bandwidth specification in sysfs for
   resctrl's memory bandwidth allocation feature

    [12 lines not shown]
DeltaFile
+14-9arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+13-1arch/x86/kernel/apic/apic.c
+10-1arch/x86/include/asm/nospec-branch.h
+4-2arch/x86/entry/entry_32.S
+2-2arch/x86/kernel/cpu/resctrl/core.c
+2-1arch/x86/kernel/cpu/amd.c
+45-161 files not shown
+47-167 files

Linux/linux 949c9efdrivers/irqchip irq-sifive-plic.c irq-riscv-intc.c

Merge tag 'irq_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Borislav Petkov:

 - Fix a case for sifive-plic where an interrupt gets disabled *and*
   masked and remains masked when it gets reenabled later

 - Plug a small race in GIC-v4 where userspace can force an affinity
   change of a virtual CPU (vPE) in its unmapping path

 - Do not mix the two sets of ocelot irqchip's registers in the mask
   calculation of the main interrupt sticky register

 - Other smaller fixlets and cleanups

* tag 'irq_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/renesas-rzg2l: Fix missing put_device
  irqchip/riscv-intc: Fix SMP=n boot with ACPI
  irqchip/sifive-plic: Unmask interrupt in plic_irq_enable()

    [6 lines not shown]
DeltaFile
+17-12drivers/irqchip/irq-sifive-plic.c
+18-1drivers/irqchip/irq-riscv-intc.c
+12-6drivers/irqchip/irq-gic-v3-its.c
+14-2drivers/irqchip/irq-renesas-rzg2l.c
+8-2drivers/irqchip/irq-mscc-ocelot.c
+0-7drivers/irqchip/Kconfig
+69-302 files not shown
+73-328 files

Linux/linux 2b4d250kernel task_work.c, kernel/rcu tasks.h

Merge tag 'sched_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduling fixes from Borislav Petkov:

 - Add PREEMPT_RT maintainers

 - Fix another aspect of delayed dequeued tasks wrt determining their
   state, i.e., whether they're runnable or blocked

 - Handle delayed dequeued tasks and their migration wrt PSI properly

 - Fix the situation where a delayed dequeue task gets enqueued into a
   new class, which should not happen

 - Fix a case where memory allocation would happen while the runqueue
   lock is held, which is a no-no

 - Do not over-schedule when tasks with shorter slices preempt the
   currently running task

    [16 lines not shown]
DeltaFile
+41-24kernel/sched/core.c
+33-15kernel/sched/stats.h
+7-20kernel/sched/fair.c
+13-2kernel/task_work.c
+9-4kernel/sched/syscalls.c
+9-0kernel/rcu/tasks.h
+112-6511 files not shown
+148-7417 files

Linux/linux a5ee44cdrivers/xen acpi.c privcmd.c, drivers/xen/xen-pciback pci_stub.c

Merge tag 'for-linus-6.12a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fix from Juergen Gross:
 "A single fix for a build failure introduced this merge window"

* tag 'for-linus-6.12a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: Remove dependency between pciback and privcmd
DeltaFile
+24-0drivers/xen/acpi.c
+9-5include/xen/acpi.h
+9-2drivers/xen/xen-pciback/pci_stub.c
+2-4drivers/xen/privcmd.c
+0-1drivers/xen/Kconfig
+44-125 files

Linux/linux 10e93e1include/trace/events dma.h

Merge tag 'dma-mapping-6.12-2024-10-20' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fix from Christoph Hellwig:
 "Just another small tracing fix from Sean"

* tag 'dma-mapping-6.12-2024-10-20' of git://git.infradead.org/users/hch/dma-mapping:
  dma-mapping: fix tracing dma_alloc/free with vmalloc'd memory
DeltaFile
+8-8include/trace/events/dma.h
+8-81 files

Linux/linux e9001a3arch/arm64/include/asm kvm_asm.h, arch/arm64/kernel asm-offsets.c

Merge tag 'kvmarm-fixes-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.12, take #3

- Stop wasting space in the HYP idmap, as we are dangerously close
  to the 4kB limit, and this has already exploded in -next

- Fix another race in vgic_init()

- Fix a UBSAN error when faking the cache topology with MTE
  enabled
DeltaFile
+29-23arch/arm64/kvm/hyp/nvhe/hyp-init.S
+11-2arch/arm64/kvm/vgic/vgic-init.c
+6-1arch/arm64/kvm/vgic/vgic-kvm-device.c
+1-1arch/arm64/kvm/sys_regs.c
+1-0arch/arm64/kernel/asm-offsets.c
+1-0arch/arm64/include/asm/kvm_asm.h
+49-276 files

Linux/linux ddd5c58arch/arm64/kvm sys_regs.c nested.c, arch/arm64/kvm/vgic vgic-init.c

Merge tag 'kvmarm-fixes-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.12, take #2

- Fix the guest view of the ID registers, making the relevant fields
  writable from userspace (affecting ID_AA64DFR0_EL1 and ID_AA64PFR1_EL1)

- Correctly expose S1PIE to guests, fixing a regression introduced
  in 6.12-rc1 with the S1POE support

- Fix the recycling of stage-2 shadow MMUs by tracking the context
  (are we allowed to block or not) as well as the recycling state

- Address a couple of issues with the vgic when userspace misconfigures
  the emulation, resulting in various splats. Headaches courtesy
  of our Syzkaller friends
DeltaFile
+69-6arch/arm64/kvm/sys_regs.c
+46-7arch/arm64/kvm/nested.c
+24-4arch/arm64/kvm/vgic/vgic-init.c
+13-3tools/testing/selftests/kvm/aarch64/set_id_regs.c
+8-7arch/arm64/kvm/mmu.c
+6-6arch/arm64/kvm/hypercalls.c
+166-334 files not shown
+183-3510 files

Linux/linux 773cca1tools/testing/selftests/kvm/x86_64 cpuid_test.c

KVM: selftests: Fix out-of-bounds reads in CPUID test's array lookups

When looking for a "mangled", i.e. dynamic, CPUID entry, terminate the
walk based on the number of array _entries_, not the size in bytes of
the array.  Iterating based on the total size of the array can result in
false passes, e.g. if the random data beyond the array happens to match
a CPUID entry's function and index.
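
The difference between the two loop bounds can be shown with a small standalone sketch. The struct layout, the table contents and the name is_mangled are assumptions made for illustration; they are not copied from cpuid_test.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct cpuid_key {
	uint32_t function;
	uint32_t index;
};

/* Hypothetical list of "mangled" entries; the values are illustrative. */
static const struct cpuid_key mangled[] = {
	{ .function = 0xd, .index = 0 },
	{ .function = 0xd, .index = 1 },
};

static bool is_mangled(uint32_t function, uint32_t index)
{
	/* The bound is the entry count (2), not sizeof(mangled) (16 bytes). */
	for (size_t i = 0; i < ARRAY_SIZE(mangled); i++) {
		if (mangled[i].function == function &&
		    mangled[i].index == index)
			return true;
	}
	return false;
}
```

With `i < sizeof(mangled)` the loop would index 16 elements of a 2-element array, reading 14 entries past its end.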

Fixes: fb18d053b7f8 ("selftest: kvm: x86: test KVM_GET_CPUID2 and guest visible CPUIDs against KVM_GET_SUPPORTED_CPUID")
Signed-off-by: Sean Christopherson <seanjc at google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets at redhat.com>
Message-ID: <20241003234337.273364-2-seanjc at google.com>
Signed-off-by: Paolo Bonzini <pbonzini at redhat.com>
DeltaFile
+1-1tools/testing/selftests/kvm/x86_64/cpuid_test.c
+1-11 files

Linux/linux 3ec4350arch/riscv/kvm aia_imsic.c

RISCV: KVM: use raw_spinlock for critical section in imsic

The external interrupt update procedure in imsic was already protected
by a spinlock. But since it must not be preempted under any
circumstances, switch to raw_spinlock to prevent preemption when
PREEMPT_RT is enabled.

Signed-off-by: Cyan Yang <cyan.yang at sifive.com>
Reviewed-by: Yong-Xuan Wang <yongxuan.wang at sifive.com>
Reviewed-by: Anup Patel <anup at brainfault.org>
Message-ID: <20240919160126.44487-1-cyan.yang at sifive.com>
Signed-off-by: Paolo Bonzini <pbonzini at redhat.com>
DeltaFile
+4-4arch/riscv/kvm/aia_imsic.c
+4-41 files

Linux/linux 9a40006tools/testing/selftests/kvm Makefile

KVM: selftests: x86: Avoid using SSE/AVX instructions

Some distros switched gcc to '-march=x86-64-v3' by default and while it's
hard to find a CPU which doesn't support it today, many KVM selftests fail
with

  ==== Test Assertion Failure ====
    lib/x86_64/processor.c:570: Unhandled exception in guest
    pid=72747 tid=72747 errno=4 - Interrupted system call
    Unhandled exception '0x6' at guest RIP '0x4104f7'

The failure is easy to reproduce elsewhere with

   $ make clean && CFLAGS='-march=x86-64-v3' make -j && ./x86_64/kvm_pv_test

The root cause of the problem seems to be that with '-march=x86-64-v3' GCC
uses AVX* instructions (VMOVQ in the example above) and without prior
XSETBV() in the guest this results in #UD. It is certainly possible to add
it there, e.g. the following saves the day as well:

    [4 lines not shown]
DeltaFile
+1-0tools/testing/selftests/kvm/Makefile
+1-01 files

Linux/linux 731285farch/x86/kvm/vmx vmx.c

KVM: VMX: reset the segment cache after segment init in vmx_vcpu_reset()

Reset the segment cache after segment initialization in vmx_vcpu_reset()
to harden KVM against caching stale/uninitialized data.  Without the
recent fix to bypass the cache in kvm_arch_vcpu_put(), the following
scenario is possible:

 - vCPU is just created, and the vCPU thread is preempted before
   SS.AR_BYTES is written in vmx_vcpu_reset().

 - When scheduling out the vCPU task, kvm_arch_vcpu_in_kernel() =>
   vmx_get_cpl() reads and caches '0' for SS.AR_BYTES.

 - vmx_vcpu_reset() => seg_setup() configures SS.AR_BYTES, but doesn't
   invoke vmx_segment_cache_clear() to invalidate the cache.
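
The three steps above can be modelled with a toy one-entry read cache. This is a deliberately simplified sketch: struct seg_state and its helpers are hypothetical and do not mirror KVM's actual segment cache layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one backing field plus a one-entry read cache. */
struct seg_state {
	uint32_t ar_bytes;   /* backing value (stands in for the VMCS field) */
	uint32_t cached;     /* last value read through the cache */
	bool cache_valid;
};

/* Read through the cache, filling it on the first access. */
static uint32_t read_ar_cached(struct seg_state *s)
{
	if (!s->cache_valid) {
		s->cached = s->ar_bytes;
		s->cache_valid = true;
	}
	return s->cached;
}

static void cache_clear(struct seg_state *s)
{
	s->cache_valid = false;
}

/* Write the backing field; optionally invalidate the cache afterwards. */
static void write_ar(struct seg_state *s, uint32_t val, bool clear_cache)
{
	s->ar_bytes = val;
	if (clear_cache)
		cache_clear(s);
}
```

A read before the write caches 0; a later write with clear_cache set to false leaves that 0 visible to subsequent cached reads, while clearing the cache after the write makes the new value visible.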

As a result, KVM retains a stale value in the cache, which can be read,
e.g. via KVM_GET_SREGS.  Usually this is not a problem because the VMX
segment cache is reset on each VM-Exit, but if the userspace VMM (e.g. KVM

    [16 lines not shown]
DeltaFile
+3-3arch/x86/kvm/vmx/vmx.c
+3-31 files

Linux/linux f559b2earch/x86/kvm/svm nested.c

KVM: nSVM: Ignore nCR3[4:0] when loading PDPTEs from memory

Ignore nCR3[4:0] when loading PDPTEs from memory for nested SVM, as bits
4:0 of CR3 are ignored when PAE paging is used, and thus VMRUN doesn't
enforce 32-byte alignment of nCR3.

In the absolute worst case scenario, failure to ignore bits 4:0 can result
in an out-of-bounds read, e.g. if the target page is at the end of a
memslot, and the VMM isn't using guard pages.
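
A small sketch of the arithmetic, assuming 4 KiB pages and four 8-byte PDPTEs: masking off bits 4:0 keeps all four entries inside the page even when the low bits of nCR3 are set. The helper name and macros are hypothetical and are not the code in arch/x86/kvm/svm/nested.c.

```c
#include <stdint.h>

#define PAGE_MASK_4K  0xfffULL  /* offset bits within a 4 KiB page */
#define PDPT_ALIGN    0x1fULL   /* bits 4:0, ignored under PAE paging */

/* Byte offset, within its 4 KiB page, of PDPTE number idx (0..3). */
static uint64_t pdpte_page_offset(uint64_t cr3, unsigned int idx)
{
	uint64_t base = cr3 & PAGE_MASK_4K & ~PDPT_ALIGN; /* keep bits 11:5 */

	return base + (uint64_t)idx * 8; /* 8 bytes per PDPTE */
}
```

For a CR3 value whose low 12 bits are 0xfff, the last entry starts at offset 0xff8 and ends exactly at the page boundary. Without the mask it would start at 0x1017, past the end of the page.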

Per the APM:

  The CR3 register points to the base address of the page-directory-pointer
  table. The page-directory-pointer table is aligned on a 32-byte boundary,
  with the low 5 address bits 4:0 assumed to be 0.

And the SDM's much more explicit:

  4:0    Ignored

    [12 lines not shown]
DeltaFile
+5-1arch/x86/kvm/svm/nested.c
+5-11 files

Linux/linux 28cf497arch/x86/kvm/mmu mmu.c

KVM: x86/mmu: Add lockdep assert to enforce safe usage of kvm_unmap_gfn_range()

Add a lockdep assertion in kvm_unmap_gfn_range() to ensure that either
mmu_invalidate_in_progress is elevated, or that the range is being zapped
due to memslot removal (loosely detected by slots_lock being held).
Zapping SPTEs without mmu_invalidate_{in_progress,seq} protection is unsafe
as KVM's page fault path snapshots state before acquiring mmu_lock, and
thus can create SPTEs with stale information if vCPUs aren't forced to
retry faults (due to seeing an in-progress or past MMU invalidation).

Memslot removal is a special case, as the memslot is retrieved outside of
mmu_invalidate_seq, i.e. doesn't use the "standard" protections, and
instead relies on SRCU synchronization to ensure any in-flight page faults
are fully resolved before zapping SPTEs.

Signed-off-by: Sean Christopherson <seanjc at google.com>
Message-ID: <20241009192345.1148353-3-seanjc at google.com>
Signed-off-by: Paolo Bonzini <pbonzini at redhat.com>
DeltaFile
+11-0arch/x86/kvm/mmu/mmu.c
+11-01 files

Linux/linux 5a27984Documentation/virt/kvm api.rst

KVM: x86: Clean up documentation for KVM_X86_QUIRK_SLOT_ZAP_ALL

Massage the documentation for KVM_X86_QUIRK_SLOT_ZAP_ALL to call out that
it applies to moved memslots as well as deleted memslots, to avoid KVM's
"fast zap" terminology (which has no meaning for userspace), and to reword
the documented targeted zap behavior to specifically say that KVM _may_
zap a subset of all SPTEs.  As evidenced by the fix to zap non-leaf SPTEs
with gPTEs, formally documenting KVM's exact internal behavior is risky
and unnecessary.

Signed-off-by: Sean Christopherson <seanjc at google.com>
Message-ID: <20241009192345.1148353-4-seanjc at google.com>
Signed-off-by: Paolo Bonzini <pbonzini at redhat.com>
DeltaFile
+9-7Documentation/virt/kvm/api.rst
+9-71 files

Linux/linux 58a20a9arch/x86/kvm/mmu mmu.c

KVM: x86/mmu: Zap only SPs that shadow gPTEs when deleting memslot

When performing a targeted zap on memslot removal, zap only MMU pages that
shadow guest PTEs, as zapping all SPs that "match" the gfn is inexact and
unnecessary.  Furthermore, for_each_gfn_valid_sp() arguably shouldn't
exist, because it doesn't do what most people would expect it to do.
The "round gfn for level" adjustment that is done for direct SPs (no gPTE)
means that the exact gfn comparison will not get a match, even when a SP
does "cover" a gfn, or was even created specifically for a gfn.

For memslot deletion specifically, KVM's behavior will vary significantly
based on the size and alignment of a memslot, and in weird ways.  E.g. for
a 4KiB memslot, KVM will zap more SPs if the slot is 1GiB aligned than if
it's only 4KiB aligned.  And as described below, zapping SPs in the
aligned case overzaps for direct MMUs, as odds are good the upper-level
SPs are serving other memslots.
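
The alignment dependence can be illustrated with a standalone sketch of the rounding, assuming 9 bits of gfn per paging level (512 entries per table). The helper names are hypothetical; only the rounding idea comes from the text above.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* 4 KiB pages covered by one entry at a level: 1, 512, 512 * 512, ... */
static gfn_t pages_per_level(int level)
{
	return (gfn_t)1 << ((level - 1) * 9);
}

/* Round a gfn down to the boundary of the given level. */
static gfn_t round_gfn_for_level(gfn_t gfn, int level)
{
	return gfn & ~(pages_per_level(level) - 1);
}

/* Does an exact-gfn comparison match an SP whose gfn was rounded to level? */
static bool exact_lookup_hits(gfn_t gfn, int level)
{
	return round_gfn_for_level(gfn, level) == gfn;
}
```

A gfn of 0x40000 (1GiB aligned) equals its rounded value at levels 1 through 3, so an exact comparison matches at every level. A gfn of 0x40001 matches only at level 1, which is why the number of zapped SPs depends on the slot's alignment.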

To iterate over all potentially-relevant gfns, KVM would need to make a
pass over the hash table for each level, with the gfn used for lookup

    [20 lines not shown]
DeltaFile
+6-10arch/x86/kvm/mmu/mmu.c
+6-101 files

Linux/linux 8e690b8arch/x86/kernel kvm.c

x86/kvm: Override default caching mode for SEV-SNP and TDX

AMD SEV-SNP and Intel TDX have limited access to MTRR: either it is not
advertised in CPUID or it cannot be programmed (on TDX, due to #VE on
CR0.CD clear).

This results in guests using uncached mappings where they shouldn't and
pmd/pud_set_huge() failures due to non-uniform memory type reported by
mtrr_type_lookup().

Override MTRR state, making it WB by default as the kernel does for
Hyper-V guests.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov at linux.intel.com>
Suggested-by: Binbin Wu <binbin.wu at intel.com>
Cc: Juergen Gross <jgross at suse.com>
Cc: Tom Lendacky <thomas.lendacky at amd.com>
Reviewed-by: Juergen Gross <jgross at suse.com>
Message-ID: <20241015095818.357915-1-kirill.shutemov at linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini at redhat.com>
DeltaFile
+4-0arch/x86/kernel/kvm.c
+4-01 files

Linux/linux bc07eeaDocumentation/virt/kvm locking.rst, include/linux kvm_host.h

KVM: Remove unused kvm_vcpu_gfn_to_pfn_atomic

The last use of kvm_vcpu_gfn_to_pfn_atomic was removed by commit
1bbc60d0c7e5 ("KVM: x86/mmu: Remove MMU auditing").

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux at treblig.org>
Message-ID: <20241001141354.18009-3-linux at treblig.org>
[Adjust Documentation/virt/kvm/locking.rst. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini at redhat.com>
DeltaFile
+0-6virt/kvm/kvm_main.c
+1-1Documentation/virt/kvm/locking.rst
+0-1include/linux/kvm_host.h
+1-83 files