cpu/x86_64: Fix do_cpuid() to explicitly set ECX=0
The old do_cpuid() did not initialize ECX before executing the CPUID
instruction, so the results could be incorrect when ECX contained a
non-zero garbage value.
This issue was observed on Intel CPUs when booting a FreeBSD 14.x/15.x
ISO under NVMM, where it caused a general protection fault (#GP) shortly
after the FreeBSD kernel was loaded:
qemu-system-x86_64: NVMM: Mem Assist Failed [gpa=0xbfff8]
qemu-system-x86_64: NVMM: Failed to execute a VCPU.
Abort trap (core dumped)
The fault occurred when NVMM handled a read of the
IA32_ARCH_CAPABILITIES MSR: the second do_cpuid() call returned
incorrect results, indicating that the MSR was unavailable.
The problem was first reported by mneumann in bug #3310 on 2025-11-26 [1].
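As an illustration of the fix described above, here is a minimal sketch
of a CPUID wrapper that always loads ECX before executing the
instruction. The names cpuid_count_sketch() and cpuid_sketch() are
hypothetical and are not the actual do_cpuid() code; the sketch assumes
an x86 target and a GCC-compatible compiler.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not the real do_cpuid(): execute CPUID with both
 * EAX (leaf) and ECX (sub-leaf) explicitly loaded, so that no stale
 * register value can select an unintended sub-leaf.
 */
static inline void
cpuid_count_sketch(uint32_t leaf, uint32_t subleaf, uint32_t regs[4])
{
	__asm__ __volatile__("cpuid"
	    : "=a" (regs[0]), "=b" (regs[1]), "=c" (regs[2]), "=d" (regs[3])
	    : "a" (leaf), "c" (subleaf));
}

/* Hypothetical convenience wrapper: query sub-leaf 0 of a leaf. */
static inline void
cpuid_sketch(uint32_t leaf, uint32_t regs[4])
{
	cpuid_count_sketch(leaf, 0, regs);
}
```

Leaves such as 0x7 return different data depending on ECX, which is
why leaving ECX uninitialized can make a feature bit appear to be
missing.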
[11 lines not shown]
nvmm(4): Expose ARCH_CAP to guest only if the host CPU supports it
* Don't expose ARCH_CAP to guest on AMD CPUs, because the ARCH_CAP
feature bit and the IA32_ARCH_CAPABILITIES MSR are Intel-specific and
unavailable on AMD systems. I decided to not follow Linux KVM, which
chose to always provide ARCH_CAP and emulate the MSR for AMD CPUs.
* Check whether the host CPU supports the ARCH_CAP feature bit and only
expose it to the guest if the host supports it.
Credit to tuxillo and Claude Opus LLM for the analyses and initial
patches.
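The exposure policy described in the two items above could be sketched
roughly as follows. This is not the nvmm(4) code: the function name,
its parameters, and the macro name are hypothetical. The only hardware
fact assumed is that the ARCH_CAP feature bit is reported in
CPUID.(EAX=7,ECX=0):EDX bit 29.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARCH_CAP feature bit in CPUID.(EAX=7,ECX=0):EDX (macro name is hypothetical). */
#define SKETCH_CPUID_7_0_EDX_ARCH_CAP	(1u << 29)

/*
 * Hypothetical filter for the guest-visible EDX of CPUID leaf 7,
 * sub-leaf 0. ARCH_CAP stays visible only if the host is an Intel CPU
 * that itself reports the bit; otherwise it is masked out.
 */
static uint32_t
guest_leaf7_edx_sketch(uint32_t host_edx, bool host_is_intel)
{
	uint32_t edx = host_edx;

	if (!host_is_intel || (host_edx & SKETCH_CPUID_7_0_EDX_ARCH_CAP) == 0)
		edx &= ~SKETCH_CPUID_7_0_EDX_ARCH_CAP;
	return edx;
}
```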
ahci - Read DevSleep DETO and MDAT parameters using READ LOG EXT command.
* DETO = DevSleep Exit Timeout in milliseconds
MDAT = Minimum DEVSLP Assertion Time in milliseconds
* In the next step, these parameters will be programmed into the DevSleep
register when automatic DevSleep power management is available and
enabled. If the values cannot be read, or are read as zero, we should
fall back to the "nominal" values of 20ms for DETO and 10ms for MDAT
listed in the Serial ATA specification.
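The fallback rule described above can be sketched as a small helper.
The struct, the function, and the macro names are hypothetical and are
not taken from the ahci driver; only the nominal values of 20ms and
10ms come from the text.

```c
/* Nominal values from the text above; macro names are hypothetical. */
#define SKETCH_DEVSLP_DETO_NOMINAL_MS	20u
#define SKETCH_DEVSLP_MDAT_NOMINAL_MS	10u

struct devslp_params_sketch {
	unsigned int deto_ms;	/* DevSleep Exit Timeout */
	unsigned int mdat_ms;	/* Minimum DEVSLP Assertion Time */
};

/*
 * Hypothetical helper: use the values read from the device log only if
 * the read succeeded and the value is non-zero; otherwise substitute
 * the nominal value for that parameter.
 */
static struct devslp_params_sketch
devslp_effective_sketch(int log_read_ok, unsigned int deto_ms,
    unsigned int mdat_ms)
{
	struct devslp_params_sketch p;

	p.deto_ms = (log_read_ok && deto_ms != 0) ?
	    deto_ms : SKETCH_DEVSLP_DETO_NOMINAL_MS;
	p.mdat_ms = (log_read_ok && mdat_ms != 0) ?
	    mdat_ms : SKETCH_DEVSLP_MDAT_NOMINAL_MS;
	return p;
}
```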
ahci - Improve SATA port link state sysctl output.
This should now always print a meaningful state, instead of outputting
"unknown" in any case where there is no device, no established
communication with the device, or the phy is disabled.
The "no communication" case is relevant because it can probably occur
in practice on notebooks where an optical drive is disabled in the BIOS
settings, or when a disk drive is plugged into a SATA port without a
power cable connected.
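To illustrate how such states can be told apart, here is a minimal
sketch that decodes the DET and IPM fields of the AHCI port SStatus
register (PxSSTS). The function name and the returned strings are
hypothetical and are not the driver's actual sysctl output; the field
encodings are assumed to follow the AHCI specification.

```c
#include <stdint.h>

/* PxSSTS field extraction: DET is bits 3:0, IPM is bits 11:8. */
#define SKETCH_SSTS_DET(ssts)	((ssts) & 0xfu)
#define SKETCH_SSTS_IPM(ssts)	(((ssts) >> 8) & 0xfu)

/* Hypothetical decoder; the strings are illustrative only. */
static const char *
link_state_sketch(uint32_t ssts)
{
	switch (SKETCH_SSTS_DET(ssts)) {
	case 0x0:	/* no device detected, phy communication not established */
		return "no device";
	case 0x1:	/* device detected, but phy communication not established */
		return "no communication";
	case 0x4:	/* phy offline / disabled */
		return "phy disabled";
	case 0x3:	/* device present and communication established */
		break;
	default:
		return "unknown";
	}

	switch (SKETCH_SSTS_IPM(ssts)) {
	case 0x1:
		return "active";
	case 0x2:
		return "partial";
	case 0x6:
		return "slumber";
	case 0x8:
		return "devsleep";
	default:
		return "unknown";
	}
}
```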
corecstat(4): Add driver for Intel CPUs' C-State residency counters
* Currently supports Core family CPUs from the Nehalem series up to
Coffee Lake, as well as some Atom CPUs.
<sys/sysctl.h>: Restrict CTL_P1003_1B_MAXID to _KERNEL
This constant is only used to size an array within the kernel.
Obtained-from: FreeBSD (https://reviews.freebsd.org/D25816)
kern: Add KTRACE support to kern_sigtimedwait()
The sigtimedwait()/sigwaitinfo() functions provide the userland program
with a way to explicitly accept signals, so add the KTRACE support
similar to postsig().
Meanwhile, tweak the code style in postsig().
Referred-to: FreeBSD
Reviewed-by: dillon
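For context, this is the kind of userland code whose signal acceptance
would become visible in a kernel trace. It is a minimal sketch: the
helper name is hypothetical, and the program blocks SIGUSR1, raises it
so that it becomes pending, and then accepts it explicitly with
sigtimedwait() instead of through a signal handler.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <signal.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper: returns the accepted signal number on success,
 * or -1 on failure (including a timeout).
 */
static int
accept_pending_sigusr1_sketch(void)
{
	sigset_t set;
	siginfo_t info;
	struct timespec timeout = { .tv_sec = 1, .tv_nsec = 0 };

	sigemptyset(&set);
	sigaddset(&set, SIGUSR1);

	/* Block the signal so that it stays pending instead of being delivered. */
	if (sigprocmask(SIG_BLOCK, &set, NULL) != 0)
		return -1;
	if (raise(SIGUSR1) != 0)
		return -1;

	/* Explicitly accept the pending signal. */
	return sigtimedwait(&set, &info, &timeout);
}
```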
kern: Clean up kern_sigtimedwait()
- Use tstohz_high() instead of tvtohz_high() and thus remove the
timespec->timeval conversion.
- Use timespecclear() to reset the 'ets' timespec.
- Use bool for 'timevalid' for better clarity.
- Minor style tweaks.
kern: Remove obsolete MPALMOSTSAFE comments from kern_time.c
The MPALMOSTSAFE comments were added when the code was updated to use
get_mplock(), which has since been made obsolete by per-process tokens.
Therefore, remove the obsolete MPALMOSTSAFE comments.
In addition, remove the MPSAFE comments, as all the functions are now
MP-safe.
sys: Remove obsolete sysctl name list macros
They were only ever intended for use in sysctl(8) and it has not used
them for many years.
See also: FreeBSD
pc64 - First hacky attempt at implementing a deep MWAIT sleep inhibitor.
This is needed by i915(4), to make its DisplayPort code reliable when deep
MWAIT C-States might be used. Since the i915(4) code is not aware of the
CPU core that its interrupt is routed to, we have to pessimistically inhibit
deep MWAIT sleeps on all cores.
This adds a very basic cpu_inhibit_deep_sleep() function to the pc64 platform.
cpu_inhibit_deep_sleep(1) increments a counter by 1, and
cpu_inhibit_deep_sleep(0) decrements that counter. A positive value in
the counter inhibits deep MWAIT C-States on Intel hardware. Since modern AMD
hardware uses fully autonomous power-saving state selection, and no MWAIT
for idling, this function has no effect on AMD right now.
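The counter semantics described above can be sketched in portable C11
as follows. This is not the pc64 implementation: the function and
variable names are hypothetical, and C11 atomics stand in for whatever
primitives the kernel actually uses.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical global counter: > 0 means deep sleep is inhibited. */
static atomic_int deep_sleep_inhibit_count_sketch;

/* Hypothetical analogue of cpu_inhibit_deep_sleep(): 1 adds a hold, 0 drops one. */
static void
inhibit_deep_sleep_sketch(int inhibit)
{
	if (inhibit)
		atomic_fetch_add(&deep_sleep_inhibit_count_sketch, 1);
	else
		atomic_fetch_sub(&deep_sleep_inhibit_count_sketch, 1);
}

/* What an idle loop would check before selecting a deep C-State. */
static bool
deep_sleep_allowed_sketch(void)
{
	return atomic_load(&deep_sleep_inhibit_count_sketch) <= 0;
}
```

Using a counter rather than a flag lets several callers hold the
inhibit at the same time; deep sleep becomes possible again only after
every hold has been released.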