Fix possible race in ipi_drop_fpstate()
ipi_drop_fpstate() needs to check if the current fpu context matches the
expected one sent via IPI. Only after that %fprs should be cleared.
Use the same asm in ipi_drop_fpstate() as in the start of ipi_save_fpstate().
Also simplify ipi_drop_fpstate() and clearfpstate() since there is no need
to enable the FPU before clearing %fprs.
OK miod@ kettenis@ deraadt@
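The check-then-clear ordering described above can be sketched in portable C. This is only an illustration under stated assumptions: the real code is sparc64 assembly, and cpu_sketch, proc_sketch and ipi_drop_sketch() are hypothetical names, with a plain integer standing in for %fprs.

```c
#include <stdint.h>

/* Hypothetical stand-ins; not the sparc64 kernel structures. */
struct proc_sketch {
	int pid;
};

struct cpu_sketch {
	struct proc_sketch *fpproc;	/* owner of the FPU context on this CPU */
	uint64_t fprs;			/* stand-in for the %fprs register */
};

/*
 * Drop the FPU state only if this CPU still holds the context the
 * sender of the IPI expected. Returns 1 if the state was dropped.
 */
static int
ipi_drop_sketch(struct cpu_sketch *ci, struct proc_sketch *expected)
{
	if (ci->fpproc != expected)
		return 0;	/* context changed since the IPI was sent */
	ci->fprs = 0;		/* only now is it safe to clear */
	return 1;
}
```

If the owner changed between sending and handling the IPI, the handler leaves the register alone, which is the race the commit closes.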
The pfsync manual page has no mention about safety of this protocol.
Furthermore there are no configuration options for "key negotiation",
so we believe everyone knows to run this on a dedicated wire or on L2 inside
some sort of encryption tunnel (it is the natural way to do it in any case).
Books do mention this detail, because books enjoy being more wordy.
But the AIs can't figure it out, so put in some words to stop future
AIs from sending us slop.
The pfcksum[] field in the pfsync packet header is not a hash of the
packet. It provides absolutely no security benefits; keep reading to
find out why.
According to dlg, during early development this field was hopefully
going to be a hash related to the ruleset for optimizing state
match. That approach was abandoned (I guess because ruleset drift
between firewalls happens too often during normal practice). As is
usual in protocol development, at least 6 people were already using
pfsync in production, so for compatibility the field was not
removed... and forgotten. On send, it was left as zero, due to
the full-header zero initialization code.
So there is no useful checksum or hash stored in this field called
'pfcksum[PF_MD5_DIGEST_LENGTH]'. Actually there isn't a single line
of code in the entire tree which writes to this array. Besides the
field definition in the structure, there is 1 comment elsewhere
mentioning the field. So no code at all. I said no code, which is
why there is no code checking it on receive, not even checking if it
[19 lines not shown]
Before it is disabled, unveil allows you to override the settings on
any vnode. A block of #if 0 code suggests this might be different.
That can be deleted. This also shows one word "other" in the manual
page is misleading.
question asked by Stuart Thomas
ok beck
A binary without a PT_LOAD exec segment would later read a pinsyscall table
and damage it strangely. Such a binary cannot actually run, but we should
avoid the internal pinsyscall table damage, and fail the execve with EINVAL.
reported by Stuart Thomas
ok guenther
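A minimal sketch of the kind of check described above: scan the program headers and fail with EINVAL when no executable PT_LOAD segment exists. The sk_phdr structure and check_exec_segment() are hypothetical and reduced to the two fields needed here; the constants carry the standard ELF values for PT_LOAD and PF_X.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PT_LOAD	1	/* same value as ELF PT_LOAD */
#define SK_PF_X		0x1	/* same value as ELF PF_X */

/* Hypothetical, reduced program header; not the kernel's Elf_Phdr. */
struct sk_phdr {
	uint32_t p_type;
	uint32_t p_flags;
};

/* Returns 0 if an executable PT_LOAD segment exists, else EINVAL. */
static int
check_exec_segment(const struct sk_phdr *ph, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (ph[i].p_type == SK_PT_LOAD && (ph[i].p_flags & SK_PF_X))
			return 0;
	}
	return EINVAL;
}
```

Failing early like this means later code never consults a pinsyscall table for a binary that has no text segment to pin it to.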
vmm: Handle reserved bits in debug registers
vmm(4) handles the %dr6 debug register on VMX on its own. It is not
part of the VMCS. The AMD and Intel SDMs mention that a 'MOV DRn'
instruction traps with #GP when any of the upper 32 bits of %dr6/%dr7
is 1. Userland can set arbitrary values in that register, forcing an
Intel machine to crash. An initial bogus %dr7 fails to launch the VM
on both platforms.
Reject such debug register values on all platforms.
ok mlarkin@
Reported-by: syzbot+f386e2f64711877025a6 at syzkaller.appspotmail.com
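The rejection rule can be sketched as a simple mask test. This is an illustration only, assuming the check reduces to "upper 32 bits must be zero" as the message states; DR_RESERVED_HIGH and check_debug_regs() are hypothetical names, not the vmm(4) code.

```c
#include <stdint.h>

/* Upper 32 bits of %dr6/%dr7 are reserved and must be zero. */
#define DR_RESERVED_HIGH	0xffffffff00000000ULL

static int
dr_value_ok(uint64_t val)
{
	return (val & DR_RESERVED_HIGH) == 0;
}

/* Returns 0 if both values may be loaded, -1 if either must be rejected. */
static int
check_debug_regs(uint64_t dr6, uint64_t dr7)
{
	if (!dr_value_ok(dr6) || !dr_value_ok(dr7))
		return -1;
	return 0;
}
```

Validating the values when userland supplies them keeps a bad value from ever reaching a 'MOV DRn' in the host.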
Call repo_check_timeout() before collecting the POLLOUT fds, since
repo_abort(), called by repo_check_timeout(), will add messages to
be sent out.
This brings back rev 1.263 which was accidentally reverted by rev 1.293
OK tb@
When the pagedaemon is triggered to create free memory, there may be
sleeping pmemrange allocations with multi-page alignment requirements
which can't be satisfied by the simplistic freeing of (solo) pages
which the pagedaemon performs. As we near starvation, fragmentation
is the main problem. Our free list could be large enough that the
pagedaemon sees no reason to do more work, but also too fragmented to
satisfy a pending allocation request with complex requirements
(imagine asking for 512K of physically linear memory which is DMA
reachable). When the requirement isn't satisfied, the pagedaemon is
told to try again, but again doesn't mean harder because it has no
mechanism to try harder. Its tracking variables do not show the
fragmentation problem. It spins a lot. Often this becomes a
deadlock.
Time to change strategy: Overshoot creation of (both) inactive and
free pages each time through the loop. After inspecting existing
variables, we generate minimum 128 inactive pages (which may be
dynamically drawn down asynchronously by accesses), and then try to
convert minimum 128 inactives into free pages (different pages
get freed different ways, including via swapcluster which has been
[7 lines not shown]
To support swapencrypt, the swapcluster code has a memory allocation codepath.
Since this runs inside the pagedaemon, that is unworkable. We'd like to
encrypt the pages inplace for IO, but there are architectures not ready for
a high-mem page to be written to a dma-restricted device (work in progress).
So for now we need to bounce through a dma-reachable memory buffer. A previous
attempt had 1 extra bounce buffer, but then slept on allocation inside the
pagedaemon context which is also unworkable. This version contains 32
pre-allocated swapclusters (64K each), and through a counter signals to the
pagedaemon when it should stop trying to create memory. 32 swap clusters
is comfortably more than the minimum we expect the pagedaemon to frantically
generate. This crummy solution is good enough until the dma reach problem
is solved (soon).
ok kettenis kirill (who looked into other solutions) beck
Apparently we shouldn't touch the RTC immediately after restarting the
i8254 clock either when coming out of S3 suspend. So move the code
that checks whether the RTC alarm went off and clears it all the way to
the end of acpi_cpu_resume. This fixes a lockup seen on the x220.
Figured out by mlarkin@ who wrote the initial diff; I just tweaked it.
ok mlarkin@, deraadt@
sys/vfs_biomem: add missed atop() in buf_alloc_pages()
bufbackoff() operates in pages, but size at this call site was a byte
count; the old loop therefore asked for far too much backoff and
compared reclaimed pages against bytes.
On a low memory machine that made the NOWAIT retry path much less likely
to succeed, so the code dropped into the WAITOK allocation below and
slept.
Using atop() puts the units back in line; backoff can now satisfy the
intended request, and the subsequent NOWAIT retry again has a realistic
chance of success. The WAITOK path remains possible, but it should be
reached less often.
OK deraadt@, beck@
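The unit mismatch is easy to see in a small sketch. Everything here is hypothetical: atop_sketch() mimics what atop() does assuming 4096-byte pages, and pages_still_needed() stands in for the comparison the backoff loop makes; neither is the vfs_biomem code.

```c
#include <stddef.h>

#define PAGE_SHIFT_SKETCH	12	/* assumes 4096-byte pages */

/* Convert a byte count to a page count, as atop() does. */
#define atop_sketch(x)	((size_t)(x) >> PAGE_SHIFT_SKETCH)

/*
 * How many more pages must be reclaimed to cover a request of
 * size_bytes, given the pages reclaimed so far. Both sides of the
 * comparison are in pages.
 */
static size_t
pages_still_needed(size_t size_bytes, size_t reclaimed_pages)
{
	size_t want = atop_sketch(size_bytes);

	return reclaimed_pages >= want ? 0 : want - reclaimed_pages;
}
```

For a 64K request the target is 16 pages. Comparing reclaimed pages against the raw byte count 65536 instead would demand 4096 times as much backoff.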
At the end of parsing the http response header do some sanity checks
to ensure that the response includes all needed data.
Right now only the presence of a Location header is checked if an HTTP
redirect was returned (e.g. a 301 status).
Different fix for a report from Daniel Anderson
OK tb@
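A minimal sketch of such an end-of-header sanity check. The structure and function names are hypothetical, and the set of status codes treated as redirects is an assumption; the message above only names 301 as an example.

```c
#include <stddef.h>

/* Hypothetical parsed-response summary; not the actual parser state. */
struct http_resp_sketch {
	int status;		/* HTTP status code */
	const char *location;	/* Location header value, or NULL */
};

/* Assumption: these are the statuses that require a Location header. */
static int
is_redirect(int status)
{
	return status == 301 || status == 302 || status == 303 ||
	    status == 307 || status == 308;
}

/* Returns 1 if the response carries all needed data, 0 otherwise. */
static int
http_resp_sane(const struct http_resp_sketch *r)
{
	if (is_redirect(r->status) &&
	    (r->location == NULL || r->location[0] == '\0'))
		return 0;
	return 1;
}
```

Running the check once, after the whole header has been parsed, keeps the per-line header parser free of ordering assumptions.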
In powerpc stacktrace_save(), start at correct return address
I got an empty trace. It was reading garbage as the 1st return
address and might have accidentally taken the "if (lr & 3) break;".
By using __builtin_return_address(0) and pointing to the correct
frame, I get a trace where #0 is the function calling
stacktrace_save().
fix how source and state limiters are wired into rbtrees inside pfctl.
i messed up when we added support for names on these things. the
id and names are each supposed to be unique, which is checked by
putting each limiter into one rb tree keyed by id and another
keyed by name. unfortunately i used the same RBT_ENTRY fields
for both trees, which meant using both trees on the same limiter
corrupted the topology, which goes badly when you want to use
multiple limiters.
found by, tested, and ok dgl@ (who is not me, this is not a typo)
ok jmatthew@
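The underlying rule is that an object linked into two intrusive containers needs one set of link fields per container. The sketch below shows that with two hand-rolled singly linked lists rather than RBT trees, so it is an analogy and not the pfctl code; all names are hypothetical.

```c
#include <stddef.h>

struct limiter_sketch {
	unsigned int id;
	const char *name;
	struct limiter_sketch *by_id_next;	/* link for the id container */
	struct limiter_sketch *by_name_next;	/* link for the name container */
};

static struct limiter_sketch *by_id_head;
static struct limiter_sketch *by_name_head;

/* Link one limiter into both containers, each via its own field. */
static void
limiter_insert(struct limiter_sketch *l)
{
	struct limiter_sketch **pp;

	/* push to the front of the id list */
	l->by_id_next = by_id_head;
	by_id_head = l;

	/* append to the tail of the name list */
	for (pp = &by_name_head; *pp != NULL; pp = &(*pp)->by_name_next)
		continue;
	l->by_name_next = NULL;
	*pp = l;
}

static size_t
count_by_id(void)
{
	const struct limiter_sketch *l;
	size_t n = 0;

	for (l = by_id_head; l != NULL; l = l->by_id_next)
		n++;
	return n;
}

static size_t
count_by_name(void)
{
	const struct limiter_sketch *l;
	size_t n = 0;

	for (l = by_name_head; l != NULL; l = l->by_name_next)
		n++;
	return n;
}
```

Had both lists shared a single next pointer, the second insertion step would overwrite the link written by the first. That is the same failure as reusing one RBT_ENTRY for two trees, and it only shows up once more than one limiter is present.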