sched_ule: 32-bit platforms: Fix runq_print() after runq changes
The compiler would report a mismatch between the format and the actual
type of the runqueue status word because the latter is now
unconditionally defined as an 'unsigned long' (which has the "natural"
platform size) and the format expects a 'size_t', which expands to an
'unsigned int' on 32-bit platforms (although they are both of the same
actual size).
This worked before as the C type used depended on the architecture and
was set to 'uint32_t' aka 'unsigned int' on these 32-bit platforms.
Just fix the format (use the 'l' length modifier). While here, stop
outputting '0x' by hand and rely on the '#' flag instead (the only
difference is for 0, which is fine).
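For illustration, a minimal userland sketch of the fix (the variable
name is hypothetical; the status word type is 'unsigned long' as
described above):

    #include <stdio.h>

    int
    main(void)
    {
            unsigned long rqb_word = 0x50UL;  /* hypothetical status word */

            /* Before (warned on 32-bit): printf("0x%zx\n", rqb_word);
             * 'z' expects a size_t, i.e. 'unsigned int' there, and the
             * '0x' prefix was emitted by hand. */
            /* After: 'l' matches 'unsigned long'; '#' supplies the "0x"
             * prefix itself (printing plain "0" for a zero value). */
            printf("%#lx\n", rqb_word);
            return (0);
    }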
runq_print() should be moved out of 'sched_ule.c' in a subsequent
commit.
Reported by: Jenkins
[4 lines not shown]
lualoader: adapt builtin brand/logo definitions as well
While these should be moved to the new format, it wasn't my intention
to force them over immediately. Downstreams may embed their own brands
in drawer.lua, and we shouldn't break them for something like this.
Move adapt_fb_shim() up and use it for preloaded definitions to avoid
forcing the matter for now. Perhaps in the future we'll start writing
out warnings for definitions that do need to be adapted.
Reported by: 0x1eef on IRC
Fix a warning in the rack stack.
There is an initialization warning where 'error' may not be set when
logging extended BBlogs. Fix this by initializing 'error' to zero so
the warning goes away.
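A hedged sketch of the pattern (not the actual rack code; the function
and values are illustrative):

    int
    log_bblog_sketch(int do_log)
    {
            int error = 0;          /* was uninitialized: set only on some paths */

            if (do_log)
                    error = 42;     /* stand-in for the real logging call */
            return (error);
    }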
ps(1), top(1): Priority: Let 0 be the first timesharing level
Change the origin from PZERO to PUSER.
Doing so allows users to immediately tell whether a thread is running
under a high priority (kernel or realtime) or under a low one
(timesharing or idle).
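A hedged sketch of the idea (PUSER and PZERO are the real kernel
symbols; the value and helper here are illustrative):

    #define PUSER   128     /* hypothetical value for illustration */

    /* With PUSER as the origin, 0 is the first timesharing level, so
     * a negative displayed priority immediately flags a kernel or
     * realtime one. */
    static int
    display_priority(int td_priority)
    {
            return (td_priority - PUSER);   /* was: td_priority - PZERO */
    }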
MFC after: 1 month
Event: Kitchener-Waterloo Hackathon 202506
Sponsored by: The FreeBSD Foundation
sched_4bsd: ESTCPULIM(): Allow any value in the timeshare range
The current formula wastes queues and degrades usage estimation
precision, since any tick count that maps beyond 40 priority levels
(i.e., exceeds 8 * 40) is clamped to the last of these 40 levels (the
nice value is subsequently added to that number to get the final
priority level).
Allow 'ts_estcpu' to grow up to a value corresponding to the greatest
(i.e., lowest) priority of the timeshare range.
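A hedged sketch of the new clamp (INVERSE_ESTCPU_WEIGHT is the actual
sched_4bsd factor; the level count and function are illustrative, not
the committed macro):

    #define INVERSE_ESTCPU_WEIGHT   8       /* actual sched_4bsd factor */
    #define TIMESHARE_LEVELS        168     /* illustrative level count */

    static unsigned int
    estcpulim_sketch(unsigned int e)
    {
            unsigned int lim = INVERSE_ESTCPU_WEIGHT * (TIMESHARE_LEVELS - 1);

            /* Saturate at the last timeshare level instead of at level 40. */
            return (e < lim ? e : lim);
    }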
MFC after: 1 month
Event: Kitchener-Waterloo Hackathon 202506
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45392
sched_4bsd: Remove RQ_PPQ from ESTCPULIM()'s formula
Subtracting RQ_PPQ from the maximum number of allowed priority values
(the factor applied to INVERSE_ESTCPU_WEIGHT) has the effect of unduly
limiting the number of processes assigned to the last priority bucket.
MFC after: 1 month
Event: Kitchener-Waterloo Hackathon 202506
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45392
sched_4bsd: Move ESTCPULIM() after its macro dependencies
No functional change (intended).
This also makes the comment about INVERSE_ESTCPU_WEIGHT adjacent to
its definition.
MFC after: 1 month
Event: Kitchener-Waterloo Hackathon 202506
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45392
sched_ule: Sanitize CPU use and priority computations, and ticks storage
Computation of %CPU in sched_pctcpu() was overly complicated, and
wrong in the case of a non-maximal window (10 seconds span; this is
always the case in practice, as the window oscillates between 10 and
11 seconds for continuously running processes). The first part of the
computation was performed unshifted, essentially losing precision (up
to 9% for SCHED_TICK_SECS being 10), and the second part applied an
ineffective shift.
Conserve maximum precision by shifting only by the required amount to
attain FSHIFT before dividing. Apply classical round-to-nearest
instead of rounding down.
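A hedged illustration of round-to-nearest when scaling a tick count to
an FSHIFT-based fixed-point fraction (FSHIFT is the real <sys/param.h>
constant; the function and its arguments are illustrative):

    #include <sys/types.h>

    #define FSHIFT  11      /* fixed-point shift, as in <sys/param.h> */

    static u_int
    pctcpu_fixpt(u_int run_ticks, u_int window_ticks)
    {
            /* Round to nearest: add half the divisor before dividing. */
            return ((((u_int64_t)run_ticks << FSHIFT) +
                window_ticks / 2) / window_ticks);
    }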
To generally avoid wraparound problems with tick fields in 'struct
td_sched' (as already happened once in sched_pctcpu_update()), make them
all unsigned, and ensure 'ticks' is always converted to some 'u_int'.
While here, fix SCHED_AFFINITY().
Rewrite sched_pctcpu_update() while keeping the existing formulas:
- Fix the hole in the cliff case that in theory 'ts_ticks' can become
[34 lines not shown]
sched_ule: Recover previous nice and anti-starvation behaviors
The justification for this change is to avoid disturbing ULE's
behavior too much at this time. We acknowledge, however, that the
effect of "nice" values is extremely weak and will most probably be
changed going forward.
Tuning mostly recovers ULE's behavior prior to the switch to a single
256-queue runqueue and the increase of the timesharing priority
levels' range.
After this change, in a series of tests involving two long-running
processes with varying nice values competing for the same CPU, we
observe that the CPU time ratios used by the highest-priority process
change by at most 1.15% and on average by 0.46% (absolute
differences). In relative differences, they change by at most 2% and
on average by 0.78%.
In order to preserve these ratios, as the number of priority levels
allotted to timesharing has been raised from 136 to 168 (and the subsets
[24 lines not shown]
sched: Internal priority ranges: Reduce kernel, increase timeshare
Now that a difference of 1 in priority level is significant, we can
shrink the priority range reserved for kernel threads.
Only four distinct levels are necessary for the bottom half (3 base
levels, and arguably an additional one for demoted interrupt threads
that run for full time slices, so that they no longer compete with the
other ones). To leave room for other possible uses, we settle on
8 levels.
Given the symbolic constants for the top half, 10 levels are currently
necessary. We settle on 16 levels.
This allows enlarging the timesharing range, which covers both ULE's
interactive and batch ranges, to 168 distinct levels, up from less
than 64 for ULE (as of before the changes making it use a single
runqueue with 256 distinct levels per runqueue) and 34 for 4BSD.
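A hedged back-of-the-envelope check of these figures (the realtime and
idle counts are taken from the related entries in this series):

    /*
     * bottom-half kernel:    8 levels
     * top-half kernel:      16 levels
     * realtime:             32 levels
     * timesharing:         168 levels
     * idle:                 32 levels
     *
     * total: 8 + 16 + 32 + 168 + 32 = 256, one per runqueue level
     */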
While here, note that the realtime range is required to have at least 32
[17 lines not shown]
epoch_test: Assign different priorities using offset 1
Replace the hardcoded 4 (the old RQ_PPQ) with 1 (the new RQ_PPQ), as
all priority levels are now distinct.
MFC after: 1 month
Event: Kitchener-Waterloo Hackathon 202506
Sponsored by: The FreeBSD Foundation
runq: Remove userland references to RQ_PPQ in rtprio contexts
Concerns only a single test (ptrace_test.c).
MFC after: 1 month
Event: Kitchener-Waterloo Hackathon 202506
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45390
runq: Bump __FreeBSD_version after switching to 256 levels
Corresponding to changing RQ_PPQ to 1.
MFC after: 1 month
Event: Kitchener-Waterloo Hackathon 202506
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45390
runq: Switch to 256 levels
This increases the number of levels from 64 to 256, which coincides
with the number of distinct internal priority values (priority is
currently encoded in a 'u_char', whose range is entirely used).
With this change, we become POSIX-compliant for SCHED_FIFO/SCHED_RR in
that we really provide 32 distinct priority levels for these policies.
Previously, threads in the same "priority group" (groups being
consecutive spans of 4 priority levels covering levels 0 through 31,
so there are 8 of them) could not preempt or be preempted by each
other, even when assigned different priority levels.
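A hedged illustration of the old grouping (RQ_PPQ is the real
constant; the helper is illustrative of the traditional queue-index
computation):

    #include <sys/types.h>

    /* With the old RQ_PPQ of 4, the queue index was the priority
     * divided by 4, so e.g. levels 0..3 shared one queue and could
     * not preempt each other; with RQ_PPQ == 1, every one of the 256
     * levels gets its own queue. */
    static int
    runq_queue_index(u_char pri, int rq_ppq)
    {
            return (pri / rq_ppq);
    }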
See also commit "sched_ule: Use a single runqueue per CPU" for all the
drawbacks that this change also removes.
MFC after: 1 month
Event: Kitchener-Waterloo Hackathon 202506
[2 lines not shown]
zfs: spa: ZIO_TASKQ_ISSUE: Use symbolic priority
This allows changing the meaning of priority differences in FreeBSD
without requiring code changes in ZFS.
MFC after: 1 month
Event: Kitchener-Waterloo Hackathon 202506
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45390
Internal scheduling priorities: Always use symbolic ones
Replace priorities specified as a base priority plus some hardcoded
offset with symbolic constants. Hardcoded offsets prevent changing
the difference between priorities without changing their relative
ordering, and are generally a dangerous practice, since the resulting
priority may inadvertently belong to a different selection policy's
range.
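A hedged illustration with hypothetical constants (not actual kernel
values) of how a hardcoded offset can silently cross a range boundary:

    #define PRI_BASE        80      /* hypothetical base priority */
    #define PRI_POLICY_MAX  83      /* hypothetical end of this policy's range */

    /* Hardcoded offset: PRI_BASE + 4 == 84 silently lands past
     * PRI_POLICY_MAX, i.e. in the next policy's range.  A dedicated
     * symbolic constant keeps the level inside the range even if the
     * spacing between base priorities later changes. */
    #define PRI_SOMETHING   (PRI_POLICY_MAX - 1)    /* hypothetical level */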
Since RQ_PPQ is 4, differences of less than 4 are insignificant, so just
remove them. These small differences have not been changed for years,
so it is likely they have no real meaning (besides having no practical
effect). One can still consult the change history to recover them if
ever needed.
No functional change (intended).
MFC after: 1 month
Event: Kitchener-Waterloo Hackathon 202506
[2 lines not shown]
sched_ule: Use a single runqueue per CPU
Previously, ULE would use 3 separate runqueues per CPU to store threads,
one for each of its selection policies, which are realtime, timesharing
and idle. They would be examined in this order, and the first thread
found would be the one selected.
This choice indeed appears to be the easiest evolution from the single
runqueue used by sched_4bsd (4BSD): It allows sharing most of the same
runqueue code, which currently defines 64 levels per runqueue, while
tripling the total number of levels. However, it has several important
drawbacks:
1. The number of levels is the same for each selection policy. 64 is
unnecessarily large for the idle policy (only 32 distinct levels would
be necessary, given the 32 levels of our RTP_PRIO_IDLE and their future
aliases in the to-be-introduced SCHED_IDLE POSIX scheduling policy) and
unnecessarily restrictive both for the realtime policy (which should
include 32 distinct levels for PRI_REALTIME, given our implementation of
[34 lines not shown]
sched_ule: Re-implement stealing on top of runq common-code
Stop using internal knowledge of runqueues. Remove duplicate
boilerplate parts.
Concretely, runq_steal() and runq_steal_from() are now implemented on
top of runq_findq().
Besides considerably simplifying the code, this change also brings an
algorithmic improvement since, previously, set bits in the runqueue's
status words were found by testing each bit individually in a loop
instead of using ffsl()/bsfl() (except for the first set bit per status
word).
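A hedged sketch of the improved scan over one status word (ffsl() is
the real primitive; the helper is illustrative):

    #include <strings.h>

    /* Return the index of the first set bit at or above 'from' (which
     * must be within the word), or -1.  ffsl() jumps straight to the
     * next set bit instead of testing each bit in a loop. */
    static int
    next_set_bit(unsigned long sw, int from)
    {
            sw &= ~0UL << from;
            return (sw != 0 ? ffsl((long)sw) - 1 : -1);
    }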
This change also makes it more apparent that runq_steal_from() treats
the first thread with the highest priority specially (which
runq_steal() does not).
MFC after: 1 month
[3 lines not shown]
sched_ule: runq_steal_from(): Remove first thread special case
This special case was introduced as early as commit "ULE 3.0"
(ae7a6b38d53f, r171482, from July 2007). It caused runq_steal_from()
to ignore the highest-priority thread while stealing.
Its functionality was changed in commit "Rework CPU load balancing in
SCHED_ULE" (36acfc6507aa, r232207, from February 2012), where the intent
was to keep track of that first thread and return it if no other one was
stealable, instead of returning NULL (no steal). Some bug prevented it
from working in loaded cases (more than one thread, and all threads but
the first one not stealable), which was subsequently fixed in commit
"sched_ule(4): Fix interactive threads stealing." (bd84094a51c4, from
September 2021).
All the rationales for this mechanism that we could surmise were dubious at
best. Jeff Roberson, ULE's main author, says in the differential
revision that "The point was to move threads that are least likely to
benefit from affinity because they are unlikely to run soon enough to
[13 lines not shown]
runq: Tidy up and rename runq_setbit() and runq_clrbit()
Factor out common subexpressions into a separate helper
(runq_sw_apply()) for better readability.
Rename these functions so that their names refer to the use cases
rather than to the implementations.
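A hedged sketch of the factoring pattern (runq_sw_apply() is the real
helper's name; this body and the constants are illustrative):

    #define RQB_BPW (sizeof(unsigned long) * 8)     /* bits per status word */

    typedef void runq_sw_op(unsigned long *swp, unsigned long bit);

    static void sw_set(unsigned long *swp, unsigned long bit) { *swp |= bit; }
    static void sw_clr(unsigned long *swp, unsigned long bit) { *swp &= ~bit; }

    /* The word-index and bit-mask computations now live in one place,
     * e.g. runq_sw_apply_sketch(rq_status, pri, sw_set) to mark a
     * queue non-empty. */
    static void
    runq_sw_apply_sketch(unsigned long *status, int pri, runq_sw_op *op)
    {
            op(&status[pri / RQB_BPW], 1UL << (pri % RQB_BPW));
    }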
Reviewed by: kib
MFC after: 1 month
Event: Kitchener-Waterloo Hackathon 202506
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45387
runq: New function runq_is_queue_empty(); Use it in ULE
Indicates whether a particular queue of the runqueue is empty.
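A hedged one-function sketch of the semantics (illustrative; mirrors
the status-word layout sketched in the entry above):

    #include <stdbool.h>

    #define RQB_BPW (sizeof(unsigned long) * 8)     /* bits per status word */

    static bool
    runq_is_queue_empty_sketch(const unsigned long *status, int pri)
    {
            return ((status[pri / RQB_BPW] & (1UL << (pri % RQB_BPW))) == 0);
    }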
Reviewed by: kib
MFC after: 1 month
Event: Kitchener-Waterloo Hackathon 202506
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45387
runq: New runq_findq(), common low-level search implementation
The new runq_findq(), based on the implementation of the former
runq_findq_range(), is intended to become the foundation and sole
low-level implementation for all searches in a runqueue. In addition
to a range of queue indices, it takes a predicate function (a sketch
follows the list below), allowing it to:
- Possibly skip a non-empty queue with higher priority (numerically
lower index) on some criteria. This is not yet used but will be in
a subsequent commit revising ULE's stealing machinery.
- Choose a specific thread in the queue, not necessarily the first.
- Return whatever information is deemed necessary.
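A hedged sketch of the predicate-driven shape (names, signature and
the linear walk are illustrative; the real code skips empty queues via
the status words):

    #include <stddef.h>

    struct thread;                  /* opaque for the sketch */

    typedef struct thread *runq_pred_t(int idx, void *data);

    static struct thread *
    runq_findq_sketch(int lvl_min, int lvl_max, runq_pred_t *pred, void *data)
    {
            struct thread *td;

            for (int i = lvl_min; i <= lvl_max; i++) {
                    /* The predicate may skip this queue, pick a thread
                     * other than the first, or hand back extra state
                     * through 'data'. */
                    if ((td = pred(i, data)) != NULL)
                            return (td);
            }
            return (NULL);
    }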
It helps to remove duplicated boilerplate code, including redundant
assertions, and generally makes things much clearer. These effects will
be even greater in a subsequent commit modifying ULE to use it.
runq_first_thread_range() replaces the old runq_findq_range() (returns
the first thread of the highest priority queue in the requested range),
and runq_first_thread() the old runq_findq() (same, but considering all
[7 lines not shown]
runq: Revamp runq_find*(), new runq_findq_range()
Rename existing functions to use the simpler prefix 'runq_findq'
instead of 'runq_findbit' (that they work by scanning status bits is
an implementation detail).
Add runq_findq_range(), which takes a range of indices to operate on
(bounds included). This is in preparation for changing ULE to use
a single runqueue, since it needs to treat the timesharing range
differently.
Rename runq_findbit_from() to runq_findq_circular(), which is more
descriptive.
To reduce code duplication, have runq_findq() and runq_findq_circular()
leverage runq_findq_range() internally. For the latter, this also
brings a small algorithmic improvement: Previously, the second pass
(from queue 0) would cover the whole runqueue when it was completely
empty, re-scanning the empty queues at and after the start index.
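A hedged sketch of the two-pass circular search built on the ranged
search (names, the callback shape and the queue count are
illustrative):

    #define NQUEUES 64      /* hypothetical queue count for the sketch */

    /* Scan [start, NQUEUES - 1], then wrap to [0, start - 1]; the
     * second pass no longer revisits queues at or after 'start'. */
    static int
    findq_circular_sketch(int (*findq_range)(int lo, int hi), int start)
    {
            int idx;

            if ((idx = findq_range(start, NQUEUES - 1)) != -1)
                    return (idx);
            return (start > 0 ? findq_range(0, start - 1) : -1);
    }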
[6 lines not shown]
runq: Re-order functions more logically
No code change in moved functions.
Reviewed by: kib
MFC after: 1 month
Event: Kitchener-Waterloo Hackathon 202506
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45387