replace the cas spinlock in kernel mutexes with a "parking" lock.
this is motivated by the fact that cas based locks are unfair:
the algorithm makes no effort to give cpus access to the critical
section in the order they tried to acquire it.
cas based locks can also generate a lot of work for the cache
subsystem on a computer because every cpu ends up hammering the
same cacheline.
the combination of these effects for heavily contended mutexes can
get some systems into a situation where they don't make progress,
and are effectively livelocked.
this parking mutex mitigates these problems.
it's called parking because it was very heavily influenced by what's
described in https://webkit.org/blog/6161/locking-in-webkit/. the
main idea it takes is that the lock itself only has to record its
state, while the machinery for waiting for the lock is external to
the lock.
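as a rough illustration of that split, here's a minimal userland
sketch assuming C11 atomics and pthreads. a single global bucket
stands in for the hashed parking-lot table, and all the names are
illustrative, not the kernel implementation:

#include <stdatomic.h>
#include <pthread.h>

#define PL_LOCKED       1U
#define PL_PARKED       2U

/* initialise to zero: unlocked, nobody parked */
struct parking_lock {
        atomic_uint state;      /* the lock only records its state */
};

/* the machinery for waiting lives outside the lock */
static pthread_mutex_t bucket_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t bucket_cv = PTHREAD_COND_INITIALIZER;

void
pl_lock(struct parking_lock *pl)
{
        unsigned int s = 0;

        /* fast path: uncontended cas from unlocked to locked */
        if (atomic_compare_exchange_strong(&pl->state, &s, PL_LOCKED))
                return;

        pthread_mutex_lock(&bucket_mtx);
        for (;;) {
                /* advertise a parked waiter so unlock knows to wake us */
                s = atomic_exchange(&pl->state, PL_LOCKED | PL_PARKED);
                if (s == 0)
                        break;  /* got the lock; parked bit may be stale */
                pthread_cond_wait(&bucket_cv, &bucket_mtx);
        }
        pthread_mutex_unlock(&bucket_mtx);
}

void
pl_unlock(struct parking_lock *pl)
{
        /* only touch the bucket if a waiter advertised itself */
        if (atomic_exchange(&pl->state, 0) & PL_PARKED) {
                pthread_mutex_lock(&bucket_mtx);
                pthread_cond_broadcast(&bucket_cv);
                pthread_mutex_unlock(&bucket_mtx);
        }
}

note this sketch still lets threads barge on the fast path; the
fairness policy lives entirely in the external wait machinery,
which here is just a condvar broadcast.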
[82 lines not shown]
PREFIX_ADJOUT_FLAG_DEAD is no longer needed and can be replaced with
a check that the attrs pointer is NULL. Refactor the code a bit now
that the logic is simpler.
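Roughly, checks become this shape (the attrs field name here is
illustrative):

        /* before: explicit dead flag */
        if (p->flags & PREFIX_ADJOUT_FLAG_DEAD)
                continue;

        /* after: a NULL attrs pointer already marks the prefix dead */
        if (p->attrs == NULL)
                continue;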
OK tb@
Extend ptrace(2) PT_GET_THREAD_* to include thread names.
Use a new define larger than _MAXCOMLEN to keep that define from
propagating to ptrace.h. Ensure that pts_name is large enough with
a compile time assert.
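A sketch of the shape (the define name and value are illustrative;
CTASSERT is the kernel's compile-time assert):

        /* in ptrace.h: larger than _MAXCOMLEN, so that define
         * doesn't have to be visible here */
        #define PTS_NAMELEN     32

        struct ptrace_thread_state {
                pid_t   pts_tid;
                char    pts_name[PTS_NAMELEN];
        };

        /* in the kernel, where _MAXCOMLEN is defined */
        CTASSERT(sizeof(((struct ptrace_thread_state *)0)->pts_name) >=
            _MAXCOMLEN);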
okay claudio@ jca@
Introduce a bitmap API that scales up dynamically but stays minimal
for the common case.
Functions include:
- set, test, clear: set, test and clear a bit in the map
- empty: check if a bitmap is empty (has no bit set).
- id_get: return the lowest free id in the map
- id_put: return an id to the map, aka clear
- init, reset: initialize and free a map
The first 127 elements are stored directly in struct bitmap without
further allocation. For maps with more than 127 elements, external
memory is allocated in the set function. This memory is only freed
by reset, which must be called before an object containing a bitmap
is removed.
It is not possible to set bit 0 of a bitmap since that bit is used to
differentiate between access modes. In my use cases this is perfectly fine
since most code already treats 0 in a special way.
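A usage sketch, with prototypes inferred from the description above
(the bitmap_ prefix is illustrative):

        struct bitmap bm;
        uint32_t id;

        bitmap_init(&bm);

        id = bitmap_id_get(&bm);        /* lowest free id, never 0 */
        bitmap_set(&bm, 100);           /* stored inline, no allocation */
        bitmap_set(&bm, 1000);          /* allocates external memory */

        if (bitmap_test(&bm, 1000))
                bitmap_clear(&bm, 1000);

        bitmap_id_put(&bm, id);         /* aka clear */

        if (!bitmap_empty(&bm))         /* bit 100 is still set */
                bitmap_clear(&bm, 100);

        /* frees the external memory; must run before the object
         * containing the bitmap goes away */
        bitmap_reset(&bm);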
OK tb@
Remove unused algorithms from speed.c
Removed unused algorithms (MD2, SEED, RC5) from the algorithm
enum and the `names[]` table.
The results for these algorithms were always:
md2 0.00 0.00 0.00 0.00 0.00
seed cbc 0.00 0.00 0.00 0.00 0.00
rc5-32/12 cbc 0.00 0.00 0.00 0.00 0.00
indicating that they are no longer used.
ok tb@
Convert D_, R_ macro indices to enums in speed.c
Replaced many `#define` based index constants with enums by adding ALGOR_NUM,
DSA_NUM, RSA_NUM, and EC_NUM to the enum definitions.
This makes it easier to add new entries or remove existing ones.
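The pattern is roughly this (entry names are illustrative):

        /* before: hand-maintained indices and a separate count */
        #define D_MD4           0
        #define D_MD5           1
        #define D_SHA1          2
        #define ALGOR_NUM       3       /* updated by hand */

        /* after: the trailing enumerator tracks the count */
        enum {
                D_MD4,
                D_MD5,
                D_SHA1,
                ALGOR_NUM,
        };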
ok tb@
speed: remove unused counters and dead parameters
In the speed implementation, a number of unused variables and
parameters (save_count, c[][], rsa_c, dsa_c, ecdsa_c, ecdh_c, and
the num argument of print_message()/pkey_print_message()) were
still left behind.
These values are no longer referenced and cannot affect the
time-based benchmark logic, so remove them.
Functional behaviour of speed remains unchanged.
ok tb@
let tun pretend it's a softnet thread with its own tun_input_process.
this largely reimplements if_vinput and if_input_process in tun so
packets pushed through the stack from a tun/tap write can be handled
much like they're being processed by a softnet thread.
there are a couple of important differences between tun/tap and
softnet though. firstly, multiple threads/processes can write to a single
tun/tap descriptor concurrently, so each thread has its own netstack
struct on the stack. secondly, these tun/tap threads are not the
softnet threads, so they can't avoid taking real interface references
when processing requeued packets.
the alternative to this would be letting tun/tap writes queue packets
for processing in a softnet thread, but that adds latency and
requires a lot of thought about a backpressure mechanism when a
thread writes too fast for the stack to process.
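a rough sketch of the shape (the netstack helpers here are
illustrative, not the actual interface):

void
tun_input_process(struct tun_softc *sc, struct mbuf *m)
{
        /* on our stack: concurrent writers each get their own */
        struct netstack ns;

        netstack_init(&ns);

        /* push the packet into the stack like if_vinput would */
        if_vinput(&sc->sc_if, m, &ns);

        /* drain requeued packets like a softnet thread would,
         * except real interface references must be taken here */
        netstack_flush(&ns);
}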
let if_vinput and if_input_proto requeue packets on a struct netstack.
instead of directly calling into different layers of the network
stack, this moves the call back up to if_input_process to dispatch.
this reduces kernel thread stack usage, but also makes it safe(r)
to dispatch this work from an smr critical section. it also allows
us to dispatch work without holding netlock, and will eventually
let if_input_process amortise the locking over bundles of these
different dispatch calls.
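the requeue pattern is roughly this sketch (the netstack layout
and the if_input_proto signature are illustrative):

struct netstack {
        struct mbuf_list ns_ip_ml;      /* e.g. packets for ipv4 input */
};

void
if_input_proto(struct ifnet *ifp, struct mbuf *m, struct netstack *ns)
{
        /* instead of calling down into the next layer of the
         * stack, queue the packet and let the caller unwind */
        ml_enqueue(&ns->ns_ip_ml, m);
}

/*
 * if_input_process then drains these queues at the top of the
 * stack, which bounds kernel stack depth and keeps smr critical
 * sections short.
 */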