FreeBSD/src 4d3b28b (r347695) sys/amd64/amd64 pmap.c vm_machdep.c, sys/amd64/include proc.h pmap.h

amd64 pmap: rework delayed invalidation, removing global mutex.

For machines that have the cmpxchg16b instruction, i.e. everything but
very early Athlons, provide a lockless implementation of delayed
invalidation.

The implementation maintains a lockless singly-linked list, with the
trick from the T.L. Harris article about a volatile mark on the elements
being removed. Double-CAS is used to atomically update both the link and
the generation.  A new thread starting DI appends itself to the end of
the queue, setting its generation to the generation of the last element
+1.  On DI finish, the thread donates its generation to the previous
element.  The generation of the fake head of the list is the last
passed DI generation.  Basically, the implementation is a queued
spinlock but without the spinlock.
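
The generation bookkeeping described above can be sketched as a
single-threaded model.  This is only an illustration, not the code from
pmap.c: the names di_elem, di_head, di_start, di_finish, di_passed_gen
and di_example are hypothetical, the head is assumed to start at
generation 0, and the double-CAS updates and the removal mark are
replaced by plain stores, so the sketch is not safe for concurrent use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified queue element (not the kernel structure). */
struct di_elem {
	unsigned long gen;	/* generation owned by this element */
	struct di_elem *next;	/* next element, NULL at the tail */
};

/* Fake head; its gen is the last passed DI generation (assumed 0). */
static struct di_elem di_head = { 0, NULL };

/* Start DI: append to the tail with the tail's generation + 1. */
static void
di_start(struct di_elem *self)
{
	struct di_elem *p;

	for (p = &di_head; p->next != NULL; p = p->next)
		;
	self->gen = p->gen + 1;
	self->next = NULL;
	/* The real code publishes link and generation with one double-CAS. */
	p->next = self;
}

/* Finish DI: donate our generation to the previous element and unlink. */
static void
di_finish(struct di_elem *self)
{
	struct di_elem *p;

	for (p = &di_head; p != NULL && p->next != self; p = p->next)
		;
	if (p == NULL)
		return;		/* not on the list; nothing to do */
	p->gen = self->gen;
	p->next = self->next;
	self->next = NULL;
}

/* Every DI with a generation <= this value has finished. */
static unsigned long
di_passed_gen(void)
{
	return (di_head.gen);
}

/* Three overlapping DI blocks finishing out of order. */
static unsigned long
di_example(void)
{
	struct di_elem a, b, c;

	di_start(&a);			/* a.gen == 1 */
	di_start(&b);			/* b.gen == 2 */
	di_start(&c);			/* c.gen == 3 */
	di_finish(&b);			/* a inherits generation 2 */
	assert(di_passed_gen() == 0);	/* a is still in progress */
	di_finish(&a);			/* head inherits generation 2 */
	assert(di_passed_gen() == 2);
	di_finish(&c);			/* head inherits generation 3 */
	return (di_passed_gen());
}
```

In this model the head generation advances only once every element
ahead of a finishing one has also finished, which is how a waiter could
tell that all DI blocks up to a given generation are complete.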

Many thanks to both Peter Holm and Mark Johnson for bearing with me
while I produced intermediate versions of the patch.

Reviewed by:    markj
Tested by:      pho
Sponsored by:   The FreeBSD Foundation
MFC after:      1 month
MFC note:       td_md.md_invl_gen should go to the end of struct thread
Differential revision:  https://reviews.freebsd.org/D19630
Delta      File
+370 -16   sys/amd64/amd64/pmap.c
+9 -2      sys/amd64/include/proc.h
+2 -1      sys/amd64/amd64/vm_machdep.c
+1 -1      sys/amd64/amd64/trap.c
+2 -0      sys/amd64/include/pmap.h
+1 -1      sys/kern/kern_thread.c
+385 -21   6 files
