FreeBSD/src 9d7fb76 sys/kern uipc_debug.c uipc_sockbuf.c, sys/netsmb smb_trantcp.c

sockets: garbage collect SB_NOINTR

Not used.  All socket buffer sleeps are interruptible.
Delta       File
+0 -6       sys/netsmb/smb_trantcp.c
+0 -4       sys/kern/uipc_debug.c
+1 -2       sys/kern/uipc_sockbuf.c
+1 -1       sys/sys/sockbuf.h
+2 -13      4 files

FreeBSD/src 93ff7db lib/libc/sys getsockopt.2, sys/kern uipc_socket.c uipc_sockbuf.c

socket: Implement SO_SPLICE

This is a feature which allows one to splice two TCP sockets together
such that data which arrives on one socket is automatically pushed into
the send buffer of the spliced socket.  This can be used to make TCP
proxying more efficient as it eliminates the need to copy data into and
out of userspace.

The interface is copied from OpenBSD, and this implementation aims to be
compatible.  Splicing is enabled by setting the SO_SPLICE socket option.
When spliced, data that arrives on the receive buffer is automatically
forwarded to the other socket.  In particular, splicing is a
unidirectional operation; to splice a socket pair in both directions,
SO_SPLICE needs to be applied to both sockets.  More concretely, when
setting the option one passes the following struct:

    struct splice {
            int fd;
            off_t max;

    [41 lines not shown]
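
For contrast, the userspace relay step that splicing is meant to make unnecessary can be sketched as below.  This is an illustration only, not code from the commit: relay_once() is a hypothetical helper name, and it relies on nothing beyond read(2) and write(2).  It does not use SO_SPLICE itself, since the full struct layout is not shown above.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: forward one chunk from `from` to `to` through a
 * userspace buffer.  Returns the number of bytes forwarded, 0 on EOF,
 * or -1 on error.
 */
static ssize_t
relay_once(int from, int to)
{
	char buf[4096];
	ssize_t n, off, w;

	do {
		n = read(from, buf, sizeof(buf));	/* copy: kernel -> userspace */
	} while (n < 0 && errno == EINTR);
	if (n <= 0)
		return (n);

	for (off = 0; off < n; off += w) {
		/* copy: userspace -> kernel */
		w = write(to, buf + off, (size_t)(n - off));
		if (w < 0) {
			if (errno != EINTR)
				return (-1);
			w = 0;	/* interrupted before writing: retry */
		}
	}
	return (n);
}
```

A proxy built this way calls relay_once() in a loop for each direction, paying two copies and two system calls per chunk; that is the cost the commit message describes splicing as eliminating.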
Delta       File
+698 -4     sys/kern/uipc_socket.c
+61 -1      lib/libc/sys/getsockopt.2
+47 -2      sys/sys/socketvar.h
+32 -2      sys/kern/uipc_sockbuf.c
+12 -0      sys/sys/socket.h
+2 -1       sys/sys/sockbuf.h
+852 -10    1 file not shown
+854 -10    7 files

FreeBSD/src a1da7dc lib/libsys getsockopt.2, sys/kern uipc_socket.c uipc_sockbuf.c

socket: Implement SO_SPLICE

This is a feature which allows one to splice two TCP sockets together
such that data which arrives on one socket is automatically pushed into
the send buffer of the spliced socket.  This can be used to make TCP
proxying more efficient as it eliminates the need to copy data into and
out of userspace.

The interface is copied from OpenBSD, and this implementation aims to be
compatible.  Splicing is enabled by setting the SO_SPLICE socket option.
When spliced, data that arrives on the receive buffer is automatically
forwarded to the other socket.  In particular, splicing is a
unidirectional operation; to splice a socket pair in both directions,
SO_SPLICE needs to be applied to both sockets.  More concretely, when
setting the option one passes the following struct:

    struct splice {
            int fd;
            off_t max;

    [39 lines not shown]
Delta       File
+698 -4     sys/kern/uipc_socket.c
+61 -1      lib/libsys/getsockopt.2
+47 -2      sys/sys/socketvar.h
+32 -2      sys/kern/uipc_sockbuf.c
+12 -0      sys/sys/socket.h
+2 -1       sys/sys/sockbuf.h
+852 -10    1 file not shown
+854 -10    7 files

FreeBSD/src b14a491 sys/kern uipc_ktls.c, sys/sys sockbuf.h

ktls: Fix races that can lead to double initialization

ktls_enable_rx() and ktls_enable_tx() have checks to return EALREADY if
the socket already has KTLS enabled.  However, these are done without
any locks held and nothing blocks concurrent attempts to set the socket
option.  I believe the worst outcome of the race is leaked memory.

Fix the problem by rechecking under the sockbuf lock.  While here, unify
the locking protocol for sb_tls_info: require both the sockbuf and
socket I/O locks in order to enable KTLS.  This means that either lock
is sufficient for checking whether KTLS is enabled in a given sockbuf,
which simplifies some refactoring further down the road.

Note that the SOLISTENING() check can go away because
SOCK_IO_RECV_LOCK() atomically locks the socket buffer and checks
whether the socket is a listening socket.  This changes the returned
errno value, so update a test which checks it.

Reviewed by:    gallatin

    [6 lines not shown]
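
The "recheck under the lock" fix described above follows a general pattern, which can be sketched in userspace as below.  This is a minimal sketch, not the kernel code: struct buf, buf_init() and enable_tls() are hypothetical names, and a pthread mutex stands in for the sockbuf lock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a socket buffer with optional TLS state. */
struct buf {
	pthread_mutex_t lock;
	_Atomic(void *) tls_info;	/* NULL until TLS is enabled */
};

static void
buf_init(struct buf *b)
{
	pthread_mutex_init(&b->lock, NULL);
	atomic_init(&b->tls_info, NULL);
}

static int
enable_tls(struct buf *b)
{
	void *info;

	/* Early unlocked check: cheap, but not sufficient on its own. */
	if (atomic_load(&b->tls_info) != NULL)
		return (EALREADY);

	info = malloc(64);	/* session setup happens without the lock */
	if (info == NULL)
		return (ENOMEM);

	pthread_mutex_lock(&b->lock);
	if (atomic_load(&b->tls_info) != NULL) {
		/* Lost the race: undo our setup instead of leaking it. */
		pthread_mutex_unlock(&b->lock);
		free(info);
		return (EALREADY);
	}
	atomic_store(&b->tls_info, info);
	pthread_mutex_unlock(&b->lock);
	return (0);
}
```

Without the second check, two concurrent callers could both pass the early test and both install a session, with one allocation overwritten and lost.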
Delta       File
+21 -2      sys/kern/uipc_ktls.c
+2 -1       sys/sys/sockbuf.h
+1 -1       tests/sys/kern/ktls_test.c
+24 -4      3 files

FreeBSD/src 163cdf6 sys/kern uipc_ktls.c, sys/sys sockbuf.h

ktls: Fix races that can lead to double initialization

ktls_enable_rx() and ktls_enable_tx() have checks to return EALREADY if
the socket already has KTLS enabled.  However, these are done without
any locks held and nothing blocks concurrent attempts to set the socket
option.  I believe the worst outcome of the race is leaked memory.

Fix the problem by rechecking under the sockbuf lock.  While here, unify
the locking protocol for sb_tls_info: require both the sockbuf and
socket I/O locks in order to enable KTLS.  This means that either lock
is sufficient for checking whether KTLS is enabled in a given sockbuf,
which simplifies some refactoring further down the road.

Note that the SOLISTENING() check can go away because
SOCK_IO_RECV_LOCK() atomically locks the socket buffer and checks
whether the socket is a listening socket.  This changes the returned
errno value, so update a test which checks it.

Reviewed by:    gallatin

    [4 lines not shown]
Delta       File
+21 -2      sys/kern/uipc_ktls.c
+2 -1       sys/sys/sockbuf.h
+1 -1       tests/sys/kern/ktls_test.c
+24 -4      3 files

FreeBSD/src 5716d90 sys/kern uipc_usrreq.c, sys/sys sockbuf.h

Revert "unix: new implementation of unix/stream & unix/seqpacket"

The regressions in aio(4) and kernel RPC aren't a 5 minute problem.

This reverts commit d80a97def9a1db6f07f5d2e68f7ad62b27918947.
This reverts commit d1cbb17a873c787a527316bbb27551e97d5ad30c.
This reverts commit fb8a8333b481cc4256d0b3f0b5b4feaa4594e01f.
Delta       File
+333 -652   sys/kern/uipc_usrreq.c
+0 -7       sys/sys/sockbuf.h
+333 -659   2 files

FreeBSD/src d80a97d sys/kern uipc_usrreq.c, sys/sys sockbuf.h

unix: new implementation of unix/stream & unix/seqpacket

Provide protocol specific pr_sosend and pr_soreceive for PF_UNIX
SOCK_STREAM sockets and implement SOCK_SEQPACKET sockets as an extension
of SOCK_STREAM.  The change meets three goals: get rid of unix(4) specific
stuff in the generic socket code, provide faster and more robust unix/stream
sockets, and bring unix/seqpacket much closer to specification.  Highlights
follow:

- The send buffer now is truly bypassed.  Previously it was always empty,
but send(2) still needed to acquire its lock and do a variety of
tricks to be woken up at the right time while sleeping on it.  Now the
only two things we care about in the send buffer are the I/O sx(9) lock
that serializes operations and the value of so_snd.sb_hiwat, which we can read
without obtaining a lock.  The sleep of a send(2) happens on the mutex of
the receive buffer of the peer.  A bulk send/recv of data with large
socket buffers will make both syscalls just bounce between owning the
receive buffer lock and copyin(9)/copyout(9); no other locks would be
involved.

    [19 lines not shown]
Delta       File
+645 -325   sys/kern/uipc_usrreq.c
+7 -0       sys/sys/sockbuf.h
+652 -325   2 files

FreeBSD/src 660bd40 sys/netlink netlink_io.c netlink_domain.c, sys/sys sockbuf.h

netlink: use domain specific send buffer

Instead of using generic socket code, create Netlink specific socket
buffer.  It is a simple TAILQ of writes that came from userland.  This
saves us one memory allocation that could fail and one memory copy.

Reviewed by:            melifaro
Differential Revision:  https://reviews.freebsd.org/D42522
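
The "simple TAILQ of writes" idea can be illustrated with the queue(3) macros.  The sketch below is not the Netlink code: struct pending_write, wq_enqueue() and wq_dequeue() are hypothetical names, and storing the payload inline in the queue entry is an assumption made to keep the example to one allocation per write.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical record for one write that came from userland. */
struct pending_write {
	TAILQ_ENTRY(pending_write) link;
	size_t len;
	char data[];		/* payload stored inline with the entry */
};

TAILQ_HEAD(write_queue, pending_write);

/* Queue one write at the tail; returns 0, or -1 if allocation fails. */
static int
wq_enqueue(struct write_queue *q, const void *data, size_t len)
{
	struct pending_write *w;

	w = malloc(sizeof(*w) + len);
	if (w == NULL)
		return (-1);
	w->len = len;
	memcpy(w->data, data, len);
	TAILQ_INSERT_TAIL(q, w, link);
	return (0);
}

/* Detach the oldest write, or return NULL if the queue is empty. */
static struct pending_write *
wq_dequeue(struct write_queue *q)
{
	struct pending_write *w;

	w = TAILQ_FIRST(q);
	if (w != NULL)
		TAILQ_REMOVE(q, w, link);
	return (w);		/* caller frees the entry */
}
```

Writes are consumed in arrival order, and each queue entry corresponds to exactly one write from the sender.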
Delta       File
+37 -102    sys/netlink/netlink_io.c
+77 -25     sys/netlink/netlink_domain.c
+9 -3       sys/netlink/netlink_var.h
+6 -0       sys/sys/sockbuf.h
+129 -130   4 files

FreeBSD/src 29363fb sys/libkern bcopy.c, sys/netpfil/ipfilter/netinet ip_frag.c

sys: Remove ancient SCCS tags.

Remove ancient SCCS tags from the tree using automated scripting, with two
minor fixups to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by:           Netflix
Delta       File
+0 -6       sys/rpc/rpcb_clnt.c
+0 -6       sys/libkern/bcopy.c
+0 -5       sys/rpc/clnt_vc.c
+0 -5       sys/rpc/clnt_bck.c
+0 -5       sys/rpc/rpcb_prot.c
+0 -4       sys/netpfil/ipfilter/netinet/ip_frag.c
+0 -31      622 files not shown
+0 -1,332   628 files

FreeBSD/src f8167e0 sys/dev/mpr mpr_ioctl.h, sys/dev/mps mps_ioctl.h

sys: Remove $FreeBSD$: two-line .h pattern

Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/

Similar commit in current:
(cherry picked from commit 95ee2897e98f)
Delta       File
+0 -4       sys/dev/mpr/mpr_ioctl.h
+0 -4       sys/dev/mps/mps_ioctl.h
+0 -2       sys/sys/umtxvar.h
+0 -2       sys/sys/uuid.h
+0 -2       sys/sys/vdso.h
+0 -2       sys/sys/watchdog.h
+0 -16      3,795 files not shown
+0 -7,606   3,801 files

FreeBSD/src 95ee289 sys/amd64/vmm vmm_util.h x86.c, sys/arm/allwinner a10_codec.c

sys: Remove $FreeBSD$: two-line .h pattern

Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
Delta       File
+0 -4       sys/dev/mps/mps_ioctl.h
+0 -4       sys/dev/mpr/mpr_ioctl.h
+0 -2       sys/amd64/vmm/vmm_util.h
+0 -2       sys/amd64/vmm/x86.c
+0 -2       sys/amd64/vmm/x86.h
+0 -2       sys/arm/allwinner/a10_codec.c
+0 -16      3,592 files not shown
+0 -7,200   3,598 files

FreeBSD/src 2631a25 sys/kern uipc_sockbuf.c, sys/sys sockbuf.h

sockbufs: add sbreserve_locked_limit() with custom maxsockbuf limit.

Protocols such as netlink may need a large socket receive buffer,
measured in tens of megabytes. This change allows netlink to
set larger socket buffers (given the privs are in place), without
requiring the user to manually bump maxsockbuf.

Reviewed by:    glebius
Differential Revision: https://reviews.freebsd.org/D36747

(cherry picked from commit 7b660faa9e30c15d3be9b2c44c3ca046a33331f4)
Delta       File
+15 -7      sys/kern/uipc_sockbuf.c
+2 -0       sys/sys/sockbuf.h
+17 -7      2 files

FreeBSD/src c0e4090 sys/kern uipc_ktls.c, sys/netinet tcp_ratelimit.c tcp_var.h

ktls: Accurately track if ifnet ktls is enabled

This allows us to avoid spurious calls to ktls_disable_ifnet()

When we implemented ifnet kTLS, we set a flag in the tx socket
buffer (SB_TLS_IFNET) to indicate ifnet kTLS.  This flag meant that
now, or in the past, ifnet ktls was active on a socket.  Later,
I added code to switch ifnet ktls sessions to software in the case
of lossy TCP connections that have a high retransmit rate.
Because TCP was using SB_TLS_IFNET to know if it needed to do math
to calculate the retransmit ratio and potentially call into
ktls_disable_ifnet(), it was doing unneeded work long after
a session was moved to software.

This patch carefully tracks whether or not ifnet ktls is still enabled
on a TCP connection.  Because the inp is now embedded in the tcpcb, and
because TCP is the most frequent accessor of this state, it made sense to
move this from the socket buffer flags to the tcpcb. Because we now need
reliable access to the tcpcb, we take a ref on the inp when creating a tx

    [12 lines not shown]
Delta       File
+114 -31    sys/kern/uipc_ktls.c
+2 -12      sys/netinet/tcp_stacks/rack.c
+2 -2       sys/netinet/tcp_ratelimit.c
+2 -1       sys/sys/ktls.h
+3 -0       sys/netinet/tcp_var.h
+1 -1       sys/netinet/tcp_stacks/bbr.c
+124 -47    2 files not shown
+126 -49    8 files

FreeBSD/src f669685 sys/kern uipc_sockbuf.c uipc_socket.c, sys/sys sockbuf.h protosw.h

protocols: make socket buffers ioctl handler changeable

Allow setting custom per-protocol handlers for the socket buffer
ioctls by introducing a pr_setsbopt callback with the default value
set to the currently-used sbsetopt().

Reviewed by:    glebius
Differential Revision: https://reviews.freebsd.org/D36746
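
A per-protocol callback with a default can be sketched as a function pointer that is filled in when the protocol leaves it unset.  The names below (struct proto, proto_register(), default_setsbopt(), doubling_setsbopt()) and the simplified signature are hypothetical; only the idea of "callback with a default handler" comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical handler signature: returns the value applied. */
typedef int setsbopt_fn(int optname, int value);

struct proto {
	const char *name;
	setsbopt_fn *setsbopt;	/* NULL means "use the default handler" */
};

/* Stand-in for the generic handler used when a protocol has no opinion. */
static int
default_setsbopt(int optname, int value)
{
	(void)optname;
	return (value);
}

/* Example of a protocol-specific override. */
static int
doubling_setsbopt(int optname, int value)
{
	(void)optname;
	return (2 * value);
}

/* At registration time, fill in the default for unset callbacks. */
static void
proto_register(struct proto *p)
{
	if (p->setsbopt == NULL)
		p->setsbopt = default_setsbopt;
}
```

Callers then invoke p->setsbopt() unconditionally, so generic code does not need a special case per protocol.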
Delta       File
+18 -6      sys/kern/uipc_sockbuf.c
+1 -13      sys/kern/uipc_socket.c
+2 -1       sys/sys/sockbuf.h
+2 -0       sys/sys/protosw.h
+1 -0       sys/kern/uipc_domain.c
+24 -20     5 files

FreeBSD/src 7b660fa sys/kern uipc_sockbuf.c, sys/sys sockbuf.h

sockbufs: add sbreserve_locked_limit() with custom maxsockbuf limit.

Protocols such as netlink may need a large socket receive buffer,
measured in tens of megabytes. This change allows netlink to
set larger socket buffers (given the privs are in place), without
requiring the user to manually bump maxsockbuf.

Reviewed by:    glebius
Differential Revision: https://reviews.freebsd.org/D36747
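
The shape of such an API, a reservation routine that takes the limit as a parameter plus a wrapper that passes the global limit, can be sketched as follows.  All names and numbers here are hypothetical (struct buf, reserve(), reserve_limit(), the 2 MB cap); the real function's signature and limit arithmetic are not shown in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical system-wide cap, standing in for maxsockbuf. */
static const size_t global_max = (size_t)2 << 20;

struct buf {
	size_t hiwat;		/* reserved space, in bytes */
};

/* Reserve `want` bytes if it fits under the caller-supplied limit. */
static bool
reserve_limit(struct buf *b, size_t want, size_t limit)
{
	if (want == 0 || want > limit)
		return (false);
	b->hiwat = want;
	return (true);
}

/* Existing entry point keeps its behavior: it applies the global cap. */
static bool
reserve(struct buf *b, size_t want)
{
	return (reserve_limit(b, want, global_max));
}
```

A protocol that needs tens of megabytes calls reserve_limit() with its own ceiling, while every other caller keeps going through reserve().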
Delta       File
+15 -7      sys/kern/uipc_sockbuf.c
+2 -0       sys/sys/sockbuf.h
+17 -7      2 files

FreeBSD/src 458f475 share/man/man4 unix.4, sys/kern uipc_usrreq.c

unix/dgram: smart socket buffers for one-to-many sockets

A one-to-many unix/dgram socket is a socket that has been bound
with bind(2) and can get multiple connections.  A typical example
is /var/run/log bound by syslogd(8) and receiving multiple
connections from libc syslog(3) API.  Until now all of these
connections shared the same receive socket buffer of the bound
socket.  This made the socket vulnerable to an overflow attack.
See 240d5a9b1ce for a historical attempt to work around the problem.

This commit creates a per-connection socket buffer for every single
connected socket and eliminates the problem.  The new behavior will
optimize seldom writers over frequent writers.  See added test case
scenarios and code comments for more detailed description of the
new behavior.

Reviewed by:            markj
Differential revision:  https://reviews.freebsd.org/D35303
Delta       File
+182 -23    sys/kern/uipc_usrreq.c
+73 -16     tests/sys/kern/unix_dgram.c
+59 -2      share/man/man4/unix.4
+26 -1      sys/sys/sockbuf.h
+340 -42    4 files

FreeBSD/src a7444f8 sys/kern uipc_usrreq.c, sys/sys sockbuf.h

unix/dgram: use minimal possible socket buffer for PF_UNIX/SOCK_DGRAM

This change fully splits away PF_UNIX/SOCK_DGRAM from other socket
buffer implementations, without any behavior changes.

Generic socket implementation is reduced down to one STAILQ and very
little code.

Reviewed by:            markj
Differential revision:  https://reviews.freebsd.org/D35300
Delta       File
+107 -95    sys/kern/uipc_usrreq.c
+8 -0       sys/sys/sockbuf.h
+115 -95    2 files

FreeBSD/src a4fc414 sys/kern uipc_socket.c, sys/sys sockbuf.h protosw.h

sockets: enable protocol specific socket buffers

Split struct sockbuf into common shared fields and protocol specific
union, where protocols are free to implement whatever buffer they
want.  Such protocols should mark themselves with PR_SOCKBUF and are
expected to initialize their buffers in their pr_attach and tear
them down in pr_detach.

Reviewed by:            markj
Differential revision:  https://reviews.freebsd.org/D35299
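
The "common fields plus protocol specific union" layout can be sketched as below.  This is not the actual struct sockbuf: the field names, the BUF_PROTO_OWNED flag, and the dgram_attach()/dgram_detach() helpers are hypothetical, chosen only to show a protocol initializing and tearing down its own member of the union.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

#define BUF_PROTO_OWNED	0x1	/* hypothetical: protocol manages the buffer */

struct dgram_pkt {
	STAILQ_ENTRY(dgram_pkt) link;
	size_t len;
};

struct buf {
	/* Fields shared by every protocol. */
	unsigned int flags;
	size_t hiwat;
	/* Protocol specific storage; only one member is valid at a time. */
	union {
		struct {
			char *data;
			size_t used;
		} stream;
		struct {
			STAILQ_HEAD(, dgram_pkt) pkts;
		} dgram;
	} u;
};

/* Attach: the protocol claims the buffer and sets up its own member. */
static void
dgram_attach(struct buf *b)
{
	b->flags |= BUF_PROTO_OWNED;
	STAILQ_INIT(&b->u.dgram.pkts);
}

/* Detach: drop queued references (this sketch does not own packet memory). */
static void
dgram_detach(struct buf *b)
{
	STAILQ_INIT(&b->u.dgram.pkts);
	b->flags &= ~BUF_PROTO_OWNED;
}
```

Generic code touches only the shared fields, and each protocol is free to pick whatever representation suits it inside the union.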
Delta       File
+55 -31     sys/sys/sockbuf.h
+9 -3       sys/kern/uipc_socket.c
+3 -0       sys/sys/protosw.h
+67 -34     3 files

FreeBSD/src fe8c78f sys/kern uipc_ktls.c uipc_sockbuf.c, sys/sys ktls.h sockbuf.h

ktls: Add full support for TLS RX offloading via network interface.

Basic TLS RX offloading uses the "csum_flags" field in the mbuf packet
header to figure out if an incoming mbuf has been fully offloaded or
not. This information follows the packet stream via the LRO engine, IP
stack and finally to the TCP stack. The TCP stack preserves the mbuf
packet header also when re-assembling packets after packet loss. When
the mbuf goes into the socket buffer the packet header is demoted and
the offload information is transferred to "m_flags". Later on a
worker thread will analyze the mbuf flags and decide if the mbufs
making up a TLS record indicate a fully-, partially- or not decrypted
TLS record. Based on these three cases the worker thread will either
pass the packet on as-is or recrypt the decrypted bits, if any, or
decrypt the packet as usual.

During packet loss the kernel TLS code will call back into the network
driver using the send tag, informing about the TCP starting sequence
number of every TLS record that is not fully decrypted by the network
interface. The network interface then stores this information in a

    [21 lines not shown]
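
The three-way decision described in the first paragraph above (fully, partially, or not decrypted) can be sketched as a small classifier.  This is an illustration under stated assumptions: the per-segment boolean array stands in for the mbuf flags, and the enum and function names are hypothetical rather than taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes matching the three cases in the text. */
enum rec_action {
	REC_PASS_AS_IS,		/* fully decrypted by the interface */
	REC_REENCRYPT_PARTIAL,	/* recrypt the bits that were decrypted */
	REC_DECRYPT		/* nothing was decrypted: decrypt as usual */
};

/*
 * Classify one TLS record from per-segment "decrypted by hardware" marks.
 * An empty record is treated as not decrypted.
 */
static enum rec_action
classify_record(const bool *hw_decrypted, size_t nsegs)
{
	size_t i, hits = 0;

	for (i = 0; i < nsegs; i++) {
		if (hw_decrypted[i])
			hits++;
	}
	if (nsegs > 0 && hits == nsegs)
		return (REC_PASS_AS_IS);
	if (hits > 0)
		return (REC_REENCRYPT_PARTIAL);
	return (REC_DECRYPT);
}
```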
Delta       File
+393 -32    sys/kern/uipc_ktls.c
+36 -5      sys/kern/uipc_sockbuf.c
+14 -1      sys/sys/ktls.h
+1 -0       sys/sys/sockbuf.h
+444 -38    4 files

FreeBSD/src d59bc18 sys/kern uipc_sockbuf.c uipc_debug.c, sys/sys socketvar.h sockbuf.h

sockbuf: remove unused mbuf counter and cluster counter

With M_EXTPG mbufs these two counters already do not represent the
reality.  As we are moving towards protocol independent socket buffers,
which may not even use mbufs at all, the counters become less and less
relevant.  The only userland seeing them was 'netstat -x'.

PR:                     264181 (exp-run)
Reviewed by:            markj
Differential revision:  https://reviews.freebsd.org/D35334
Delta       File
+6 -25      sys/kern/uipc_sockbuf.c
+3 -10      usr.bin/netstat/inet.c
+2 -7       usr.bin/netstat/netstat.1
+0 -2       sys/sys/socketvar.h
+0 -2       sys/sys/sockbuf.h
+0 -2       sys/kern/uipc_debug.c
+11 -48     6 files

FreeBSD/src 6890b58 sys/kern uipc_sockbuf.c, sys/netinet6 ip6_input.c

sockbuf: improve sbcreatecontrol()

o Constify memory pointer.  Make length unsigned.
o Make it never fail with M_WAITOK and assert that length is sane.
Delta       File
+15 -9      sys/kern/uipc_sockbuf.c
+2 -2       sys/netinet6/ip6_input.c
+2 -1       sys/sys/sockbuf.h
+19 -12     3 files

FreeBSD/src b46667c sys/kern uipc_usrreq.c uipc_sockbuf.c, sys/netgraph/bluetooth/socket ng_btsocket_hci_raw.c

sockbuf: merge two versions of sbcreatecontrol() into one

No functional change.
Delta       File
+32 -33     sys/netinet6/ip6_input.c
+28 -28     sys/netinet/ip_input.c
+9 -9       sys/kern/uipc_usrreq.c
+1 -8       sys/kern/uipc_sockbuf.c
+4 -4       sys/netgraph/bluetooth/socket/ng_btsocket_hci_raw.c
+3 -2       sys/netinet/udp_usrreq.c
+77 -84     4 files not shown
+83 -92     10 files

FreeBSD/src 4328318 sys/kern uipc_sockbuf.c uipc_socket.c, sys/netinet tcp_input.c

sockets: use socket buffer mutexes in struct socket directly

Since c67f3b8b78e the sockbuf mutexes belong to the containing socket,
and socket buffers just point to it.  In 74a68313b50 macros that access
this mutex directly were added.  Go over the core socket code and
eliminate code that reaches the mutex by dereferencing the sockbuf
compatibility pointer.

This change requires a KPI change, as some functions were given the
sockbuf pointer only without any hint if it is a receive or send buffer.

This change doesn't cover the whole kernel, many protocols still use
compatibility pointers internally.  However, it allows operation of a
protocol that doesn't use them.

Reviewed by:            markj
Differential revision:  https://reviews.freebsd.org/D35152
Delta       File
+91 -51     sys/kern/uipc_sockbuf.c
+29 -32     sys/sys/socketvar.h
+24 -24     sys/kern/uipc_socket.c
+18 -13     sys/kern/sys_socket.c
+8 -8       sys/sys/sockbuf.h
+4 -4       sys/netinet/tcp_input.c
+174 -132   11 files not shown
+191 -149   17 files

FreeBSD/src a982ce0 sys/kern uipc_socket.c uipc_usrreq.c, sys/sys socketvar.h sockbuf.h

sockets: remove the socket-on-stack hack from sorflush()

The hack can be tracked down to 4.4BSD, where the copy was performed
under splimp() and then after splx() dom_dispose was called.
Stevens has a chapter on this function, but he doesn't answer why
this trick is necessary.  Why can't we call into dom_dispose under
splimp()?  Anyway, with multithreaded kernel the hack seems to be
necessary to avoid LORs between socket buffer lock and different
filesystem locks, especially network file systems.

The new socket buffers KPI sbcut() from 1d2df300e9b allows us to get
rid of the hack.

Reviewed by:            markj
Differential revision:  https://reviews.freebsd.org/D35125
Delta       File
+5 -28      sys/kern/uipc_socket.c
+17 -1      sys/kern/uipc_usrreq.c
+2 -0       sys/sys/socketvar.h
+0 -2       sys/sys/sockbuf.h
+24 -31     4 files

FreeBSD/src 7db5444 sys/kern uipc_sockbuf.c, sys/sys sockbuf.h

sockbufs: make sbrelease_internal() private
Delta       File
+1 -1       sys/kern/uipc_sockbuf.c
+0 -1       sys/sys/sockbuf.h
+1 -2       2 files

FreeBSD/src f983298 sys/dev/cxgbe/tom t4_cpl_io.c, sys/dev/hyperv/hvsock hv_sock.c

socket: Rename sb(un)lock() and interlock with listen(2)

In preparation for moving sockbuf locks into the containing socket,
provide alternative macros for the sockbuf I/O locks:
SOCK_IO_SEND_(UN)LOCK() and SOCK_IO_RECV_(UN)LOCK().  These operate on a
socket rather than a socket buffer.  Note that these locks are used only
to prevent concurrent readers and writers from interleaving I/O.

When locking for I/O, return an error if the socket is a listening
socket.  Currently the check is racy since the sockbuf sx locks are
destroyed during the transition to a listening socket, but that will no
longer be true after some follow-up changes.

Modify a few places to check for errors from
sblock()/SOCK_IO_(SEND|RECV)_LOCK() where they were not before.  In
particular, add checks to sendfile() and sorflush().

Reviewed by:    tuexen, gallatin
Sponsored by:   The FreeBSD Foundation

    [2 lines not shown]
Delta       File
+45 -10     sys/kern/uipc_socket.c
+13 -18     sys/dev/hyperv/hvsock/hv_sock.c
+0 -28      sys/kern/uipc_sockbuf.c
+9 -10      sys/netinet/sctputil.c
+9 -9       sys/dev/cxgbe/tom/t4_cpl_io.c
+13 -1      sys/sys/socketvar.h
+89 -76     4 files not shown
+104 -93    10 files

FreeBSD/src ade1daa sys/kern uipc_socket.c, sys/sys sockbuf.h

socket: Synchronize soshutdown() with listen(2) and AIO

To handle shutdown(SHUT_RD) we flush the receive buffer of the socket.
This may involve searching for control messages of type SCM_RIGHTS,
since we need to close the file references.  Closing arbitrary files
with socket buffer locks held is undesirable, mainly due to lock
ordering issues, so we instead make a copy of the socket buffer and
operate on that without any locks.  Fields in the original buffer are
cleared.

This behaviour clobbered the AIO job queue associated with a receive
buffer.  It could also cause us to leak a KTLS session reference.
Reorder socket buffer fields to address this.

An alternate solution would be to remove the hack in sorflush(), but
this is not quite feasible (yet).  In particular, though sorflush()
flags the sockbuf with SBS_CANTRCVMORE, it is possible for more data to
be queued - the flag just prevents userspace from reading more data.  I
suspect we should fix this; SBS_CANTRCVMORE represents a terminal state

    [10 lines not shown]
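
The copy-then-operate-unlocked approach from the first paragraph above is an instance of a common pattern: detach the contents while holding the lock, then do the expensive or lock-order-sensitive work after dropping it.  The sketch below uses hypothetical names (struct item, struct buf, buf_flush(), note_disposed()) and a pthread mutex in place of the socket buffer lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct item {
	struct item *next;
	int fd;			/* stand-in for a file reference */
};

struct buf {
	pthread_mutex_t lock;
	struct item *head;
};

static int disposed;		/* counts calls, for demonstration only */

/* Example disposal callback; a real one might close it->fd. */
static void
note_disposed(struct item *it)
{
	(void)it;
	disposed++;
}

/* Detach everything under the lock, then dispose with no lock held. */
static int
buf_flush(struct buf *b, void (*dispose)(struct item *))
{
	struct item *list, *next;
	int n = 0;

	pthread_mutex_lock(&b->lock);
	list = b->head;		/* take a private copy of the contents */
	b->head = NULL;		/* clear the original */
	pthread_mutex_unlock(&b->lock);

	for (; list != NULL; list = next) {
		next = list->next;
		dispose(list);	/* may take other locks safely here */
		n++;
	}
	return (n);
}
```

The weakness the commit message points at is visible in this shape: any field that gets cleared along with the contents is lost to other users of the buffer.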
Delta       File
+33 -23     sys/kern/uipc_socket.c
+3 -2       sys/sys/sockbuf.h
+36 -25     2 files

FreeBSD/src 74a6831 sys/sys socketvar.h sockbuf.h

socket: Add macros to lock socket buffers using socket references

Since commit c67f3b8b78e50c6df7c057d6cf108e4d6b4312d0 the sockbuf
mutexes belong to the containing socket.  Sockbufs contain a pointer to
a mutex, which by default is initialized to the corresponding mutexes in
the socket.  The SOCKBUF_LOCK() etc. macros operate on this pointer.
However, the pointer is clobbered by listen(2) so it's not safe to use
them unless one is sure that the socket is not a listening socket.

This change introduces a new set of macros which lock socket buffers
through the socket.  This is a bit cheaper since it removes the pointer
indirection, and allows one to safely lock socket buffers and then check
for a listening socket.

For MFC, these macros should be reimplemented in terms of the existing
socket buffer layout.

Reviewed by:    tuexen, gallatin, jhb
Sponsored by:   The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D31900
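
The difference between locking through the buffer's mutex pointer and locking through the socket can be sketched in userspace.  Everything below is hypothetical (struct sock, the BUF_LOCK and SOCK_RCV_* macro names, recv_begin(), and the choice of EINVAL); it only illustrates "lock via the socket, then check for a listening socket".

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

struct sbuf {
	pthread_mutex_t *mtx;	/* compatibility pointer into the socket */
};

struct sock {
	pthread_mutex_t rcv_mtx;	/* mutexes live in the socket itself */
	pthread_mutex_t snd_mtx;
	bool listening;
	struct sbuf rcv, snd;
};

/* Older style: one extra pointer dereference through the buffer. */
#define BUF_LOCK(sb)		pthread_mutex_lock((sb)->mtx)
/* Newer style: address the socket's own mutex directly. */
#define SOCK_RCV_LOCK(so)	pthread_mutex_lock(&(so)->rcv_mtx)
#define SOCK_RCV_UNLOCK(so)	pthread_mutex_unlock(&(so)->rcv_mtx)

static void
sock_init(struct sock *so)
{
	pthread_mutex_init(&so->rcv_mtx, NULL);
	pthread_mutex_init(&so->snd_mtx, NULL);
	so->listening = false;
	so->rcv.mtx = &so->rcv_mtx;
	so->snd.mtx = &so->snd_mtx;
}

/* Lock the receive side, failing (unlocked) on a listening socket. */
static int
recv_begin(struct sock *so)
{
	SOCK_RCV_LOCK(so);
	if (so->listening) {
		SOCK_RCV_UNLOCK(so);
		return (EINVAL);	/* arbitrary errno for this sketch */
	}
	return (0);		/* caller holds the lock */
}
```

Because the mutex address never depends on buffer state, taking the lock stays safe even if the buffers themselves are being repurposed.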
Delta       File
+28 -2      sys/sys/socketvar.h
+5 -2       sys/sys/sockbuf.h
+33 -4      2 files

FreeBSD/src c67f3b8 sys/sys socketvar.h sockbuf.h

socket: Move sockbuf mutexes into the owning socket

This is necessary to provide proper interlocking with listen(2), which
destroys the socket buffers.  Otherwise, code must lock the socket
itself and check SOLISTENING(so), but most I/O paths do not otherwise
need to acquire the socket lock, so the extra overhead needed to check a
rare error case is undesirable.

listen(2) calls are relatively rare.  Thus, the strategy is to have it
acquire all socket buffer locks when transitioning to a listening
socket.  To do this safely, these locks must be stable, and not
destroyed during listen(2) as they are today.  So, move them out of the
sockbuf and into the owning socket.  For the sockbuf mutexes, keep a
pointer to the mutex in the sockbuf itself, for now.  This can be
removed by replacing SOCKBUF_LOCK() etc. with macros which operate on
the socket itself, as was done for the sockbuf I/O locks.

Reviewed by:    tuexen, gallatin
MFC after:      1 month

    [2 lines not shown]
Delta       File
+15 -4      sys/sys/socketvar.h
+3 -4       sys/sys/sockbuf.h
+18 -8      2 files

FreeBSD/src f94acf5 sys/dev/cxgbe/tom t4_cpl_io.c, sys/dev/hyperv/hvsock hv_sock.c

socket: Rename sb(un)lock() and interlock with listen(2)

In preparation for moving sockbuf locks into the containing socket,
provide alternative macros for the sockbuf I/O locks:
SOCK_IO_SEND_(UN)LOCK() and SOCK_IO_RECV_(UN)LOCK().  These operate on a
socket rather than a socket buffer.  Note that these locks are used only
to prevent concurrent readers and writers from interleaving I/O.

When locking for I/O, return an error if the socket is a listening
socket.  Currently the check is racy since the sockbuf sx locks are
destroyed during the transition to a listening socket, but that will no
longer be true after some follow-up changes.

Modify a few places to check for errors from
sblock()/SOCK_IO_(SEND|RECV)_LOCK() where they were not before.  In
particular, add checks to sendfile() and sorflush().

Reviewed by:    tuexen, gallatin
MFC after:      1 month

    [2 lines not shown]
Delta       File
+45 -10     sys/kern/uipc_socket.c
+13 -18     sys/dev/hyperv/hvsock/hv_sock.c
+0 -28      sys/kern/uipc_sockbuf.c
+9 -10      sys/netinet/sctputil.c
+9 -9       sys/dev/cxgbe/tom/t4_cpl_io.c
+12 -1      sys/sys/socketvar.h
+88 -76     4 files not shown
+103 -93    10 files