ip6: add SO_BINTIME support
This adds support for obtaining timestamps from IPv6 packets using the
SO_BINTIME socket option, bringing IPv6 to parity with the IPv4 behavior.
Enable testing the SO_BINTIME option in the relevant (manual) regression
test.
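On FreeBSD, a receiver that enables SO_BINTIME gets its timestamp as a struct bintime: whole seconds plus a 64-bit binary fraction of a second. The sketch below shows how that fraction maps to nanoseconds. It is modeled on the arithmetic of bintime2timespec(). The type struct bt_sample and the helpers bt_frac_to_nsec() and bt_to_nsec() are hypothetical stand-ins so the code compiles outside FreeBSD.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct bintime: frac / 2^64 is the
 * sub-second part of the timestamp. */
struct bt_sample {
	int64_t  sec;
	uint64_t frac;
};

/* Keep only the upper 32 bits of the fraction so the product with
 * 10^9 cannot overflow 64 bits. */
static int64_t
bt_frac_to_nsec(uint64_t frac)
{
	uint64_t hi = frac >> 32;	/* fraction in units of 2^-32 s */
	int64_t nsec = (int64_t)((UINT64_C(1000000000) * hi) >> 32);

	assert(nsec >= 0 && nsec < 1000000000);
	return nsec;
}

/* Total nanoseconds represented by the sample. */
static int64_t
bt_to_nsec(const struct bt_sample *bt)
{
	return bt->sec * 1000000000 + bt_frac_to_nsec(bt->frac);
}
```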
PR: 289423
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D52504
(cherry picked from commit cd02a8a9f8be2085d5242606a79668dc3720e7b0)
tcp: save progress timeout cause in connection end status
TCP stats were incremented for both the persist and the progress
timeout conditions, but only the persist cause was saved in the
connection end info status, which in turn is logged in the
blackbox "connection end" event.
Reviewed by: tuexen
Sponsored by: Netflix, Inc.
(cherry picked from commit 1a61a673a3700c0ebdb0c5847b5923d0e3641f89)
tcp: improve SEG.ACK validation in SYN-RECEIVED
According to the fifth step of SEGMENT ARRIVES in RFC 9293, send a RST
segment in response to an ACK segment that fails the SEG.ACK check, but
leave the endpoint state unchanged.
FreeBSD handles this correctly when entering the SYN-RECEIVED state via
the SYN-SENT state, but not in the SYN-cache code, which handles entering
the SYN-RECEIVED state via the LISTEN state.
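The SEG.ACK check can be sketched as follows. An ACK is acceptable iff SND.UNA < SEG.ACK <= SND.NXT under modulo-2^32 comparison. In SYN-RECEIVED this means SEG.ACK must equal ISS + 1. The enum and function names are hypothetical. The SEQ_LT()/SEQ_LEQ() macros follow the usual BSD style.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Modulo-2^32 sequence number comparisons. */
#define SEQ_LT(a, b)	((int32_t)((a) - (b)) < 0)
#define SEQ_LEQ(a, b)	((int32_t)((a) - (b)) <= 0)

/* Hypothetical outcome of the SEG.ACK check. */
enum ack_verdict { ACK_OK, ACK_SEND_RST };

/*
 * Fifth step of SEGMENT ARRIVES for SYN-RECEIVED: if the ACK is not
 * acceptable, reply with <SEQ=SEG.ACK><CTL=RST> and leave the endpoint
 * state unchanged.
 */
static enum ack_verdict
synrcvd_check_ack(tcp_seq snd_una, tcp_seq snd_nxt, tcp_seq seg_ack)
{
	assert(SEQ_LEQ(snd_una, snd_nxt));
	if (SEQ_LT(snd_una, seg_ack) && SEQ_LEQ(seg_ack, snd_nxt))
		return ACK_OK;
	return ACK_SEND_RST;
}
```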
This also fixes a panic reported by Alexander Leidinger.
Reviewed by: jtl, glebius
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D52934
(cherry picked from commit 8af2f06a99b10c0d3ab9021949e750852662672a)
tcp: improve credential handling in syncache
When adding a syncache entry, take a reference on the credentials
while the inp is still locked.
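The ordering matters because the credential pointer is only stable while the owning PCB is locked. Once the lock is dropped, the credential may be replaced and the old one released. The sketch below uses pthreads and hypothetical, simplified struct cred, struct pcb and struct cache_entry types. It is not the kernel's crhold()/inp locking API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical, simplified credential, PCB and cache entry. */
struct cred {
	atomic_int refcount;
};

struct pcb {
	pthread_mutex_t lock;
	struct cred *cred;	/* may be swapped while unlocked */
};

struct cache_entry {
	struct cred *cred;
};

/*
 * Take the reference while the PCB is still locked; a reference taken
 * after unlocking could refer to an already released credential.
 */
static void
entry_attach_cred(struct pcb *pcb, struct cache_entry *e)
{
	pthread_mutex_lock(&pcb->lock);
	assert(pcb->cred != NULL);
	e->cred = pcb->cred;
	atomic_fetch_add(&e->cred->refcount, 1);
	pthread_mutex_unlock(&pcb->lock);
}
```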
Thanks to markj@ for providing a hint regarding the root cause.
Reported by: David Marker
Reviewed by: glebius
Tested by: David Marker
Fixes: cbc9438f0505 ("tcp: improve ref count handling when processing SYN")
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D53380
(cherry picked from commit 44cb1e857f048d2326bdc1a032ccd2c04d2bcdc9)
tcp: cleanup of syncache_expand()
* Consistently free the string after unlocking the sch, if possible.
* Remove the failure handling in case of sc != NULL, since this is
not possible anymore.
* Remove the use of goto and instead return 0 in the three cases.
The only change in behavior is that in three of the four cases where
0 is returned, *lsop is no longer set to NULL. The behavior is now
consistent and documented in a comment. The current in-tree callers
only look at *lsop if syncache_expand() returns 1.
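The calling convention described above can be sketched like this. On failure the function returns 0 and leaves *lsop untouched. On success it returns 1 and updates *lsop. expand_sketch() and struct sock are hypothetical. The real syncache_expand() takes different arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical socket type. */
struct sock {
	int id;
};

/*
 * Returns 1 and stores newso in *lsop on success; returns 0 and does
 * not modify *lsop on failure, so callers read *lsop only after a 1.
 */
static int
expand_sketch(int entry_valid, struct sock *newso, struct sock **lsop)
{
	assert(lsop != NULL);
	if (!entry_valid)
		return 0;	/* plain early return, no goto */
	*lsop = newso;
	return 1;
}
```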
Reviewed by: Peter Lei
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D52948
(cherry picked from commit aafdbf83b926519cb47de8f16a1a40c1ef3c84b5)
tcp: improve sending of SYN-cookies
Ensure that when the sysctl variable net.inet.tcp.syncookies_only is
non-zero, SYN-cookies are sent and no SYN-cache entry is added to the
SYN-cache. In particular, this behavior should not depend on the value
of the sysctl variable net.inet.tcp.syncookies, which controls whether
SYN-cookies are used in combination with the SYN-cache to deal with
bucket overflows.
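The intended decision can be sketched as a pure function of the two sysctl values and the bucket state. The enum and syn_decide() are hypothetical. It is an assumption of this sketch that an overflowing bucket without SYN-cookies recycles its oldest entry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of processing an incoming SYN. */
enum syn_action { SYN_ADD_ENTRY, SYN_SEND_COOKIE, SYN_REPLACE_OLDEST };

/*
 * syncookies_only wins unconditionally: send a SYN-cookie and never
 * create a SYN-cache entry. Only otherwise does syncookies matter, and
 * only when the bucket overflows.
 */
static enum syn_action
syn_decide(bool syncookies, bool syncookies_only, bool bucket_full)
{
	if (syncookies_only)
		return SYN_SEND_COOKIE;
	if (!bucket_full)
		return SYN_ADD_ENTRY;
	return syncookies ? SYN_SEND_COOKIE : SYN_REPLACE_OLDEST;
}
```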
Also ensure that tcps_sc_completed does not include TCP connections
established via a SYN-cookie.
While there, make V_tcp_syncookies and V_tcp_syncookiesonly bool
instead of int, since they are used as boolean variables.
Reviewed by: rscheff, cc, Peter Lei, Nick Banks
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D52225
(cherry picked from commit 7b57f2513361fb98fd5e2262f130989fe65946c6)
tcp: increase tcps_sc_recvcookie only in syncache_expand()
syncookie_expand() is called from syncookie_cmp() in INVARIANTS kernels
to confirm that the values calculated via the SYN-cookie mechanism match
those stored in the syncache entry. This creates a counting bug: with
INVARIANTS, every successful use of the syncache is also counted as a
use of a SYN-cookie.
Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D50897
(cherry picked from commit 3ed8d5645dd42a7c080ba800cf6d25cb7e147d7e)
tcp: refactor debug function syncookie_cmp()
- Don't bzero() the test structure. All fields checked are set by
syncache_expand().
- Don't allocate TCP address logging string if there is nothing to report.
- Mark hash bucket argument as pointer to const.
- Make it void.
Differential Revision: https://reviews.freebsd.org/D50896
(cherry picked from commit e9e6a025b4523c9aa2885e892495601964e03056)
tcp: rename syncookie_lookup() into syncookie_expand() and make it bool
This function always returns the same pointer it was passed. With the
new name and return type, the code is easier to understand. Mark the
hash bucket argument as a pointer to const, since the function doesn't
modify it and only uses its value as an integer. No functional changes.
Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D50895
(cherry picked from commit 6538742c1aaca3ce522ccea95007dfa9686c78dd)
tcp: micro-optimize SYN-cookie expansion
Only compute wscale when it is actually used. While there, change the
type of wscale to u_int as suggested by glebius.
No functional change intended.
Reviewed by: glebius, rscheff (older version)
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D52296
(cherry picked from commit 341d1aabc13e47911d2eb38e857b90f7d356134e)
tcp: close two minor races with debug messages
The syncache entry is locked by the hash bucket lock. After running
SCH_UNLOCK(), we have no guarantee that the syncache entry still
exists.
Resolve the race by moving SCH_UNLOCK() after the log() call, which
reads fields of the syncache entry.
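The fix amounts to a lock-ordering rule: finish every read of the entry before releasing the lock that keeps it alive. In the sketch below, a pthread mutex stands in for the bucket lock and snprintf() stands in for log(). struct bucket, struct entry and log_then_unlock() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical entry; it is only guaranteed to exist while the bucket
 * lock is held. */
struct entry {
	unsigned int iss;
	unsigned int irs;
};

struct bucket {
	pthread_mutex_t lock;
	struct entry *entry;
};

/* Called with b->lock held: format the message first, unlock after. */
static int
log_then_unlock(struct bucket *b, char *buf, size_t len)
{
	int n;

	assert(b->entry != NULL);
	n = snprintf(buf, len, "iss %u irs %u", b->entry->iss, b->entry->irs);
	pthread_mutex_unlock(&b->lock);	/* entry may vanish after this */
	return n;
}
```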
Reviewed by: rrs, tuexen, Nick Banks
Sponsored by: Netflix
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D52868
(cherry picked from commit ad38f6a0b466bf05a0d40ce1daa8c7bce0936271)
tcp: improve segment validation in SYN-RECEIVED
The validation of SEG.SEQ (first step in SEGMENT ARRIVES of RFC 9293)
should be done before the validation of SEG.ACK (fifth step in
SEGMENT ARRIVES in RFC 9293).
Furthermore, when the SEG.SEQ validation fails, a challenge ACK
should be sent instead of sending a RST-segment and moving the
endpoint to CLOSED.
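The resulting order of checks can be sketched as follows. First the segment acceptability test of RFC 9293 runs, with its four cases for zero and non-zero SEG.LEN and RCV.WND. Only afterwards is SEG.ACK checked. The names are hypothetical. RST and SYN bit processing is omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

#define SEQ_LT(a, b)	((int32_t)((a) - (b)) < 0)
#define SEQ_LEQ(a, b)	((int32_t)((a) - (b)) <= 0)

/* Hypothetical actions for a segment arriving in SYN-RECEIVED. */
enum sr_action { SR_ACCEPT, SR_CHALLENGE_ACK, SR_SEND_RST };

/* First step: is the segment (partly) inside the receive window? */
static bool
seq_acceptable(tcp_seq rcv_nxt, uint32_t rcv_wnd, tcp_seq seq, uint32_t len)
{
	if (rcv_wnd == 0)
		return len == 0 && seq == rcv_nxt;
	if (SEQ_LEQ(rcv_nxt, seq) && SEQ_LT(seq, rcv_nxt + rcv_wnd))
		return true;
	return len > 0 && SEQ_LEQ(rcv_nxt, seq + len - 1) &&
	    SEQ_LT(seq + len - 1, rcv_nxt + rcv_wnd);
}

/* SEG.SEQ before SEG.ACK; a bad SEG.SEQ yields a challenge ACK. */
static enum sr_action
synrcvd_input(tcp_seq rcv_nxt, uint32_t rcv_wnd, tcp_seq seq, uint32_t len,
    tcp_seq snd_una, tcp_seq snd_nxt, tcp_seq ack)
{
	if (!seq_acceptable(rcv_nxt, rcv_wnd, seq, len))
		return SR_CHALLENGE_ACK;
	if (!(SEQ_LT(snd_una, ack) && SEQ_LEQ(ack, snd_nxt)))
		return SR_SEND_RST;
	return SR_ACCEPT;
}
```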
Reported by: Tilnel on freebsd-net
Reviewed by: Nick Banks
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D52849
(cherry picked from commit b7118461f9099876cb2c2923948f8fb647defd57)
tcp: cleanup syncache_expand()
Only validate SEG.SEQ and SEG.ACK when processing a real SYN-cache
entry. In the SYN-cookie case, these conditions are always true, since
the SYN-cache entry on the stack is constructed from the incoming
TCP segment.
While there, fix the logging messages.
Reviewed by: Nick Banks
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D52816
(cherry picked from commit 3034fa3d4321fdc487428c9050711de9ce234567)
tcp: keep SYN-cache entry when sending of challenge ACK fails
Don't drop a SYN-cache entry just because a challenge ACK couldn't
be sent. This might only be a temporary failure.
Reviewed by: Nick Banks, glebius, jtl
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D52840
(cherry picked from commit 7841b44f8491d69c75207d0f3a1eb34501d99edd)
tcp: apply rate limits to challenge ACKs
When sending challenge ACKs from the SYN-cache, apply the same rate
limiting as in other states.
Reviewed by: cc, rrs
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D52754
(cherry picked from commit c2900b6e8255ba4f54dbd897cf42427db577ed3d)
tcp: improve comments in the syncache code
Add a comment explaining why syncache entries are dropped and fix a
typo in a comment.
Reviewed by: rrs, glebius
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D53564
(cherry picked from commit 17fb082104ee281365b72bd5135604cea5051df2)
tcp over udp: don't copy more bytes than available
When copying the data in the first mbuf to get rid of the UDP
header, use the correct length. It was copying 8 bytes too many
(the length of the UDP header).
This only applies to handling TCP over UDP packets. The support for
TCP over UDP is disabled by default.
Reported by: jtl
Reviewed by: Peter Lei
Sponsored by: Netflix, Inc.
(cherry picked from commit bfda98a42027417b2fa74738c63327532013e93b)
tcp: refactor tcp_send_challenge_ack()
Refactor tcp_send_challenge_ack() such that the logic deciding whether
a challenge ACK is sent is available in a separate function,
tcp_challenge_ack_check(). This new function will also be used for
sending challenge ACKs in the SYN-cache code, which will be added in
upcoming commits.
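Splitting the rate-limit decision from the code that builds and sends the segment lets other callers, such as a SYN-cache, reuse it. The sketch below assumes a simple limit of max_cnt challenge ACKs per window of win_ticks. struct ack_limit, challenge_ack_check() and both parameters are hypothetical. FreeBSD's actual tunables and state may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-endpoint rate-limit state. */
struct ack_limit {
	uint32_t window_start;	/* tick at which the current window began */
	uint32_t cnt;		/* challenge ACKs sent in this window */
};

/* Returns true if a challenge ACK may be sent now; updates the state. */
static bool
challenge_ack_check(struct ack_limit *l, uint32_t now, uint32_t win_ticks,
    uint32_t max_cnt)
{
	if (now - l->window_start >= win_ticks) {
		l->window_start = now;	/* start a new window */
		l->cnt = 0;
	}
	if (l->cnt >= max_cnt)
		return false;
	l->cnt++;
	return true;
}
```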
No functional change intended.
Reviewed by: cc, Nick Banks, Peter Lei
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D52717
(cherry picked from commit db37256ce5437e6c667a537afff0fd9f59576514)
tcp: mitigate a side channel for detection of TCP connections
If a blind attacker sends ACK segments to guess whether a TCP
connection exists, a hit might trigger a challenge ACK on the existing
TCP connection. To make such a hit unobservable to the attacker, also
increment the global counter that would have been incremented for a
non-hit.
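The idea can be sketched as charging one shared global budget on both paths. The attacker then cannot infer a hit from how quickly that budget drains. It is an assumption here that the non-hit path is a rate-limited RST. struct global_limit, global_limit_take() and handle_ack_probe() are hypothetical. The per-connection challenge ACK limit is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical global rate-limit budget, e.g. for RSTs sent in reply
 * to segments that match no connection. */
struct global_limit {
	unsigned int used;
	unsigned int max;
};

static bool
global_limit_take(struct global_limit *g)
{
	if (g->used >= g->max)
		return false;
	g->used++;
	return true;
}

/* Returns true if a reply segment is sent. */
static bool
handle_ack_probe(struct global_limit *g, bool conn_exists)
{
	if (!conn_exists)
		return global_limit_take(g);	/* RST, if budget remains */
	(void)global_limit_take(g);		/* charge it as a miss would */
	return true;				/* challenge ACK */
}
```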
This issue was reported as issue number 11 in Keyu Man et al.:
SCAD: Towards a Universal and Automated Network Side-Channel
Vulnerability Detection
Reviewed by: Nick Banks, Peter Lei
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D51724
(cherry picked from commit f0f6e50388963cae44bb92bb69ed7a1135dd2eec)
tcp rack: fix sendmap app limited count tracking
rc_app_limited_cnt is an internal counter on the rack structure
that tracks the number of sendmap entries that have the
RACK_APP_LIMITED flag set. These entries gate goodput measurements.
The counter is reported in a number of blackbox logging events.
When a sendmap entry with the RACK_APP_LIMITED flag set was
cloned, the counter was not maintained properly.
While here, clean up the counter check performed when a sendmap entry
with the flag set is freed, which previously hid this issue.
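The invariant is that the counter equals the number of live entries carrying the flag. A clone inherits the flags, so it must bump the counter. A free must decrement it without silently clamping at zero. The types, APP_LIMITED and both helpers below are hypothetical simplifications, not the RACK sendmap code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define APP_LIMITED 0x1	/* stand-in for RACK_APP_LIMITED */

/* Hypothetical, simplified sendmap entry and connection state. */
struct sendmap {
	uint32_t start, end;
	uint32_t flags;
};

struct conn {
	uint32_t app_limited_cnt;	/* entries with APP_LIMITED set */
};

/* Split e at seq; the clone inherits the flags and must be counted. */
static struct sendmap *
sendmap_clone(struct conn *c, struct sendmap *e, uint32_t seq)
{
	struct sendmap *n = malloc(sizeof(*n));

	if (n == NULL)
		return NULL;
	*n = *e;
	n->start = seq;
	e->end = seq;
	if (n->flags & APP_LIMITED)
		c->app_limited_cnt++;
	return n;
}

static void
sendmap_free(struct conn *c, struct sendmap *e)
{
	if (e->flags & APP_LIMITED) {
		/* An underflow here is a tracking bug; don't hide it. */
		assert(c->app_limited_cnt > 0);
		c->app_limited_cnt--;
	}
	free(e);
}
```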
Reviewed by: tuexen
Sponsored by: Netflix, Inc.
(cherry picked from commit e0838f8a2e61e73e37c1ae08eab9473daacaacb8)
tcp sack: improve computation of delivered_data
delivered_data is the number of bytes that have newly been
delivered to the peer. This includes both the bytes cumulatively
acknowledged and the bytes selectively acknowledged.
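One way to compute this without double counting is shown below. Bytes that were already SACKed and are now covered by the cumulative ACK must count only once. The sketch assumes that sacked_before and sacked_after are the total SACKed bytes above snd_una before and after processing the ACK. delivered_data() is a hypothetical helper and not necessarily how the stack computes it.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/*
 * Newly delivered bytes = advance of the cumulative ACK point plus the
 * net change in SACKed bytes above it. Previously SACKed bytes that are
 * now cumulatively ACKed are added by cum and removed again by the
 * drop in sacked_after, so they count once.
 */
static uint32_t
delivered_data(tcp_seq old_una, tcp_seq new_una,
    uint32_t sacked_before, uint32_t sacked_after)
{
	uint32_t cum = new_una - old_una;	/* modulo 2^32 */

	assert(cum + sacked_after >= sacked_before);
	return cum + sacked_after - sacked_before;
}
```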
Reviewed by: rscheff
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D51718
(cherry picked from commit 0950ab7d76951ea8e65f25a68511dc809339e802)
tcp: ensure SACK rxmit never ends up left of its hole
When an RTO occurs during SACK loss recovery, snd_recover may be
pulled left. With Lost Retransmission Detection (LRD), this can cause
the rxmit pointer of a hole to end up to the left of the hole, which
is unexpected and leads to complications.
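The invariant being enforced can be sketched as a clamp using modulo-2^32 comparison. struct sackhole and sackhole_clamp_rxmit() are hypothetical simplifications. Where exactly the stack applies such an adjustment is not shown here.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

#define SEQ_LT(a, b)	((int32_t)((a) - (b)) < 0)

/* Simplified SACK hole: [start, end) is missing, rxmit is the next
 * byte of the hole to retransmit. */
struct sackhole {
	tcp_seq start;
	tcp_seq end;
	tcp_seq rxmit;
};

/* Never let rxmit point to the left of its hole. */
static void
sackhole_clamp_rxmit(struct sackhole *h)
{
	assert(SEQ_LT(h->start, h->end));
	if (SEQ_LT(h->rxmit, h->start))
		h->rxmit = h->start;
}
```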
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D51725
(cherry picked from commit 65d4a83399843cb4c6dd44061599390843c162db)
tcp lro: use the flowid only when it has hash properties
When a packet is provided to LRO using tcp_lro_queue_mbuf(), a
sequence number is computed based on the m_pkthdr.flowid provided
by the driver. The implicit assumption is that the m_pkthdr.flowid
has hash properties.
The recent use of tcp_lro_queue_mbuf() in iflib exposed a bug in at
least one driver (igc), which
* always reports that it uses M_HASHTYPE_OPAQUE, and
* in some cases does not set m_pkthdr.flowid consistently for packets
  belonging to the same TCP connection.
This results in severe performance problems for the base TCP stack,
since it processes the packets out of order, although they were
received in order.
To protect against such misbehaving drivers, only take
m_pkthdr.flowid into account if it has hash properties.
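The effect can be illustrated with a sort key that combines a flow part and an arrival index. A flowid without hash properties contributes 0, so packets keep their arrival order. Inconsistent flowids from one connection can then no longer reorder them. The HT_* constants are modeled loosely on FreeBSD's M_HASHTYPE_HASHPROP bit, and lro_sort_key() and its key layout are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in hash types; only types with HT_HASHPROP carry a real hash. */
#define HT_NONE		0x00
#define HT_OPAQUE	0x3f
#define HT_HASHPROP	0x80
#define HT_RSS_TCP_IPV4	(HT_HASHPROP | 0x02)

/* Hypothetical sort key: flow part in the high 32 bits, arrival index
 * in the low 32 bits. */
static uint64_t
lro_sort_key(uint32_t hashtype, uint32_t flowid, uint32_t arrival_idx)
{
	uint64_t flow = (hashtype & HT_HASHPROP) ? flowid : 0;

	return (flow << 32) | arrival_idx;
}
```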
The performance problems were observed by gallatin@ and analyzed
together with rrs@.
[6 lines not shown]