Commit 843c2fd

Florian Westphal authored and davem330 committed
net: dctcp: loosen requirement to assert ECT(0) during 3WHS
One deployment requirement of DCTCP is to be able to run in a DC setting along with TCP traffic. As Glenn Judd's NSDI'15 paper "Attaining the Promise and Avoiding the Pitfalls of TCP in the Datacenter" [1] (tba) explains, one way to solve this on the switch side is to split DCTCP and TCP traffic into two queues per switch port based on the DSCP: one queue solely intended for DCTCP traffic and one for non-DCTCP traffic.

For the DCTCP queue, there's the marking threshold K as explained in commit e3118e8 ("net: tcp: add DCTCP congestion control algorithm") for RED marking ECT(0) packets with CE. For the non-DCTCP queue, there's, for example, a classic tail drop queue. As already explained in e3118e8, running DCTCP at scale when not marking SYN/SYN-ACK packets with ECT(0) has severe consequences: non-ECT(0) packets traversing the RED marking DCTCP queue suffer a severe reduction in the probability of establishing a connection. This is because the DCTCP queue is dominated by ECT(0) traffic, and switches drop non-ECT traffic in the RED marking queue once the queue passes K, where K is usually a low watermark in order to leave enough tailroom for bursts. Splitting DCTCP traffic among several queues (an ECN and a non-ECN queue) is considered a terrible idea in the network community, as it splits single flows across multiple network paths.

Therefore, commit e3118e8 implements this on Linux as ECT(0) marked traffic, as we argue that marking all packets of a DCTCP flow is the only viable solution and also doesn't conflict with the draft.

However, a DCTCP implementation for FreeBSD recently landed in their mainline kernel as well [2]. In order to let it interoperate with Linux's DCTCP, we need to loosen the requirement that ECT(0) has to be asserted during the 3WHS, as FreeBSD does not implement this. This simplifies the ECN test and lets DCTCP work together with FreeBSD.

Joint work with Daniel Borkmann.

[1] https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/judd
[2] freebsd/freebsd-src@8ad8794

Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Cc: Glenn Judd <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
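
The switch behaviour the commit message relies on can be illustrated with a deliberately simplified model. This is only a sketch: the enum, the function name `dctcp_queue_action`, and the single step threshold are hypothetical, and a real switch's RED configuration and buffer limits are not modelled here.

```c
#include <stdbool.h>

/* Hypothetical outcome of enqueueing one packet on the DCTCP queue. */
enum queue_action { QUEUE_FORWARD, QUEUE_MARK_CE, QUEUE_DROP };

/*
 * Simplified model (an assumption, not a switch implementation):
 * below or at the marking threshold K every packet is forwarded
 * unchanged. Once the queue length has passed K, ECT packets are
 * marked with CE while non-ECT packets are dropped.
 */
static enum queue_action dctcp_queue_action(unsigned int queue_len,
					    unsigned int k, bool is_ect)
{
	if (queue_len <= k)
		return QUEUE_FORWARD;
	return is_ect ? QUEUE_MARK_CE : QUEUE_DROP;
}
```

Under this model a SYN sent without ECT(0) is dropped whenever the queue is above K, which is the reduced connection probability the message describes.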
1 parent 6942241 commit 843c2fd

1 file changed: +5 -9 lines changed


net/ipv4/tcp_input.c

Lines changed: 5 additions & 9 deletions
@@ -5872,10 +5872,9 @@ static inline void pr_drop_req(struct request_sock *req, __u16 port, int family)
  * TCP ECN negotiation.
  *
  * Exception: tcp_ca wants ECN. This is required for DCTCP
- * congestion control; it requires setting ECT on all packets,
- * including SYN. We inverse the test in this case: If our
- * local socket wants ECN, but peer only set ece/cwr (but not
- * ECT in IP header) its probably a non-DCTCP aware sender.
+ * congestion control: Linux DCTCP asserts ECT on all packets,
+ * including SYN, which is most optimal solution; however,
+ * others, such as FreeBSD do not.
  */
 static void tcp_ecn_create_request(struct request_sock *req,
 				   const struct sk_buff *skb,
@@ -5885,18 +5884,15 @@ static void tcp_ecn_create_request(struct request_sock *req,
 	const struct tcphdr *th = tcp_hdr(skb);
 	const struct net *net = sock_net(listen_sk);
 	bool th_ecn = th->ece && th->cwr;
-	bool ect, need_ecn, ecn_ok;
+	bool ect, ecn_ok;
 
 	if (!th_ecn)
 		return;
 
 	ect = !INET_ECN_is_not_ect(TCP_SKB_CB(skb)->ip_dsfield);
-	need_ecn = tcp_ca_needs_ecn(listen_sk);
 	ecn_ok = net->ipv4.sysctl_tcp_ecn || dst_feature(dst, RTAX_FEATURE_ECN);
 
-	if (!ect && !need_ecn && ecn_ok)
-		inet_rsk(req)->ecn_ok = 1;
-	else if (ect && need_ecn)
+	if ((!ect && ecn_ok) || tcp_ca_needs_ecn(listen_sk))
 		inet_rsk(req)->ecn_ok = 1;
 }

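To make the behavioural change in the second hunk easier to see, the old and new conditions can be written as two pure predicates over the same inputs. This is a userspace sketch, not kernel code: `struct syn_info` and both function names are hypothetical, and it assumes the request's `ecn_ok` flag stays 0 whenever the kernel function does not set it.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the state read by tcp_ecn_create_request(). */
struct syn_info {
	bool th_ecn;   /* SYN has both ECE and CWR set in the TCP header */
	bool ect;      /* SYN carries ECT in the IP header */
	bool ecn_ok;   /* tcp_ecn sysctl or RTAX_FEATURE_ECN route feature set */
	bool need_ecn; /* listener's congestion control needs ECN (e.g. DCTCP) */
};

/* Condition before this commit: a DCTCP listener insisted on ECT in the SYN. */
static bool ecn_accepted_old(struct syn_info s)
{
	if (!s.th_ecn)
		return false;
	if (!s.ect && !s.need_ecn && s.ecn_ok)
		return true;
	return s.ect && s.need_ecn;
}

/* Condition after this commit: a DCTCP listener accepts ECN with or without ECT. */
static bool ecn_accepted_new(struct syn_info s)
{
	if (!s.th_ecn)
		return false;
	return (!s.ect && s.ecn_ok) || s.need_ecn;
}
```

The case that changes is a SYN with ECE and CWR set but no ECT arriving at a DCTCP listener, which matches the FreeBSD behaviour described in the commit message: the old predicate rejects ECN for it and the new one accepts it. A SYN with ECT set arriving at a listener that does not need ECN is still not accepted by either predicate.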