    tcp: add rfc3168, section fallback · 49213555
    Daniel Borkmann authored
    This work is a follow-up to commit f7b3bec6 ("net: allow setting ecn
    via routing table") and adds RFC3168 section fallback for outgoing
    ECN connections. In other words, this work adds a retry with a non-ECN
    setup SYN packet on the first timeout, as suggested by the RFC:
      [...] A host that receives no reply to an ECN-setup SYN within the
      normal SYN retransmission timeout interval MAY resend the SYN and
      any subsequent SYN retransmissions with CWR and ECE cleared. [...]
    Schematic client-side view when assuming the server is in tcp_ecn=2 mode,
    that is, Linux default since 2009 via commit 255cac91 ("tcp: extend
    ECN sysctl to allow server-side only ECN"):
     1) Normal ECN-capable path:
        SYN ECE CWR ----->
                    <----- SYN ACK ECE
                ACK ----->
     2) Path with broken middlebox, when client has fallback:
        SYN ECE CWR ----X crappy middlebox drops packet
                          (timeout, rtx)
                SYN ----->
                    <----- SYN ACK
                ACK ----->
    Without the fallback implemented, a connection attempt through such a
    middlebox would basically end up as:
        SYN ECE CWR ----X crappy middlebox drops packet
                          (timeout, rtx)
        SYN ECE CWR ----X crappy middlebox drops packet
                          (timeout, rtx)
        SYN ECE CWR ----X crappy middlebox drops packet
                          (timeout, rtx)
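
    The two client-side behaviours sketched above can be modelled in a few
    lines of user-space C. This is only an illustration, not the kernel
    code: syn_flags() and its parameters are hypothetical names, and the
    flag values are the standard TCP header bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* TCP header flag bits (SYN from RFC 793, ECE/CWR from RFC 3168). */
#define FLAG_SYN 0x02u
#define FLAG_ECE 0x40u
#define FLAG_CWR 0x80u

/*
 * Hypothetical helper: flags carried by the n-th transmission of a SYN.
 * attempt == 0 is the initial SYN, attempt >= 1 is a retransmission.
 */
static uint8_t syn_flags(unsigned int attempt, bool want_ecn, bool fallback)
{
	uint8_t flags = FLAG_SYN;

	/* Keep the ECN-setup bits unless fallback applies to a retransmit. */
	if (want_ecn && !(fallback && attempt > 0))
		flags |= FLAG_ECE | FLAG_CWR;

	return flags;
}
```

    With fallback enabled, only attempt 0 carries ECE and CWR; without it,
    every retransmission repeats the ECN-setup SYN that the middlebox drops.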
    In any case, only a small percentage of sites would see such additional
    setup latency: it was found at the end of 2014 that ~56% of IPv4 and 65%
    of IPv6 servers on the Alexa 1 million list would negotiate ECN (aka
    tcp_ecn=2 default), while 0.42% of these webservers fail to connect when
    the client tries to negotiate ECN (tcp_ecn=1) due to timeouts, which the
    fallback would mitigate with a slight latency trade-off. Recent related
    paper on this topic:
      Brian Trammell, Mirja Kühlewind, Damiano Boppart, Iain Learmonth,
      Gorry Fairhurst, and Richard Scheffenegger:
        "Enabling Internet-Wide Deployment of Explicit Congestion Notification."
        Proc. PAM 2015, New York.
    Thus, when net.ipv4.tcp_ecn=1 is set, the patch will perform RFC3168,
    section fallback on timeout. For users who explicitly do not want this,
    which can be the case in DC use cases, we add a net.ipv4.tcp_ecn_fallback
    knob that allows disabling the fallback.
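
    As a rough illustration of how the two sysctls combine, the following
    user-space sketch reads them from procfs. The helper names are
    hypothetical, the paths assume the usual /proc/sys mapping of the
    sysctl names, and the predicate only models the sysctl case described
    above (it ignores ECN requested via the routing table).

```c
#include <stdio.h>

/*
 * Read one integer sysctl from procfs. Returns -1 if the file cannot be
 * opened or parsed, e.g. on a kernel that does not have the knob.
 */
static int read_sysctl_int(const char *path)
{
	FILE *f = fopen(path, "r");
	int val = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

/*
 * Fallback is relevant when the client actively requests ECN
 * (tcp_ecn == 1) and the fallback knob is non-zero.
 */
static int ecn_fallback_active(int tcp_ecn, int tcp_ecn_fallback)
{
	return tcp_ecn == 1 && tcp_ecn_fallback != 0;
}
```

    A caller would pass "/proc/sys/net/ipv4/tcp_ecn" and
    "/proc/sys/net/ipv4/tcp_ecn_fallback" to read_sysctl_int() and feed
    both results into ecn_fallback_active().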
    tp->ecn_flags are not being cleared in tcp_ecn_clear_syn() on output, but
    rather we let tcp_ecn_rcv_synack() take that over on input path in case a
    SYN ACK ECE was delayed. Thus a spurious SYN retransmission will not prevent
    ECN being negotiated eventually in that case.
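
    The split between output and input path can be sketched as follows.
    This is a simplified user-space model under stated assumptions, not
    the kernel implementation: struct conn_model, clear_syn_ecn() and
    rcv_synack() are hypothetical stand-ins for tp->ecn_flags,
    tcp_ecn_clear_syn() and tcp_ecn_rcv_synack().

```c
#include <stdbool.h>
#include <stdint.h>

#define FLAG_ECE 0x40u
#define FLAG_CWR 0x80u

/* Model of the per-connection state: set once an ECN-setup SYN was sent. */
struct conn_model {
	bool ecn_ok;
};

/* Output path: strip ECE/CWR from the retransmitted SYN only.
 * The connection state is deliberately left untouched here. */
static uint8_t clear_syn_ecn(uint8_t syn_flags)
{
	return (uint8_t)(syn_flags & ~(FLAG_ECE | FLAG_CWR));
}

/* Input path: the SYN ACK decides. Per RFC 3168 an ECN-setup SYN ACK
 * has ECE set and CWR cleared; anything else turns ECN off. */
static void rcv_synack(struct conn_model *c, uint8_t synack_flags)
{
	bool ece = (synack_flags & FLAG_ECE) != 0;
	bool cwr = (synack_flags & FLAG_CWR) != 0;

	if (c->ecn_ok && (!ece || cwr))
		c->ecn_ok = false;
}
```

    Because clear_syn_ecn() does not touch the state, a delayed SYN ACK ECE
    that answers the original ECN-setup SYN still leaves ecn_ok set, while
    a plain SYN ACK clears it.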
    Reference: https://www.ietf.org/proceedings/92/slides/slides-92-iccrg-1.pdf
    Reference: https://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Signed-off-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch>
    Signed-off-by: Brian Trammell <trammell@tik.ee.ethz.ch>
    Cc: Eric Dumazet <edumazet@google.com>
    Cc: Dave Taht <dave.taht@gmail.com>
    Acked-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>