    stun: update timer timeout and retransmissions · 010ecd50
    Fabrice Bellet authored and Olivier Crête committed
    This patch updates the stun timing constants and provides the rationale
    for the choice of these new values, in the context of the ice
    connection check algorithm.
    
    One important value during the discovery state is the combination of the
    initial timeout and the number of retransmissions, because this state
    may complete after the last stun discovery binding request has timed
    out. With an initial timeout of 500ms and 3 retransmissions, the
    discovery state takes at most 2000ms to discover server reflexive and
    relay candidates.
    
    The retransmission delay doubles at each retransmission except for the
    last one. Generally, this state will complete sooner, when all
    discovery requests get a reply before the timeout.
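
    As a rough illustration of the schedule described above, the sketch
    below sums the successive waits. The function name is hypothetical and
    is not part of this patch. It treats the count the message calls
    "retransmissions" as the number of timed waits, and it assumes the
    last wait is half the previous one, which matches 500 + 1000 + 500
    for a count of 3.

```c
#include <assert.h>

/* Hypothetical helper, not taken from the patch: total time in ms before
 * a request is declared timed out. */
static unsigned int
sketch_total_timeout_ms (unsigned int initial_ms, unsigned int attempts)
{
  unsigned int delay = initial_ms;
  unsigned int total = initial_ms;   /* wait after the first transmission */
  unsigned int k;

  assert (attempts >= 1);

  for (k = 2; k <= attempts; k++) {
    if (k == attempts)
      delay /= 2;   /* assumption: the last wait is half the previous one */
    else
      delay *= 2;   /* the delay doubles at every other step */
    total += delay;
  }
  return total;
}
```

    Under these assumptions, sketch_total_timeout_ms (500, 3) gives
    2000, the bound quoted for the discovery state.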
    
    Another mechanism is used during the connection check, where a stun
    request is sent with an initial timeout defined by:
    
       RTO = MAX(500ms, Ta * (number of in-progress + waiting pairs))
       with Ta = 20ms
    
    The initial timeout is bounded by a minimum value, 500ms, and scales
    linearly with the number of pairs waiting to be emitted. The same
    number of retransmissions as in the discovery state is used during
    the connection check. The total time to wait for a pair to fail
    is then RTO + 2*RTO + RTO = 4*RTO with 3 retransmissions.
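
    The formula and the resulting failure delay can be sketched as
    follows. The macro and function names are hypothetical and are not
    the identifiers used in the code; only the values 20ms, 500ms and
    the factor 4 come from this message.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_TA_MS       20u   /* Ta, as given in the message */
#define SKETCH_MIN_RTO_MS 500u   /* lower bound on the initial timeout */

/* RTO = MAX(500ms, Ta * (number of in-progress + waiting pairs)) */
static unsigned int
sketch_initial_rto_ms (unsigned int pending_pairs)
{
  unsigned int scaled;

  /* Guard against overflow of the multiplication below. */
  assert (pending_pairs <= UINT_MAX / SKETCH_TA_MS);

  scaled = SKETCH_TA_MS * pending_pairs;
  return scaled > SKETCH_MIN_RTO_MS ? scaled : SKETCH_MIN_RTO_MS;
}

/* Time before a pair is declared failed: RTO + 2*RTO + RTO = 4*RTO. */
static unsigned int
sketch_pair_fail_time_ms (unsigned int pending_pairs)
{
  return 4u * sketch_initial_rto_ms (pending_pairs);
}
```

    With 25 pairs or fewer the minimum applies, so the RTO stays at
    500ms and a pair fails after 2000ms. With 33 pairs the RTO becomes
    660ms and a pair fails after 2640ms.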
    
    On a typical laptop setup, with a wired and a wifi interface with
    IPv4/IPv6 dual stack, a link-local and a global IPv6 address, a
    couple of virtual addresses, a server-reflexive address, and a turn
    relay one, we end up with a total of 90 local candidates for 2 streams
    and 2 components each.  The connection checks list includes up to 200
    pairs when tcp pairs are discarded, with:
    
      <33 in-progress and waiting pairs in 50% of cases (RTO = 660ms),
      <55 in-progress and waiting pairs in 90% of cases (RTO = 1100ms),
      and up to 86 in-progress and waiting pairs (RTO = 1720ms)
    
    A retransmission count of 3 seems to be quite robust against
    sporadic packet loss, if we consider for example a typical packet loss
    frequency of 1% of the overall packets transmitted.
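
    To put a rough number on this robustness, the sketch below estimates
    the chance that a check fails through packet loss alone. It rests on
    assumptions that are not stated in this message: losses are
    independent, an attempt fails when either the request or the reply
    is lost, and a check fails only when every attempt fails. The
    function name is hypothetical.

```c
#include <assert.h>

/* Hypothetical estimate, not taken from the patch: probability that all
 * attempts of one check are lost, given a per-packet loss rate. */
static double
sketch_all_attempts_lost (double loss_rate, unsigned int attempts)
{
  double attempt_ok;
  double attempt_lost;
  double all_lost = 1.0;
  unsigned int i;

  assert (loss_rate >= 0.0 && loss_rate <= 1.0);

  /* An attempt succeeds only if request and reply both get through. */
  attempt_ok = (1.0 - loss_rate) * (1.0 - loss_rate);
  attempt_lost = 1.0 - attempt_ok;

  /* Assumption: attempts are lost independently of each other. */
  for (i = 0; i < attempts; i++)
    all_lost *= attempt_lost;

  return all_lost;
}
```

    Under these assumptions, a 1% loss rate and 3 attempts give a
    failure probability of roughly 8 in a million.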
    
    A relatively large initial timeout is also useful because it reduces
    the overall network overhead caused by the stun requests and replies,
    measured around 3KB/s during a connection check with 4 components.
    
    Finally, the total time to wait until all retransmissions have completed
    and have timed out (2000ms with an initial timeout of 500ms and 3
    retransmissions) gives a bound to the worst network latency we can
    accept, when no packet is lost on the wire.