Commit 010ecd50 authored by Fabrice Bellet, committed by Olivier Crête

stun: update timer timeout and retransmissions

This patch updates the stun timing constants and provides the rationale
for the choice of these new values, in the context of the ice
connection check algorithm.

One important value during the discovery state is the combination of the
initial timeout and the number of retransmissions, because this state
may only complete after the last stun discovery binding request has timed
out. With the combination of 500ms and 3 retransmissions, the discovery
state is bounded by 2000ms to discover server reflexive and relay
candidates.

The retransmission delay doubles at each retransmission except for the
last one. Generally, this state will complete sooner, when all
discovery requests get a reply before the timeout.
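
The bound above can be checked with a minimal sketch of the retransmission
rule as described in this message (each delay doubles, except the last one,
which is half of the previous delay). The function name
stun_total_wait_ms is hypothetical and is not part of libnice.

```c
#include <assert.h>

/* Hypothetical helper, not libnice API: total time in milliseconds before a
 * transaction is declared timed out, given the initial timeout and the
 * number of transmissions. The delay doubles after each transmission,
 * except that the last delay is half of the one before it. */
static unsigned int
stun_total_wait_ms (unsigned int initial_ms, unsigned int max_transmissions)
{
  unsigned int delay = initial_ms;
  unsigned int total = 0;
  unsigned int i;

  assert (max_transmissions > 0);

  for (i = 1; i <= max_transmissions; i++) {
    total += delay;                 /* wait for a reply to transmission i */
    if (i == max_transmissions - 1)
      delay /= 2;                   /* the last delay is not doubled */
    else
      delay *= 2;
  }
  return total;
}
```

With an initial timeout of 500ms and 3 transmissions this gives
500 + 1000 + 500 = 2000ms.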

Another mechanism is used during the connection check, where a stun
request is sent with an initial timeout defined by:

   RTO = MAX(500ms, Ta * (number of in-progress + waiting pairs))
   with Ta = 20ms

The initial timeout is bounded by a minimum value, 500ms, and scales
linearly with the number of pairs waiting to be emitted. The
same number of retransmissions as in the discovery state is used
during the connection check. The total time to wait for a pair to fail
is then RTO + 2*RTO + RTO = 4*RTO with 3 retransmissions.
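
As an illustration of the formula above, here is a minimal sketch that
computes the initial timeout and the time needed for a pair to fail. The
macro and function names are hypothetical; the real computation lives in
priv_compute_conncheck_timer().

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_TA_MS       20u   /* Ta, the pacing interval from the formula */
#define SKETCH_MIN_RTO_MS  500u  /* lower bound on the initial timeout */

/* Hypothetical helper: RTO = MAX(500ms, Ta * pairs), where pairs is the
 * number of in-progress plus waiting pairs. */
static unsigned int
conncheck_rto_ms (unsigned int pairs)
{
  unsigned int rto;

  assert (pairs <= UINT_MAX / SKETCH_TA_MS);  /* guard against overflow */
  rto = SKETCH_TA_MS * pairs;
  return rto > SKETCH_MIN_RTO_MS ? rto : SKETCH_MIN_RTO_MS;
}

/* Hypothetical helper: time before a pair fails with 3 retransmissions,
 * i.e. RTO + 2*RTO + RTO. */
static unsigned int
conncheck_pair_fail_ms (unsigned int pairs)
{
  unsigned int rto = conncheck_rto_ms (pairs);

  return rto + 2 * rto + rto;
}
```

For example, 10 pairs stay at the 500ms minimum, while 86 pairs give an
RTO of 1720ms and a pair failure time of 6880ms.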

On a typical laptop setup, with a wired and a wifi interface with
IPv4/IPv6 dual stack, a link-local and a link-global IPv6 address, a
couple of virtual addresses, a server-reflexive address, and a turn relay
one, we end up with a total of 90 local candidates for 2 streams and 2
components each.  The connection check list includes up to 200 pairs
when tcp pairs are discarded, with:

  <33 in-progress and waiting pairs in 50% of cases (RTO = 660ms),
  <55 in-progress and waiting pairs in 90% of cases (RTO = 1100ms),
  and up to 86 in-progress and waiting pairs (RTO = 1720ms)

Three retransmissions seem robust enough to handle sporadic packet
loss, if we consider for example a typical packet loss frequency of
1% of the overall packets transmitted.
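
A rough way to quantify this is sketched below. It relies on assumptions
that are not stated in this message: losses are independent, the loss rate
is the same in both directions, and an attempt succeeds only when both the
request and its reply get through. The function name is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: probability that every transmission of a transaction
 * fails, under the independence assumptions described above. */
static double
transaction_failure_probability (double packet_loss,
    unsigned int transmissions)
{
  double attempt_ok;
  double attempt_fail;
  double all_fail = 1.0;
  unsigned int i;

  assert (packet_loss >= 0.0 && packet_loss <= 1.0);

  /* An attempt succeeds only if neither the request nor the reply is lost. */
  attempt_ok = (1.0 - packet_loss) * (1.0 - packet_loss);
  attempt_fail = 1.0 - attempt_ok;

  for (i = 0; i < transmissions; i++)
    all_fail *= attempt_fail;

  return all_fail;
}
```

Under these assumptions, a 1% loss rate and 3 transmissions leave fewer
than one transaction in 100000 failing because of packet loss alone.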

A relatively large initial timeout is also useful because it reduces
the overall network overhead caused by the stun requests and replies,
measured at around 3KB/s during a connection check with 4 components.

Finally, the total time to wait until all retransmissions have completed
and have timed out (2000ms with an initial timeout of 500ms and 3
retransmissions) gives a bound to the worst network latency we can
accept, when no packet is lost on the wire.
parent e9cbb3da
Pipeline #143760 failed with stages
in 13 minutes and 52 seconds
@@ -2762,13 +2762,10 @@ static unsigned int priv_compute_conncheck_timer (NiceAgent *agent, NiceStream *
 rto = agent->timer_ta * waiting_and_in_progress;
-/* RFC8445 indicates that the min rto value should be 500ms, but
-* we prefer a lower value of 100ms, which should be overriden
-* most of the time, when a significant number of pairs are handled.
-*/
 nice_debug ("Agent %p : timer set to %dms, "
-"waiting+in_progress=%d", agent, MAX (rto, 100), waiting_and_in_progress);
-return MAX (rto, 100);
+"waiting+in_progress=%d", agent, MAX (rto, STUN_TIMER_DEFAULT_TIMEOUT),
+waiting_and_in_progress);
+return MAX (rto, STUN_TIMER_DEFAULT_TIMEOUT);
 }
 /*
@@ -130,29 +130,39 @@ struct stun_timer_s {
 * STUN_TIMER_DEFAULT_TIMEOUT:
 *
 * The default intial timeout to use for the timer
-* RFC recommendds 500, but it's ridiculous, 50ms is known to work in most
-* cases as it is also what is used by SIP style VoIP when sending A-Law and
-* mu-Law audio, so 200ms should be hyper safe. With an initial timeout
-* of 200ms, a default of 7 transmissions, the last timeout will be
-* 16 * 200ms, and we expect to receive a response from the stun server
-* before (1 + 2 + 4 + 8 + 16 + 32 + 16) * 200ms = 15200 ms after the initial
-* stun request has been sent.
+* This timeout is used for discovering server reflexive and relay
+* candidates, and also for keepalives, and turn refreshes.
+*
+* This value is important because it defines how much time will be
+* required to discover our local candidates, and this is an
+* uncompressible delay before the agent signals that candidates
+* gathering is done.
+*
+* The overall delay required for the discovery stun requests is
+* computed as follow, with 3 retransmissions and an initial delay
+* of 500ms : 500 * ( 1 + 2 + 1 ) = 2000 ms
+* The timeout doubles at each retransmission, except for the last one.
 */
-#define STUN_TIMER_DEFAULT_TIMEOUT 200
+#define STUN_TIMER_DEFAULT_TIMEOUT 500
 /**
 * STUN_TIMER_DEFAULT_MAX_RETRANSMISSIONS:
 *
-* The default maximum retransmissions allowed before a timer decides to timeout
+* The default maximum retransmissions before declaring that the
+* transaction timed out.
 */
-#define STUN_TIMER_DEFAULT_MAX_RETRANSMISSIONS 7
+#define STUN_TIMER_DEFAULT_MAX_RETRANSMISSIONS 3
 /**
 * STUN_TIMER_DEFAULT_RELIABLE_TIMEOUT:
 *
 * The default intial timeout to use for a reliable timer
+*
+* The idea with this value is that stun request sent over udp or tcp
+* should fail at the same time, with an initial default timeout set
+* to 500ms.
 */
-#define STUN_TIMER_DEFAULT_RELIABLE_TIMEOUT 7900
+#define STUN_TIMER_DEFAULT_RELIABLE_TIMEOUT 2000
 /**
 * StunUsageTimerReturn: