1. 08 May, 2020 3 commits
  2. 07 May, 2020 9 commits
    • Fabrice Bellet's avatar
      stun: update timer timeout and retransmissions · 010ecd50
      Fabrice Bellet authored
      This patch updates the stun timing constants and provides the rationale
      with the choice of these new values, in the context of the ice
      connection check algorithm.
      
      One important value during the discovery state is the combination of the
      initial timeout and the number of retransmissions, because this state
      may complete after the last stun discovery binding request has timed
      out. With the combination of 500ms and 3 retransmissions, the discovery
      state is bound to 2000ms to discover server reflexive and relay
      candidates.
      
      The retransmission delay doubles at each retransmission except for the
      last one. Generally, this state will complete sooner, when all
      discovery requests get a reply before the timeout.
      
      Another mechanism is used during the connection check, where an stun
      request is sent with an initial timeout defined by :
      
         RTO = MAX(500ms, Ta * (number of in-progress + waiting pairs))
         with Ta = 20ms
      
      The initial timeout is bounded by a minimum value, 500ms, and scales
      linearly depending of the number of pairs on the way to be emited. The
      same number of retransmissions than in the discovery state in used
      during the connection check. The total time to wait for a pair to fail
      is then RTO + 2*RTO + RTO = 4*RTO with 3 retransmissions.
      
      On a typical laptop setup, with a wired and a wifi interface with
      IPv4/IPv6 dual stack, a link-local and a link-global IPv6 address, a
      couple a virtual addresses, a server-reflexive address, a turn relay
      one, we end up with a total of 90 local candidates for 2 streams and 2
      components each.  The connection checks list includes up to 200 pairs
      when tcp pairs are discarded, with :
      
        <33 in-progress and waiting pairs in 50% cases (RTO = 660ms),
        <55 in-progress and waiting pairs in 90% cases (RTO = 1100ms),
        and up to 86 in-progres and waiting pairs (RTO = 1720ms)
      
      The number of retransmission of 3 seems to be quite robust to handle
      sporadic packets loss, if we consider for example a typical packet loss
      frequency of 1% of the overall packets transmitted.
      
      And a relatevely large initial timeout is interesting because it reduces
      the overall network overhead caused by the stun requests and replies,
      mesured around 3KB/s during a connection check with 4 components.
      
      Finally, the total time to wait until all retransmissions have completed
      and have timed out (2000ms with an initial timeout of 500ms and 3
      retransmissions) gives a bound to the worst network latency we can
      accept, when no packet is lost on the wire.
      010ecd50
    • Fabrice Bellet's avatar
      conncheck: update the unfreeze method for RFC8445 · e9cbb3da
      Fabrice Bellet authored
      The way pairs are unfrozen between RFC5245 and RFC8445 changed a bit,
      and made the code much more simple. Previously pairs were unfrozen "per
      stream", not they are unfrozen "per foundation". The principle of the
      priv_conn_check_unfreeze_next function is now to unfreeze one and only
      one frozen pair per foundation, all components and streams included.
      The function is now idemporent: calling it when the connchecks still
      contains waiting pairs does nothing.
      e9cbb3da
    • Fabrice Bellet's avatar
      conncheck: update stun timer timeout for RFC8445 · 0f6eadc9
      Fabrice Bellet authored
      The new version of the RFC suppressed the difference between reliable
      and not reliable maximum value for RTO. We choose to keep the value of
      100ms that we used previously, which is lower that the recommended
      value, but could be overriden most of the time, when a significant
      number of pairs are handled.
      
      We also compute exactly the number of in-progress and waiting
      pairs for all streams of the agent, without relying on the value
      per-stream, multiplied by the number of active streams.
      0f6eadc9
    • Fabrice Bellet's avatar
      conncheck: another rare case of local tcp active candidate matching · 8efaf78c
      Fabrice Bellet authored
      An inbound stun request may come on a tcp pair, whose tcp-active socket
      has just been created and connected (the local candidate port is zero),
      but has not caused the creation of a discovered peer-reflexive local
      candidate (with a non-zero port). This inbound request is stored in an
      early icheck structure to be replayed later. When being processed after
      remote creds have been received, we have to find which local candidate
      it belongs to, by matching with the address only, without the port.
      8efaf78c
    • Fabrice Bellet's avatar
      conncheck: socket reliability should not change the conncheck behaviour · 6f0710c4
      Fabrice Bellet authored
      An inbound STUN request on a pair having another STUN request already
      inflight already should generate to new triggered check, no matter the
      type of the underlying socket.
      6f0710c4
    • Fabrice Bellet's avatar
      conncheck: inbound stun on tcp passive pairs should trigger a check · c0bd3209
      Fabrice Bellet authored
      An inbound stun request on a newly discovered pair should trigger a
      conncheck in the reverse direction, and not promote the pair directly in
      state succeeded. This is particulary required if the agent is in
      aggressive controlling mode.
      c0bd3209
    • Fabrice Bellet's avatar
      conncheck: simplify the test to find a matching local candidate · a2fb11fc
      Fabrice Bellet authored
      Since we keep a relation between a succeeded and its discovered pair, we
      can just test for the socket associated to a given pair, and eventually
      follow the link to the parent succeeded pair.
      a2fb11fc
    • Fabrice Bellet's avatar
      conncheck: properly select tcp-active discovered candidate · e7b04942
      Fabrice Bellet authored
      Some tcp-active discovered peer-reflexive local candidates may only be
      recognised by their local socket, if they have the same address and same
      port. It may happen when a nat generates an identical mapping from two
      different base local candidates.
      e7b04942
    • Fabrice Bellet's avatar
      stun: set delay in retransmission instead of adding it · ade16ee9
      Fabrice Bellet authored
      We may have situation when stun_timer_refresh is called with a
      significant delay after the current deadline. In the actual situation,
      this delay is just included to the computation of the new deadline of the
      next stun retransmission. We think this may lead to unfair situations,
      where the next deadline may be too short, just to compensate the first
      deadline that was too long.
      
      For example, if a stun request is scheduled with a delay of
      200ms for the 2nd transmission, and 400ms for the 3rd transmission,
      if stun_timer_remainder() is called 300ms after the start of the
      timer, the second delay will last only 300ms, instead of 400ms.
      ade16ee9
  3. 06 May, 2020 15 commits
    • Fabrice Bellet's avatar
      discovery: ensure port number uniqueness agent-wide · c4993988
      Fabrice Bellet authored
      The port number must be different for all local host candidates, not
      just in the same component, but across all components and all streams.
      A candidate ambiguity between a host local host and an identical server
      reflexive candidate have more unwanted consequences when it concerns two
      different components, because an inbound stun request may be associated
      to a wrong component.
      c4993988
    • Olivier Crête's avatar
    • Olivier Crête's avatar
      address: Make the private IP address detector more complete · 04ae880c
      Olivier Crête authored
      Also adds a unit test
      
      Fixes #67
      04ae880c
    • Fabrice Bellet's avatar
      conncheck: fix some missing loop exit cases · f6b4a73b
      Fabrice Bellet authored
      The refresh list may be modified while being iterated
      f6b4a73b
    • Fabrice Bellet's avatar
      3220abce
    • Fabrice Bellet's avatar
      conncheck: merge two cascaded conditions · 692b1055
      Fabrice Bellet authored
      692b1055
    • Olivier Crête's avatar
      interfaces: Use union for sockaddr/sockaddr_in · db9c2ca6
      Olivier Crête authored
      This makes clang happy
      Fixes #100
      db9c2ca6
    • Fabrice Bellet's avatar
      conncheck: honor the retransmit flag in case of role conflict · ac3f7b5e
      Fabrice Bellet authored
      This other rare situation happens when a role conflict is detected by an
      stun reply message, on a component that already has a nominated pair
      with a higher priority. In that case, the retransmit flag should be
      honored, and the pair with "role conflict" should not be retransmitted.
      ac3f7b5e
    • Fabrice Bellet's avatar
      conncheck: add missing cases when pruning pending checks · 6dd39694
      Fabrice Bellet authored
      When pruning pending checks (after at least one nominated pair has been
      obtained), some supplementary cases need to be handled, to ensure that
      the property "all pairs and only the pairs having a higher priority than
      the nominated pair should have the stun retransmit flag set" remains
      true during the whole conncheck:
      
      - a pair "not to be retransmitted" must be removed from the triggered check
        list (because a triggered check would create a new stun request, that
        would defacto ignore the retransmit flag)
      
      - an in-progress pair "not to be retransmitted", for which no stun
        request has been sent (p->stun_transactions == NULL, a transient
        state) must be removed from the conncheck list, just like a waiting
        pair.
      
      - a failed pair must have its flag "retransmit" updated too, just like
        another pair, since a failed pair could match an inbound check, and
        generate a triggered check, based on retransmit flag value : ie only
        if this pair has a chance to become a better nominated pair. See
        NICE_CHECK_FAILED case in priv_schedule_triggered_check().
      6dd39694
    • Fabrice Bellet's avatar
      conncheck: toggle the retransmit flag when pruning pending checks · 1e9edb69
      Fabrice Bellet authored
      The function conn_check_update_retransmit_flag() that was introduced to
      reenable the retransmit flag on pairs with higher priority than the
      nominated one can be merged in priv_prune_pending_checks(), and its
      invocation replaced by conn_check_update_check_list_state_for_ready().
      
      The function priv_prune_pending_checks() can also be tweaked to use
      the component selected pair priority, instead of getting it from
      the checklist. This function is called when at least one nominated pair
      exists, so selected_pair is this nominated pair.
      1e9edb69
    • Fabrice Bellet's avatar
      conncheck: discard new pair creation when priority is too low · fcd6bc86
      Fabrice Bellet authored
      It is possible to ignore the creation of a new pair whose priority is
      lower than the priority of the selected pair, ie the nominated pair with
      the highest priority. Such pair would be discarded by a call to
      prune_pending_checks(), and if checked, there state would break the
      assumption that all pairs with lower priority than the nominated pair
      are not retransmitted.
      fcd6bc86
    • Fabrice Bellet's avatar
      conncheck: optimize pending checks pruning · 93b5d3cb
      Fabrice Bellet authored
      We use the property that the conncheck list is ordered by
      pairs priorities, so we don't have to iterate twice.
      93b5d3cb
    • Fabrice Bellet's avatar
      conncheck: enable retransmit flag after nominated pair reordering · 0c142b36
      Fabrice Bellet authored
      When an existing peer-reflexive remote candidate is updated to a server
      reflexive one, due to the late reception of remove candidates, this
      update has several consequences on the conncheck list:
      
       - pair foundations and priorities must be recomputed
       - the highest nominated pair may have changed too
       - this is not strictly required, but some pairs that had *a lower*
         priority than the previously peer-reflexive nominated pair, had
         their retransmit flag set to false, for this reason. These pairs may
         now have *a higher* priority than the newly promoted nominated pair,
         and it is fair in that case to turn their retransmit flag back to
         true.
      0c142b36
    • Olivier Crête's avatar
    • Olivier Crête's avatar
      tests: Replace g_assert (memcmp) with g_assert_cmpmem() · d6317167
      Olivier Crête authored
      This makes for clearer reports in the CI
      d6317167
  4. 05 May, 2020 5 commits
  5. 04 May, 2020 8 commits
    • Fabrice Bellet's avatar
      conncheck: more debug · 95de1465
      Fabrice Bellet authored
      95de1465
    • Fabrice Bellet's avatar
      conncheck: add debug about agent mode · e1fb2f0f
      Fabrice Bellet authored
      e1fb2f0f
    • Fabrice Bellet's avatar
      f4bdeb5d
    • Fabrice Bellet's avatar
      66229223
    • Fabrice Bellet's avatar
      conncheck: wait for a pair until all stun requests are completed · 19c599a0
      Fabrice Bellet authored
      Only the newest stun request may need to be retransmitted, according to
      the pair retransmit flag. This is the first element of the
      stun_transactions list. Older stun requests are just kept around until
      their timeout expires, without retransmission.
      
      The newest stun request is usually the last one that will timeout.
      Current code was based on that assumption, causing the pair to fail when
      the newest stun request timeout expires. This is not always true, and some
      older stun requests may have a greater timeout delay.
      
      So, we should wait until *all* stun requests of a given pair have
      reached their timeout.
      
      We also refactor this part of the code, to handle the first stun and the
      other stun requests in the same loop.
      19c599a0
    • Fabrice Bellet's avatar
      discovery: use different port numbers for every local host candidates · a04fa4d4
      Fabrice Bellet authored
      This constraint is added to handle the situation where the agent runs on
      a box doing SNAT on one of its outgoing network interface. The NAT does
      usually its best to ensure that source port number is preserved on the
      external NAT address and port. This is called "port preservation" in RFC
      4787.
      
      When two local host candidates are allowed to have the same source port
      number, we increase the risk that a first local host candidate *is* the
      NAT mapping address and port of a second local host candidate, because
      of the "port preservation" effect. When it happens, a server reflexive
      candidate and a host candidate will have the same address and port.
      
      For that situation to happen, a stun request must be emitted from the
      internal address first, the NAT mapping doing the port preservation will
      be created for the internal address, and when a stun request is sent
      from the external address thereafter, a new NAT mapping will be created,
      but without port preservation, because the previous mapping already took
      that reservation.
      
      The problem will occur on the remote agent, when receiving a stun request
      from this address and port, that has no way to know wheather it comes from
      the host or the server reflexive candidate, if both have been advertised
      remotely, resulting in pair type mislabelling.
      
      This case may happen more easily when a source port range is reduced.
      a04fa4d4
    • Fabrice Bellet's avatar
      agent: stay in aggressive mode after conncheck has started · 0b80cbba
      Fabrice Bellet authored
      When remote tcp candidates are received late after the conncheck has
      started, RFC 6554 suggests that we switch the nomination mode back from
      aggressive to regular. The problem is that some stun requests may
      already be inflight with the use-candidate stun flag set, and reverting
      to regular mode in that case is too late, because these inflight
      requests may nominate a pair on the remote agent, and not on the local
      agent. We prefer to just ignore remote tcp candidates that are received
      after the component state has reached state CONNECTING.
      0b80cbba
    • Fabrice Bellet's avatar