1. 22 May, 2020 1 commit
  2. 19 May, 2020 1 commit
    • Fabrice Bellet's avatar
      conncheck: do not always remove pair in triggered check list · 4c53dc2f
      Fabrice Bellet authored
      This patch reenables an interesting side effect that existed before
      commit 263c0903, when the state of a pair state in the triggered check
      list was changed to in-progress. Such "triggered" pairs with this state
      were selectively pruned from the conncheck list according to their
      priority in priv_prune_pending_checks(), meaning that pairs with a high
      priority were conserved, and quickly rechecked.
      
      Retrospectively, I suspect that this side effect was the initial
      motivation for changing the state of a "triggered" pair.
      
      Commit 263c0903 disabled that behaviour, for the sake of clarity, but it
      seems important to restore it, because these "triggered" pairs are often
      retriggered for a good reason, and frequently lead to a nominated pair.
      And loosing the opportunity to nominate a pair may be critical in
      controlled role when the peer agent is in aggressive nomination mode.
      4c53dc2f
  3. 18 May, 2020 7 commits
  4. 14 May, 2020 3 commits
    • Fabrice Bellet's avatar
      conncheck: rework the stun requests ordering per timer tick · 72ccb1a2
      Fabrice Bellet authored
      With this patch, we merge the two variables stun_sent and
      keep_timer_going. The three functions that are a possible source of a
      new stun request returns a boolean value stating if a request has been
      sent.  The semantic of keep_timer_going can now be deduced from
      stun_sent and from the result of priv_conn_check_stream_nominate().
      
      The trick that makes this merge possible is to repurpose the return
      value of priv_conn_check_tick_stream(), because keep_timer_going set
      when the conncheck list contains in-progress pairs in this function is
      redundant with the same check later in function
      priv_conn_check_tick_stream_nominate().
      72ccb1a2
    • Fabrice Bellet's avatar
      conncheck: explicitely order the type of stun requests per timer tick · 90c21bf9
      Fabrice Bellet authored
      With this patch, we try to make more explicit the process order between
      the different types of stun requets, according that only one request is
      sent per callback timer tick, ie every 20ms, to respect the stun pacing
      of the spec. We implement the follow priority:
      
       * triggered checks
       * stun retransmissions
       * ordinary checks
      
      In a concrete case, while a stream has stun requests related to
      triggered checks to be sent, all other stun transactions are delayed to
      the next timer ticks.
      
      The goal of this patch is to make this priority explicit, and more
      easily swappable if needed. Triggered checks have more probability to
      succeed than stun retransmissions, this is the reason why they are
      handled before. Ordinary checks on the contrary can be performed on a
      lower priority basis, after all other stun requests.
      
      The problem that can be sometime observed with a large number of stun
      transactions is that stun retransmissions may suffer from a delay after
      they have reached their deadline. This delay should remain small thanks
      to the design of the initial retransmission timer (RTO), that takes into
      account the overall number of scheduled stun requests. It allows all
      stun requests to be sent and resent at a predefined "pacing" frequency
      without much extra delay.
      
      This ordering not perfect, because stun requests of a given type are
      examinated per-stream, by looking at the first stream before the others,
      so it introduces a natural priority for the first stream.
      90c21bf9
    • Olivier Crête's avatar
      gitlab-ci: Actually gnore mingw failure · 3eadfe34
      Olivier Crête authored
      3eadfe34
  5. 13 May, 2020 1 commit
  6. 12 May, 2020 1 commit
  7. 08 May, 2020 9 commits
    • Olivier Crête's avatar
      README: Update for meson · f2fe645a
      Olivier Crête authored
      Also, update the RFC numbers that are implemented.
      f2fe645a
    • Fabrice Bellet's avatar
      conncheck: fix try-alternate on tcp turn sockets · 1bfac496
      Fabrice Bellet authored
      In OC2007R2 compatibility mode, we observed the behaviour of a skype
      turn server, when returning code 300 (try-alternate) stun error on its
      tls connections. This value is returned apparently when the turn server
      is overloaded already.
      
      We noticed that the actual code in priv_handle_turn_alternate_server()
      cannot handle a non-udp turn server, because a tcp one would require
      to create a new socket.
      
      But, even when creating such a new socket stack (tcp-bsd socket +
      pseudossl socket), libnice still fails to establish a new connection to
      the alternate server on port 443, in a very systematic way. I'm not sure
      whether this problem is specific to this skype server infrastructure
      (the skype client fails in a similar way). Anyway, this code path works
      as expected with a non-microsoft turn server (tested with coturn).
      1bfac496
    • Fabrice Bellet's avatar
      agent: fix tcp connection to turn servers · f974224e
      Fabrice Bellet authored
      A previous commit broke the logic used to start a discovery request for
      tcp turn servers. The ambiguity came from the distinction between the
      type of the turn server (turn->type), the compatibility of the
      transport of the local base candidate (turn_tcp), and the reliability
      of the underlying tcp socket (reliable_tcp).
      
      reliable_tcp indicates whether the turn allocate request should be
      "framed" in a tcp packet, according to RFC 4571. This is required in
      OC2007R2 only.
      
      This commit also puts the setup of the tcp turn socket in a separate
      function, because such setup is also required when handling
      try-alternate (code 300) stun errors on these tcp sockets, where we have
      to setup a new connection to another tcp turn server.
      f974224e
    • Fabrice Bellet's avatar
      conncheck: ignore refreshes for tls turn server for OC2007R2 · daa6c296
      Fabrice Bellet authored
      Relay candidates obtained from TLS turn server don't have to be
      refreshed in OC2007R2 compatibility mode.
      daa6c296
    • Fabrice Bellet's avatar
    • Olivier Crête's avatar
      meson: Move glib wrap to download · 74f28bda
      Olivier Crête authored
      74f28bda
    • Fabrice Bellet's avatar
      conncheck: make a single ordinary check per callback · 6c2dc976
      Fabrice Bellet authored
      This is more friendly with stun pacing.
      6c2dc976
    • Fabrice Bellet's avatar
    • Fabrice Bellet's avatar
      agent: stay in aggressive mode after stun requests have been sent · cbf0162d
      Fabrice Bellet authored
      This patch updates the previous commit "agent: stay in aggressive mode
      after conncheck has started", by accepting to switch from aggressive to
      regular mode, while no stun request has been sent. It gives the agent
      some extra delay to still accept remote tcp candidates, after its state
      already changed from gathering to connecting.
      cbf0162d
  8. 07 May, 2020 9 commits
    • Fabrice Bellet's avatar
      stun: update timer timeout and retransmissions · 010ecd50
      Fabrice Bellet authored
      This patch updates the stun timing constants and provides the rationale
      with the choice of these new values, in the context of the ice
      connection check algorithm.
      
      One important value during the discovery state is the combination of the
      initial timeout and the number of retransmissions, because this state
      may complete after the last stun discovery binding request has timed
      out. With the combination of 500ms and 3 retransmissions, the discovery
      state is bound to 2000ms to discover server reflexive and relay
      candidates.
      
      The retransmission delay doubles at each retransmission except for the
      last one. Generally, this state will complete sooner, when all
      discovery requests get a reply before the timeout.
      
      Another mechanism is used during the connection check, where an stun
      request is sent with an initial timeout defined by :
      
         RTO = MAX(500ms, Ta * (number of in-progress + waiting pairs))
         with Ta = 20ms
      
      The initial timeout is bounded by a minimum value, 500ms, and scales
      linearly depending of the number of pairs on the way to be emited. The
      same number of retransmissions than in the discovery state in used
      during the connection check. The total time to wait for a pair to fail
      is then RTO + 2*RTO + RTO = 4*RTO with 3 retransmissions.
      
      On a typical laptop setup, with a wired and a wifi interface with
      IPv4/IPv6 dual stack, a link-local and a link-global IPv6 address, a
      couple a virtual addresses, a server-reflexive address, a turn relay
      one, we end up with a total of 90 local candidates for 2 streams and 2
      components each.  The connection checks list includes up to 200 pairs
      when tcp pairs are discarded, with :
      
        <33 in-progress and waiting pairs in 50% cases (RTO = 660ms),
        <55 in-progress and waiting pairs in 90% cases (RTO = 1100ms),
        and up to 86 in-progres and waiting pairs (RTO = 1720ms)
      
      The number of retransmission of 3 seems to be quite robust to handle
      sporadic packets loss, if we consider for example a typical packet loss
      frequency of 1% of the overall packets transmitted.
      
      And a relatevely large initial timeout is interesting because it reduces
      the overall network overhead caused by the stun requests and replies,
      mesured around 3KB/s during a connection check with 4 components.
      
      Finally, the total time to wait until all retransmissions have completed
      and have timed out (2000ms with an initial timeout of 500ms and 3
      retransmissions) gives a bound to the worst network latency we can
      accept, when no packet is lost on the wire.
      010ecd50
    • Fabrice Bellet's avatar
      conncheck: update the unfreeze method for RFC8445 · e9cbb3da
      Fabrice Bellet authored
      The way pairs are unfrozen between RFC5245 and RFC8445 changed a bit,
      and made the code much more simple. Previously pairs were unfrozen "per
      stream", not they are unfrozen "per foundation". The principle of the
      priv_conn_check_unfreeze_next function is now to unfreeze one and only
      one frozen pair per foundation, all components and streams included.
      The function is now idemporent: calling it when the connchecks still
      contains waiting pairs does nothing.
      e9cbb3da
    • Fabrice Bellet's avatar
      conncheck: update stun timer timeout for RFC8445 · 0f6eadc9
      Fabrice Bellet authored
      The new version of the RFC suppressed the difference between reliable
      and not reliable maximum value for RTO. We choose to keep the value of
      100ms that we used previously, which is lower that the recommended
      value, but could be overriden most of the time, when a significant
      number of pairs are handled.
      
      We also compute exactly the number of in-progress and waiting
      pairs for all streams of the agent, without relying on the value
      per-stream, multiplied by the number of active streams.
      0f6eadc9
    • Fabrice Bellet's avatar
      conncheck: another rare case of local tcp active candidate matching · 8efaf78c
      Fabrice Bellet authored
      An inbound stun request may come on a tcp pair, whose tcp-active socket
      has just been created and connected (the local candidate port is zero),
      but has not caused the creation of a discovered peer-reflexive local
      candidate (with a non-zero port). This inbound request is stored in an
      early icheck structure to be replayed later. When being processed after
      remote creds have been received, we have to find which local candidate
      it belongs to, by matching with the address only, without the port.
      8efaf78c
    • Fabrice Bellet's avatar
      conncheck: socket reliability should not change the conncheck behaviour · 6f0710c4
      Fabrice Bellet authored
      An inbound STUN request on a pair having another STUN request already
      inflight already should generate to new triggered check, no matter the
      type of the underlying socket.
      6f0710c4
    • Fabrice Bellet's avatar
      conncheck: inbound stun on tcp passive pairs should trigger a check · c0bd3209
      Fabrice Bellet authored
      An inbound stun request on a newly discovered pair should trigger a
      conncheck in the reverse direction, and not promote the pair directly in
      state succeeded. This is particulary required if the agent is in
      aggressive controlling mode.
      c0bd3209
    • Fabrice Bellet's avatar
      conncheck: simplify the test to find a matching local candidate · a2fb11fc
      Fabrice Bellet authored
      Since we keep a relation between a succeeded and its discovered pair, we
      can just test for the socket associated to a given pair, and eventually
      follow the link to the parent succeeded pair.
      a2fb11fc
    • Fabrice Bellet's avatar
      conncheck: properly select tcp-active discovered candidate · e7b04942
      Fabrice Bellet authored
      Some tcp-active discovered peer-reflexive local candidates may only be
      recognised by their local socket, if they have the same address and same
      port. It may happen when a nat generates an identical mapping from two
      different base local candidates.
      e7b04942
    • Fabrice Bellet's avatar
      stun: set delay in retransmission instead of adding it · ade16ee9
      Fabrice Bellet authored
      We may have situation when stun_timer_refresh is called with a
      significant delay after the current deadline. In the actual situation,
      this delay is just included to the computation of the new deadline of the
      next stun retransmission. We think this may lead to unfair situations,
      where the next deadline may be too short, just to compensate the first
      deadline that was too long.
      
      For example, if a stun request is scheduled with a delay of
      200ms for the 2nd transmission, and 400ms for the 3rd transmission,
      if stun_timer_remainder() is called 300ms after the start of the
      timer, the second delay will last only 300ms, instead of 400ms.
      ade16ee9
  9. 06 May, 2020 8 commits