    dns: move ratelimiting and restart from NMDnsManager to NMDnsDnsmasq · 93d5efb4
    Thomas Haller authored
    Note that the only DNS plugin that actually emitted the FAILED signal was
    NMDnsDnsmasq. Let's handle restart, retry and rate-limiting in
    NMDnsDnsmasq itself instead of in NMDnsManager.
    
    There are three goals here:
    
    (1) when dnsmasq (infrequently) crashes, we want to always keep
      retrying. A random crash should be resolved automatically, and
      eventually dnsmasq should be working again.
      Note that we cannot fully detect whether something is wrong anyway.
      We do detect crashes, but if dnsmasq just gets catatonic, it's just
      as broken. The point is that our ability to detect a non-working
      dnsmasq is limited.
    
    (2) when dnsmasq keeps crashing all the time, then rate limit the retry.
      Of course, at this point there is already something seriously wrong,
      but we shouldn't kill the system by respawning the process without rate
      limiting.
    
    (3) previously, when NMDnsManager noticed that the plugin was broken
      (and rate-limiting kicked in), it would temporarily disable the plugin.
      Basically, that meant writing the real name servers to /etc/resolv.conf
      directly, instead of setting localhost. This partly conflicts with
      (1), because we want to retry and recover automatically. So what good
      is it to notice a problem, resort to plain /etc/resolv.conf for a
      short time, and then run into the issues again? If something is really
      broken, there is no way but to involve the user to investigate and
      fix the issue. Hence, we don't need to concern NMDnsManager with this either.
      The only thing that the manager notices is when the dnsmasq binary is not
      available. In that case, update() fails right away, and the manager falls back
      to configuring the name servers in /etc/resolv.conf directly.
    
    Also, change the backoff time from 5 minutes to 1 minute (twice the
    burst interval). There is no particularly strong reason for either
    choice; I think that if the rate limit kicks in, then something is
    already so wrong that it doesn't matter either way. In any case, 60
    seconds is long enough not to kill the machine.
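
    As a rough illustration of goals (1) and (2), the retry policy could be
    sketched as below. This is not the code from this commit: RestartLimit,
    restart_limit_delay() and BURST_MAX are hypothetical names and values
    chosen for the example. Only the 60 second backoff and the 30 second
    burst interval (half the backoff) follow from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values taken from the commit message: backoff is 60 s, twice the burst interval. */
#define BURST_INTERVAL_SEC 30
#define BACKOFF_SEC        60
/* Hypothetical: number of immediate respawns allowed per burst interval. */
#define BURST_MAX          5

/* Hypothetical bookkeeping for one supervised process. */
typedef struct {
    int64_t window_start; /* start of the current burst window, in seconds */
    int     count;        /* respawns counted in the current window */
} RestartLimit;

/*
 * Called when the process is found to have exited at time `now` (seconds).
 * Returns how long to wait before respawning: 0 means respawn right away.
 * The function never returns "give up", so a retry always happens eventually.
 */
static int64_t
restart_limit_delay(RestartLimit *rl, int64_t now)
{
    assert(rl != NULL);

    /* First crash ever, or the previous burst window has expired. */
    if (rl->count == 0 || now - rl->window_start >= BURST_INTERVAL_SEC) {
        rl->window_start = now;
        rl->count = 1;
        return 0;
    }

    /* Still within the burst budget: respawn immediately. */
    if (rl->count < BURST_MAX) {
        rl->count++;
        return 0;
    }

    /* Budget exhausted: back off, and open a new window at the respawn time. */
    rl->window_start = now + BACKOFF_SEC;
    rl->count = 1;
    return BACKOFF_SEC;
}
```

    With these assumed values, five crashes within 30 seconds are each
    followed by an immediate respawn, and the sixth one delays the respawn by
    60 seconds. An occasional crash never hits the limit.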