1. 20 Mar, 2018 17 commits
    • Thomas Haller's avatar
      device: merge IPv4 and IPv6 versions of ip_config_merge_and_apply() (pt1) · d7d8611e
      Thomas Haller authored
      Functions like these are conceptually very similar. Commonly,
      what we want to do for one address family we also want to do
      for the other.
      
      Merge the two functions. This moves the similar parts closer
      to each other and stand beside it. This is only the first part
      of the merge, which is pretty trivial without larger changes
      (to keep the diff simple). More next.
      d7d8611e
    • Thomas Haller's avatar
      device: in nm_device_capture_initial_config() only update config once · 745d60c0
      Thomas Haller authored
      Now that there is no difference between initial capturing of
      the configuration, and a later update_ip_config() call during
      a signal from platform, we only need to make sure that the
      IP config instances are initialized at least once.
      
      In case we are called multiple times, there is nothing to do.
      745d60c0
    • Thomas Haller's avatar
      device: don't capture resolve.conf for initial device config · 454195c0
      Thomas Haller authored
      This was called by via
      
        ...
        - manager:recheck_assume_connection()
          - manager:get_existing_connection()
            - nm_device_capture_initial_config()
              - update_ext_ip_config(initial=TRUE)
      
      and would parse resolv.conf, and try to fill the device IP config
      with nameservers and dns-options.
      
      But why? It would only have effect if NM was started with
      nm_dns_manager_get_resolv_conf_explicit(), but is that really sensible?
      And it would only take effect on devices that have a default route.
      And for what is this information even used?
      
      Let's not do it this way. If we need this information for assuming or
      external sys-iface mode, then it should be explicitly loaded at the
      appropriate moment.
      
      For now, drop it and see what breaks. Then we can fix it properly. If
      it even matters.
      454195c0
    • Thomas Haller's avatar
      device: drop capture_lease_config() during connection assumption · 453f9e51
      Thomas Haller authored
      Drop capture_lease_config(). It was added by commit
      0321073b.
      
      Note that it was only called by
      
        ...
        - manager:recheck_assume_connection()
          - manager:get_existing_connection()
            - nm_device_capture_initial_config()
              - update_ext_ip_config(addr_family=AF_INET, initial=TRUE)
                - capture_lease_config()
      
      It had only effect when the device had IPv4 permanent addresses.
      But then, capture_lease_config() would go on and iterate over
      all connections (sorted by last-connect timestamp). It would
      consider connection candidates that are compatible with the device,
      and try to read the lease information from disk
      
      It's really unclear what this means. For assuming (graceful take over),
      do we need the lease information in the device? I don't think so,
      because we will match an existing connection. The lease information
      shall be read while activating (if necessary).
      
      For external connections, we don't even have a matching connection
      and we always generate a new one. It doesn't seem right to consider
      leases from disk, for a different connection.
      
      Just drop this. It's really ugly. If this causes an issue, it must be
      fixed differently. We want to behave determinstically and well defined.
      I don't even comprehend all the implications of what this had.
      
      Also note that update_ext_ip_config() no longer clears
      priv->dev_ip4_config.
      453f9e51
    • Thomas Haller's avatar
      19e65747
    • Thomas Haller's avatar
      device: also export NMIPxConfig on error in nm_device_set_ipx_config() · 1d88f504
      Thomas Haller authored
      A failure to configure an address family does not mean that the connection
      is going to fail. It depends, for example on ipvx.may-fail.
      
      Always export the NMIPxConfig instance in that case.
      1d88f504
    • Thomas Haller's avatar
      device: cleanup completing wait for linklocal6 · 5fd82a20
      Thomas Haller authored
      linklocal6_complete() had only one caller. The caller would check
      whether the conditions for linklocal6_complete() are satisfied, and
      then call it. Note that linklocal6_complete() would again assert
      that these conditions hold. Don't do this. Just move the check
      inside linklocal6_complete(), and rename to linklocal6_check_complete().
      
      Also, linklocal6_complete() was called by update_ip_config(),
      which was called by nm_device_capture_initial_config() and
      queued_ip6_config_change().
      It doesn't make sense to call linklocal6_complete() during
      nm_device_capture_initial_config(). Move the call to
      queued_ip6_config_change().
      5fd82a20
    • Thomas Haller's avatar
      device: fix check for existing addresses to ignore DADFAILED · 6cdf0b18
      Thomas Haller authored
      Likewise, in ndisc_ra_timeout() we also want to include tentative
      addresses. Looking into priv->ip6_config to determine whether
      an other IP configuration is active, is anyway odd, and likely
      a bug.
      6cdf0b18
    • Thomas Haller's avatar
    • Thomas Haller's avatar
      core: add nm_ip6_config_find_first_address() function and refactor lookup of code · 945339cb
      Thomas Haller authored
      Instead have one particular nm_ip6_config_get_address_first_nontentative() function,
      make it more extendable. Now, we pass a match-type argument, which can control which
      element to search.
      
      This patch has no change in behavior, but it already makes clear, that
      nm_ip6_config_get_address_first_nontentative() was buggy, because it would
      also return addresses that failed DAD.
      945339cb
    • Thomas Haller's avatar
      device: minor cleanups for ipv6ll_handle boolean variable · fe02bb4f
      Thomas Haller authored
      Don't do "if (var == FALSE)" for boolean variables.
      
      Also, make booleans in NMDevicePrivate structure bitfields
      and reorder the fields beside other bitfields. This allows
      a tighter packing of the structure.
      fe02bb4f
    • Thomas Haller's avatar
      device/trivial: rename internal field "nm_ipv6ll" to "ipv6ll_handle" · 041afd2c
      Thomas Haller authored
      The "nm_" prefix should not be used for internal names.
      041afd2c
    • Thomas Haller's avatar
      device: simplify return values for addrconf6_start_with_link_ready() and linklocal6_start() · 0cc605e7
      Thomas Haller authored
      addrconf6_start_with_link_ready() cannot fail. Hence, don' return a boolean
      success value.
      
      linklocal6_start() can only either POSTPONE or succeed right away. Don't return
      a NMActStageReturn value, TRUE/FALSE is enough.
      
      This simplifies the callers that don't have to check for values that never
      come.
      0cc605e7
    • Thomas Haller's avatar
      device: drop and inline trival function linklocal6_cleanup() · f813164d
      Thomas Haller authored
      It doesn't seem to make code clearer, rather, slightly more complex.
      f813164d
    • Thomas Haller's avatar
      fa09e7eb
    • Thomas Haller's avatar
      core: avoid clone of all-connections list for nm_utils_complete_generic() · e17cd1d7
      Thomas Haller authored
      NMSettings exposes a cached list of all connection. We don't need
      to clone it. Note that this is not save against concurrent modification,
      meaning, add/remove of connections in NMSettings will invalidate the
      list.
      
      However, it wasn't save against that previously either, because
      altough we cloned the container (GSList), we didn't take an additional
      reference to the elements.
      
      This is purely a performance optimization, we don't need to clone the
      list. Also, since the original list is of type "NMConnection *const*",
      use that type insistently, instead of dependent API requiring GSList.
      
      IMO, GSList is anyway not a very nice API for many use cases because
      it requires an additional slice allocation for each element. It's
      slower, and often less convenient to use.
      e17cd1d7
    • Thomas Haller's avatar
      device: minor cleanup of nm_device_complete_connection() and add code comment · af97b9a4
      Thomas Haller authored
      Regarding the cleanup: remove the success variable and instead error
      out early.
      af97b9a4
  2. 19 Mar, 2018 2 commits
    • Thomas Haller's avatar
      all: avoid calling g_free on a const pointer with g_clear_pointer() · f0442a47
      Thomas Haller authored
      With g_clear_pointer(pptr, g_free), pptr is cast to a non-const pointer,
      and hence there is no compiler warning about calling g_free() in a const
      pointer. However, it still feels ugly to pass a const pointer to
      g_clear_pointer(). We should either add an explicity cast, or just
      make the pointer non-const.
      
      I guess part of the problem is that C's "const char *" means that the
      string itself is immutable, but also that it cannot be freed. We most
      often want a different semantic: the string itself is immutable after
      being initialized once, but the memory itself can and will be freed.
      Such a notion of immutable C strings cannot be expressed.
      
      For that, just remove the "const" from the declarations. Although we
      don't want to modify the (content of the) string, it's still more a
      mutable string.
      
      Only in _vt_cmd_obj_copy_lnk_vlan(), add an explicity cast but keep the
      type as const. The reason is, that we really want that NMPObject
      instances are immutable (in the sense that they don't modify while
      existing), but that doesn't mean the memory cannot be freed.
      f0442a47
    • Thomas Haller's avatar
  3. 13 Mar, 2018 2 commits
    • Beniamino Galvani's avatar
      dhcp: handle expiry by letting the client continue for some time · 17009ed9
      Beniamino Galvani authored
      Previously we would kill the client when the lease expired and we
      restarted it 3 times at 2 minutes intervals before failing the
      connection. If the client is killed after it received a NACK from the
      server, it doesn't have the chance to delete the lease file and the
      next time it is started it will request the same lease again.
      
      Also, the previous restart logic is a bit convoluted.
      
      Since clients already know how to deal with NACKs, let them continue
      for a grace period after the expiry. When the grace period ends, we
      fail the method and this can either fail the whole connection or keep
      it active depending on the may-fail configuration.
      
      https://bugzilla.gnome.org/show_bug.cgi?id=783391
      17009ed9
    • Thomas Haller's avatar
      core/dbus: rework creating numbered D-Bus export path by putting counter into class · 57ab9fd6
      Thomas Haller authored
      I dislike the static hash table to cache the integer counter for
      numbered paths. Let's instead cache the counter at the class instance
      itself -- since the class contains the information how the export
      path should be exported.
      
      However, we cannot use a plain integer field inside the class structure,
      because the class is copied between derived classes. For example,
      NMDeviceEthernet and NMDeviceBridge both get a copy of the NMDeviceClass
      instance. Hence, the class doesn't contain the counter directly, but
      a pointer to one counter that can be shared between sibling classes.
      57ab9fd6
  4. 12 Mar, 2018 1 commit
    • Thomas Haller's avatar
      core/dbus: rework D-Bus implementation to use lower layer GDBusConnection API · 297d4985
      Thomas Haller authored
      Previously, we used the generated GDBusInterfaceSkeleton types and glued
      them via the NMExportedObject base class to our NM types. We also used
      GDBusObjectManagerServer.
      
      Don't do that anymore. The resulting code was more complicated despite (or
      because?) using generated classes. It was hard to understand, complex, had
      ordering-issues, and had a runtime and memory overhead.
      
      This patch refactors this entirely and uses the lower layer API GDBusConnection
      directly. It replaces the generated code, GDBusInterfaceSkeleton, and
      GDBusObjectManagerServer. All this is now done by NMDbusObject and NMDBusManager
      and static descriptor instances of type GDBusInterfaceInfo.
      
      This adds a net plus of more then 1300 lines of hand written code. I claim
      that this implementation is easier to understand. Note that previously we
      also required extensive and complex glue code to bind our objects to the
      generated skeleton objects. Instead, now glue our objects directly to
      GDBusConnection. The result is more immediate and gets rid of layers of
      code in between.
      Now that the D-Bus glue us more under our control, we can address issus and
      bottlenecks better, instead of adding code to bend the generated skeletons
      to our needs.
      
      Note that the current implementation now only supports one D-Bus connection.
      That was effectively the case already, although there were places (and still are)
      where the code pretends it could also support connections from a private socket.
      We dropped private socket support mainly because it was unused, untested and
      buggy, but also because GDBusObjectManagerServer could not export the same
      objects on multiple connections. Now, it would be rather straight forward to
      fix that and re-introduce ObjectManager on each private connection. But this
      commit doesn't do that yet, and the new code intentionally supports only one
      D-Bus connection.
      Also, the D-Bus startup was simplified. There is no retry, either nm_dbus_manager_start()
      succeeds, or it detects the initrd case. In the initrd case, bus manager never tries to
      connect to D-Bus. Since the initrd scenario is not yet used/tested, this is good enough
      for the moment. It could be easily extended later, for example with polling whether the
      system bus appears (like was done previously). Also, restart of D-Bus daemon isn't
      supported either -- just like before.
      
      Note how NMDBusManager now implements the ObjectManager D-Bus interface
      directly.
      
      Also, this fixes race issues in the server, by no longer delaying
      PropertiesChanged signals. NMExportedObject would collect changed
      properties and send the signal out in idle_emit_properties_changed()
      on idle. This messes up the ordering of change events w.r.t. other
      signals and events on the bus. Note that not only NMExportedObject
      messed up the ordering. Also the generated code would hook into
      notify() and process change events in and idle handle, exhibiting the
      same ordering issue too.
      No longer do that. PropertiesChanged signals will be sent right away
      by hooking into dispatch_properties_changed(). This means, changing
      a property in quick succession will no longer be combined and is
      guaranteed to emit signals for each individual state. Quite possibly
      we emit now more PropertiesChanged signals then before.
      However, we are now able to group a set of changes by using standard
      g_object_freeze_notify()/g_object_thaw_notify(). We probably should
      make more use of that.
      
      Also, now that our signals are all handled in the right order, we
      might find places where we still emit them in the wrong order. But that
      is then due to the order in which our GObjects emit signals, not due
      to an ill behavior of the D-Bus glue. Possibly we need to identify
      such ordering issues and fix them.
      
      Numbers (for contrib/rpm --without debug on x86_64):
      
      - the patch changes the code size of NetworkManager by
        - 2809360 bytes
        + 2537528 bytes (-9.7%)
      
      - Runtime measurements are harder because there is a large variance
        during testing. In other words, the numbers are not reproducible.
        Currently, the implementation performs no caching of GVariants at all,
        but it would be rather simple to add it, if that turns out to be
        useful.
        Anyway, without strong claim, it seems that the new form tends to
        perform slightly better. That would be no surprise.
      
        $ time (for i in {1..1000}; do nmcli >/dev/null || break; echo -n .;  done)
        - real    1m39.355s
        + real    1m37.432s
      
        $ time (for i in {1..2000}; do busctl call org.freedesktop.NetworkManager /org/freedesktop org.freedesktop.DBus.ObjectManager GetManagedObjects > /dev/null || break; echo -n .; done)
        - real    0m26.843s
        + real    0m25.281s
      
      - Regarding RSS size, just looking at the processes in similar
        conditions, doesn't give a large difference. On my system they
        consume about 19MB RSS. It seems that the new version has a
        slightly smaller RSS size.
        - 19356 RSS
        + 18660 RSS
      297d4985
  5. 08 Mar, 2018 2 commits
  6. 28 Feb, 2018 1 commit
  7. 21 Feb, 2018 3 commits
    • Thomas Haller's avatar
      device: don't set invalid ip-iface · 78ca2a70
      Thomas Haller authored
      Now that every call to nm_device_set_ip_iface() and nm_device_set_ip_ifindex()
      is checked, and setting an interface that does not exist causes the device
      state to fail, we no longer need to allow setting an ip-iface if we are
      unable to retrieve the ip-ifindex.
      78ca2a70
    • Thomas Haller's avatar
      device: refactor nm_device_set_ip_ifindex() and set_ip_iface() · ab457830
      Thomas Haller authored
      - don't even bother to look into the platform cache, but use
        if_indextoname() / if_nametoindex(). In most cases, we obtained
        the ifindex/ifname not from the platform cache in the first
        place. Hence, there is a race, where the interface might not
        exist.
        However, try to process events of the platform cache, hoping
        that the cache contains an interface for the given ifindex/ifname.
      
      - let set_ip_ifindex() and set_ip_iface() both return a boolean
        value to indicate whether a ip-interface is set or not. That is,
        whether we have a positive ip_ifindex. That seems more interesting
        information, then to return whether anything changed.
      
      - as before, set_ip_ifindex() can only clear an ifindex/ifname,
        or error out without doing anything. That is different from
        set_ip_iface(), which will also set an ifname if no ifindex
        can be resolved. That is curreently ugly, because then ip-ifindex
        and ip-iface don't agree. That shall be improved in the future
        by:
        - trying to set an interface that cannot be resolved shall
          lead to a disconnect in any case.
        - we shall make less use of the ip-iface and rely more on the
          ifindex.
      ab457830
    • Thomas Haller's avatar
      iface-helper: fix non-reentrant call to platform for failed IPv6 DAD · ad21d542
      Thomas Haller authored
      Platform invokes change events while reading netlink events. However,
      platform code is not re-entrant and calling into platform again is not
      allowed (aside operations that do not process the netlink socket, like
      lookup of the platform cache).
      
      That basically means, we have to always process events in an idle
      handler. That is not a too strong limitation, because we anyway don't
      know the call context in which the platform event is emitted and we
      should avoid unguarded recursive calls into platform.
      
      Otherwise, we get hit an assertion/crash in nm-iface-helper:
      
           1  raise()
           2  abort()
           3  g_assertion_message()
           4  g_assertion_message_expr()
           5  do_delete_object()
           6  ip6_address_delete()
       >>> 7  nm_platform_ip6_address_delete()
           8  nm_platform_ip6_address_sync()
           9  nm_ip6_config_commit()
           10 ndisc_config_changed()
           11 ffi_call_unix64()
           12 ffi_call()
           13 g_cclosure_marshal_generic_va()
           14 _g_closure_invoke_va()
           15 g_signal_emit_valist()
           16 g_signal_emit()
       >>> 17 nm_ndisc_dad_failed()
           18 ffi_call_unix64()
           19 ffi_call()
           20 g_cclosure_marshal_generic()
           21 g_closure_invoke()
           22 signal_emit_unlocked_R()
           23 g_signal_emit_valist()
           24 g_signal_emit()
       >>> 25 nm_platform_cache_update_emit_signal()
           26 event_handler_recvmsgs()
           27 event_handler_read_netlink()
           28 delayed_action_handle_one()
           29 delayed_action_handle_all()
           30 do_delete_object()
           31 ip6_address_delete()
           32 nm_platform_ip6_address_delete()
           33 nm_platform_ip6_address_sync()
       >>> 34 nm_ip6_config_commit()
           35 ndisc_config_changed()
           36 ffi_call_unix64()
           37 ffi_call()
           38 g_cclosure_marshal_generic_va()
           39 _g_closure_invoke_va()
           40 g_signal_emit_valist()
           41 g_signal_emit()
           42 check_timestamps()
           43 receive_ra()
           44 ndp_call_eventfd_handler()
           45 ndp_callall_eventfd_handler()
           46 event_ready()
           47 g_main_context_dispatch()
           48 g_main_context_iterate.isra.22()
           49 g_main_loop_run()
       >>> 50 main()
      
      NMPlatform already has a check to assert against recursive calls
      in delayed_action_handle_all():
      
          g_return_val_if_fail (priv->delayed_action.is_handling == 0, FALSE);
      
          priv->delayed_action.is_handling++;
          ...
          priv->delayed_action.is_handling--;
      
      Fixes: f85728ec
      
      https://bugzilla.redhat.com/show_bug.cgi?id=1546656
      ad21d542
  8. 20 Feb, 2018 3 commits
  9. 15 Feb, 2018 4 commits
  10. 09 Feb, 2018 5 commits
    • Thomas Haller's avatar
      core: distinguish between IFA_F_SECONDARY and IFA_F_TEMPORARY · 3e9e51f1
      Thomas Haller authored
      While the numerical values of IFA_F_SECONDARY and IFA_F_TEMPORARY
      are identical, their meaning is not.
      
      IFA_F_SECONDARY is only relevant for IPv4 addresses, while
      IFA_F_TEMPORARY is only relevant for IPv6 addresses.
      
      IFA_F_TEMPORARY is automatically set by kernel for the addresses
      that it generates as part of IFA_F_MANAGETEMPADDR. It cannot be
      actively set by user-space.
      
      IFA_F_SECONDARY is automatically set by kernel depending on the order
      in which the addresses for the same subnet are added.
      
      This essentially reverts 8b4f1192 (core: avoid IFA_F_TEMPORARY alias for
      IFA_F_SECONDARY).
      3e9e51f1
    • Thomas Haller's avatar
      device: fix IPv6 DAD to re-check whether address really failed DAD · 6d8a6365
      Thomas Haller authored
      In device_ipx_changed() we remember the addresses for which it appears
      that DAD failed. Later, on an idle handler, we process them during
      queued_ip6_config_change().
      
      Note that nm_plaform_ip6_address_sync() might very well decide to remove
      some or all addresses and re-add them immidiately later. It might do so,
      to get the address priority/ordering right. At that point, we already
      emit platform signals that the device disappeared, and track them in
      dad6_failed_addrs.
      
      Hence, later during queued_ip6_config_change() we must check again
      whether the address is really not there and not still doing DAD.
      Otherwise, we wrongly claim that DAD failed and remove the address,
      generate a new one, and the same issue might happen again.
      6d8a6365
    • Thomas Haller's avatar
      device: don't check addr-source for addresses that failed IPv6 DAD · fc7448b3
      Thomas Haller authored
      dad6_failed_addrs is populated with addresses from the platform cache.
      Inside the cache, all addresses have addr_source NM_IP_CONFIG_SOURCE_KERNEL,
      because addr_source property for addresses is only a property that is
      used NetworkManager internally.
      fc7448b3
    • Thomas Haller's avatar
      device: ignore temporary addresses for IPv6 DAD · 7ddd83e8
      Thomas Haller authored
      Temporary addresses are entirely managed by kernel, via the mngtempaddr flag of the
      primay address. No need to consider them for DAD.
      7ddd83e8
    • Thomas Haller's avatar
      device: don't clone NMPlatformIP6Address for dad6_failed_addrs · 95c94ff0
      Thomas Haller authored
      NMPObjects are never modified after being put into the cache.
      Hence, it is safe and encouraged to just keep a reference to them,
      instead of cloning them.
      
      Interestingly, NMPlatform's change signals have a platform_object
      pointer, which is not the pointer to the NMPObjects itself, but
      down-cast to the NMPlatformObject instance. It does so, because commonly
      callers want to have a pointer to the NMPlatformObject instance, instead
      of the outer NMPObjects. However, NMP_OBJECT_UP_CAST() is guaranteed
      to work one would expect.
      95c94ff0