1. 09 Jun, 2022 8 commits
    • Lubomir Rintel's avatar
      device: release slaves when an external device is going managed · 59a244bd
      Lubomir Rintel authored
      When we're deactivating an externally created device that has a master
      because we're activating a connection on it, actually remove the device
      from the master. Otherwise unpleasant things happen:
      
        active-connection[0x55ed7ba78400]: constructed (NMActRequest, version-id 4, type managed)
        device[0a458361f9fed8f5] (dummy0): sys-iface-state: external -> managed
        device[0a458361f9fed8f5] (dummy0): queue activation request waiting for currently active connection to disconnect
        device (dummy0): disconnecting for new activation request.
        device (dummy0): state change: activated -> deactivating (reason 'new-activation', sys-iface-state: 'managed')
        device (br0): master: release one slave 0a458361f9fed8f5/dummy0 (enslaved)(no-config)
      
      Note the "no-config" above. We'set priv->master = NULL, but didn't
      communicate the change to the platform. I believe this is not good.
      This patch changes that.
      
        device (br0): bridge port dummy0 was detached
        device (dummy0): released from master device br0
        active-connection[0x55ed7ba782e0]: set state deactivating (was activated)
        device (dummy0): ip4: set state none (was done, reason: ip-state-clear)
        device (dummy0): ip6: set state none (was done, reason: ip-state-clear)
        device (dummy0): state change: deactivating -> disconnected (reason 'new-activation', sys-iface-state: 'managed')
      
        platform: (dummy0) emit signal link-changed changed: 102: dummy0
            <NOARP,UP,LOWER_UP;broadcast,noarp,up,running,lowerup> mtu 1500 master 101 arp 1 dummy* init
             addrgenmode none addr EA:8D:DD:DF:1F:B7 brd FF:FF:FF:FF:FF:FF driver dummy rx:0,0 tx:39,4746
      
      Now the platform sent us a new link, the "master" property is still set.
      
        device[0a458361f9fed8f5] (dummy0): queued link change for ifindex 102
        device[0a458361f9fed8f5] (dummy0): deactivating device (reason 'new-activation') [60]
        device (dummy0): ip: set (combined) state none (was done, reason: ip-state-clear)
        config: device-state: write #102 (/run/NetworkManager/devices/102); managed=managed, perm-hw-addr-fake=EA:8D:DD:DF:1F:B7, route-metric-default=0-0
        active-connection[0x55ed7ba782e0]: set state deactivated (was deactivating)
        active-connection[0x55ed7ba782e0]: check-master-ready: already signalled (state deactivated, master 0x55ed7ba781c0 is in state activated)
        device (dummy0): Activation: starting connection 'dummy1' (ec6fca51-84e6-4a5b-a297-f602252c9f69)
        device[0a458361f9fed8f5] (dummy0): activation-stage: schedule activate_stage1_device_prepare
      
        l3cfg[ae290b5c1f585d6c,ifindex=102]: emit signal (platform-change-on-idle, obj-type-flags=0x2a)
        device (br0): master: add one slave 0a458361f9fed8f5/dummy0
      
      Amidst the new activation we're processing the netlink message we got.
      We set priv->master back, effectively nullifying the release above. Sad.
      
        device (dummy0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
        device[0a458361f9fed8f5] (dummy0): add_pending_action (2): 'in-state-change'
        active-connection[0x55ed7ba78400]: set state activating (was unknown)
        manager: NetworkManager state is now CONNECTING
        active-connection[0x55ed7ba78400]: check-master-ready: not signalling (state activating, no master)
        device[8fff58d61c7686ce] (br0): slave dummy0 state change 30 (disconnected) -> 40 (prepare)
        device[0a458361f9fed8f5] (dummy0): remove_pending_action (1): 'in-state-change'
        device (br0): master: release one slave 0a458361f9fed8f5/dummy0 (not enslaved) (force-configure)
        platform: (dummy0) link: releasing 102 from master 'br0' (101)
        device (br0): detached bridge port dummy0
      
      Now things go south. The stage1 cleans the device up, removing it from
      the master and the device itself decides it should deactivate itself
      because it lots its master regardless of the fact that it should not
      have one and it's in fact an unwanted carryover from previous activation.
      I believe this is also wrong.
      
        device[0a458361f9fed8f5] (dummy0): Activation: connection 'dummy1' master deactivated
        device (dummy0): ip4: set state none (was pending, reason: ip-state-clear)
        device (dummy0): ip6: set state none (was pending, reason: ip-state-clear)
        device[0a458361f9fed8f5] (dummy0): add_pending_action (2): 'queued-state-change-deactivating'
        device[0a458361f9fed8f5] (dummy0): queue-state[deactivating, reason:connection-assumed, id:298]: queue state change
        device[0a458361f9fed8f5] (dummy0): activation-stage: synchronously invoke activate_stage2_device_config
        device (dummy0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
      
      Now things are really weird. We synchronously go to config, effectively
      overriding the queued deactivation. We've really messed up.
      59a244bd
    • Lubomir Rintel's avatar
      device: only deactivate when the master we've enslaved to goes away · 9dcfb3d0
      Lubomir Rintel authored
      Sometimes weird things happen.
      
      Let dummy0 be an externally created device that has a master. We decide
      to activate a connection that has no master on it:
      
        active-connection[0x55ed7ba78400]: constructed (NMActRequest, version-id 4, type managed)
        device[0a458361f9fed8f5] (dummy0): sys-iface-state: external -> managed
        device[0a458361f9fed8f5] (dummy0): queue activation request waiting for currently active connection to disconnect
        device (dummy0): disconnecting for new activation request.
        device (dummy0): state change: activated -> deactivating (reason 'new-activation', sys-iface-state: 'managed')
        device (br0): master: release one slave 0a458361f9fed8f5/dummy0 (enslaved)(no-config)
      
      Note the "no-config" above. We'set priv->master = NULL, but didn't
      communicate the change to the platform. I believe this is not good.
      
        device (br0): bridge port dummy0 was detached
        device (dummy0): released from master device br0
        active-connection[0x55ed7ba782e0]: set state deactivating (was activated)
        device (dummy0): ip4: set state none (was done, reason: ip-state-clear)
        device (dummy0): ip6: set state none (was done, reason: ip-state-clear)
        device (dummy0): state change: deactivating -> disconnected (reason 'new-activation', sys-iface-state: 'managed')
      
        platform: (dummy0) emit signal link-changed changed: 102: dummy0
            <NOARP,UP,LOWER_UP;broadcast,noarp,up,running,lowerup> mtu 1500 master 101 arp 1 dummy* init
             addrgenmode none addr EA:8D:DD:DF:1F:B7 brd FF:FF:FF:FF:FF:FF driver dummy rx:0,0 tx:39,4746
      
      Now the platform sent us a new link, the "master" property is still set.
      
        device[0a458361f9fed8f5] (dummy0): queued link change for ifindex 102
        device[0a458361f9fed8f5] (dummy0): deactivating device (reason 'new-activation') [60]
        device (dummy0): ip: set (combined) state none (was done, reason: ip-state-clear)
        config: device-state: write #102 (/run/NetworkManager/devices/102); managed=managed, perm-hw-addr-fake=EA:8D:DD:DF:1F:B7, route-metric-default=0-0
        active-connection[0x55ed7ba782e0]: set state deactivated (was deactivating)
        active-connection[0x55ed7ba782e0]: check-master-ready: already signalled (state deactivated, master 0x55ed7ba781c0 is in state activated)
        device (dummy0): Activation: starting connection 'dummy1' (ec6fca51-84e6-4a5b-a297-f602252c9f69)
        device[0a458361f9fed8f5] (dummy0): activation-stage: schedule activate_stage1_device_prepare
      
        l3cfg[ae290b5c1f585d6c,ifindex=102]: emit signal (platform-change-on-idle, obj-type-flags=0x2a)
        device (br0): master: add one slave 0a458361f9fed8f5/dummy0
      
      Amidst the new activation we're processing the netlink message we got.
      We set priv->master back, effectively nullifying the release above.
      
        device (dummy0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
        device[0a458361f9fed8f5] (dummy0): add_pending_action (2): 'in-state-change'
        active-connection[0x55ed7ba78400]: set state activating (was unknown)
        manager: NetworkManager state is now CONNECTING
        active-connection[0x55ed7ba78400]: check-master-ready: not signalling (state activating, no master)
        device[8fff58d61c7686ce] (br0): slave dummy0 state change 30 (disconnected) -> 40 (prepare)
        device[0a458361f9fed8f5] (dummy0): remove_pending_action (1): 'in-state-change'
        device (br0): master: release one slave 0a458361f9fed8f5/dummy0 (not enslaved) (force-configure)
        platform: (dummy0) link: releasing 102 from master 'br0' (101)
        device (br0): detached bridge port dummy0
      
      Now stage1 cleans the device up, removing it from the master.
      
        device[0a458361f9fed8f5] (dummy0): Activation: connection 'dummy1' master deactivated
        device (dummy0): ip4: set state none (was pending, reason: ip-state-clear)
        device (dummy0): ip6: set state none (was pending, reason: ip-state-clear)
        device[0a458361f9fed8f5] (dummy0): add_pending_action (2): 'queued-state-change-deactivating'
      
      We decide to deal with this by enqueuing a deactivation. That is not
      great -- we shouldn't even have had this master!
      
      This patch takes the deactivation path only if we were willingly
      enslaved to the master in question.
      9dcfb3d0
    • Lubomir Rintel's avatar
      device: stop checking the IP configuration state when cancelling activation · a446edf7
      Lubomir Rintel authored
      The @bond_mode_8023ad test has been seen failing, with a log like this:
      
        <debug> [...3.0484] device[...] (eth1): Activation: connection 'bond0.0' master deactivated
        <debug> [...3.0484] device[...] (eth1): add_pending_action (2): 'queued-state-change-deactivating'
        <debug> [...3.0484] device[...] (eth1): queue-state[deactivating, reason:new-activation, id:709]: queue state change
      
      What happened is that eth1 has been activating. It was already enslaved
      to a bond and was in an ip-config state when the bond was removed.
      A change to "deactivating" state has been enqueued. But then this
      happened:
      
        <trace> [...3.0942] device[...] (eth1): ip4: check-state: state done => done, is_failed=0, is_pending=0,
                            is_started=0 temp_na=0, may-fail-4=1, may-fail-6=1; disabled4; manualip4=done; ignore6 manualip6=done
        <trace> [...3.0942] device[...] (eth1): ip: check-state: (combined) state pending => done
        <debug> [...3.0943] device[...] (eth1): ip: set (combined) state done (was pending, reason: check-ip-state)
        <info>  [...3.0943] device (eth1): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
        <debug> [...3.0943] device[...] (eth1): add_pending_action (3): 'in-state-change'
        <debug> [...3.0943] device[...] (eth1): queue-state[deactivating, reason:new-activation, id:709]: clear queued state change
      
      The IP config succeeded and the queued "deactivating" change was
      overriden by the IP4 check result, prompting a change to "ip-check".
      With the master still missing. Not good.
      
      Let's terminate the appempts to check the IP state when we cancel the
      activation, so that it doesn't override the enqueued state change.
      
      Fixes-test: @bond_mode_8023ad
      
      https://bugzilla.redhat.com/show_bug.cgi?id=2080928
      !1245
      a446edf7
    • Thomas Haller's avatar
      dispatcher: log duration of dispatcher call · 431d139d
      Thomas Haller authored
      Yes, we anyway log the timestamps for every log message. So one could
      always calculate the offset. However, when you read a logfile, it can be
      cumbersome to stop looking at where you currently are to find the
      start/end of a call. For convenience, log the duration explicitly.
      
      !1251
      431d139d
    • Thomas Haller's avatar
    • Thomas Haller's avatar
      std-aux: cleanup NM_CMP_*() macros · 921af527
      Thomas Haller authored
      - add code comments explaining some things.
      
      - for NM_CMP_FIELD*() variants have a corresponding NM_CMP_DIRECT*()
        macro and use it (aside the "memcmp" variants, which don't translate
        directly).
      921af527
    • Beniamino Galvani's avatar
      device: fix memory leak · f69a1cc8
      Beniamino Galvani authored
      l3cd instances must be removed from the old l3cfg before calling
      _cleanup_ip_pre(). Otherwise, _cleanup_ip_pre() unregisters them from
      the device, and later _dev_l3_register_l3cds(self, l3cfg_old, FALSE,
      FALSE) does nothing because the device doesn't have any l3cd.
      
      Previously the l3cds would linger in the l3cfg, keeping a reference to
      it and causing a memory leak; the leak was not detected by valgrind
      because the l3cfg was still referenced by the NMNetns.
      
      Fixes: 58287cbc ('core: rework IP configuration in NetworkManager using layer 3 configuration')
      Fixes-test: @stable_mem_consumption2
      
      https://bugzilla.redhat.com/show_bug.cgi?id=2083453
      
      !1252
      f69a1cc8
    • Thomas Haller's avatar
      l3cfg: fix comparing "has-dns-priority" flag in nm_l3_config_data_cmp_full() · 8e86cfb8
      Thomas Haller authored
      Fixes: cb292445 ('core: support compare flags in nm_l3_config_data_cmp_full()')
      8e86cfb8
  2. 08 Jun, 2022 1 commit
  3. 07 Jun, 2022 3 commits
  4. 03 Jun, 2022 2 commits
  5. 02 Jun, 2022 3 commits
  6. 01 Jun, 2022 3 commits
  7. 31 May, 2022 20 commits