Skip to content

device: wait for carrier even if it wasn't us who brought the device IFF_UP

Lubomir Rintel requested to merge lr/up-carrier-wait into main

The devices generally need to be IFF_UP and wait a little before the carrier detection is reliable. Some devices, actually need to wait more than a little -- r8169 needs up to 5 seconds.

For this reason, we delay startup complete while the carrier is down after we bring the device up. We do this so that we don't reject activations due to carrier down until we're sure it's really down. This works well as long as it's us who brought the device up.

If we're restarting the daemon, the device is going to be already up when we start up the daemon for the second time. There's, however, a slim chance that the device was brought down and up very shortly before the restart and therefore the carrier reporting is still not reliable. As a matter of fact, we bring the devices down and back up on some occassions, such as when enslaving to a team device.

Therefore, the following events in quick succession cause trouble:

  # nmcli con up team-slave-eth0
  [20099.205355] Generic FE-GE Realtek PHY r8169-0-300:00: attached PHY driver (mii_bus:phy_addr=r8169-0-300:00, irq=MAC)
  [20099.365641] nm-team: Port device eth0 added
  [20099.370728] r8169 0000:03:00.0 eth0: Link is Down
  [20099.436631] nm-team: Port device eth0 removed
  [20099.463422] Generic FE-GE Realtek PHY r8169-0-300:00: attached PHY driver (mii_bus:phy_addr=r8169-0-300:00, irq=MAC)
  [20099.628505] r8169 0000:03:00.0 eth0: Link is Down
  [20099.669425] Generic FE-GE Realtek PHY r8169-0-300:00: attached PHY driver (mii_bus:phy_addr=r8169-0-300:00, irq=MAC)
  [20099.833457] r8169 0000:03:00.0 eth0: Link is Down
  [20099.838471] nm-team: Port device eth0 added

The device has been brought down, enslaved and brought up. "Link is Down" indicates carrier not being detected.

  Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/7)
  # systemctl restart NetworkManager

Now NM sees the device being up, but carrier down.

  # nmcli con up testeth0
  Error: Connection activation failed: No suitable device found for this connection (...).

Activation failed, because eth0 carrier still appears down.

  # [20102.943464] r8169 0000:03:00.0 eth0: Link is Up - 1Gbps/Full - flow control rx/tx

Now it's up, but the party is already over. Shiet.

Let's wait whenever the device reaches unavailable state, whether we bring it up at that point or not.

Fixes-test: @restart_L2_only_lacp

https://bugzilla.redhat.com/show_bug.cgi?id=2092361

Edited by Lubomir Rintel

Merge request reports