systemd and dbus sometimes disagree about whether the system bus connection has authenticated
This is related to, possibly the same as, https://github.com/systemd/systemd/issues/15316 and the LP bug is https://bugs.launchpad.net/ubuntu/+source/dbus/+bug/1871538. I'm not confident we aren't observing two issues though. Here I'll talk about "Connection has not authenticated soon enough, closing it" which we're most recently observing with Azure instances running Ubuntu 21.04.
In these logs I'm using systemd 248.3-1ubuntu1 and dbus-daemon 1.12.20-2ubuntu1 compiled with verbose messages and running with DBUS_VERBOSE=1
, but we managed to reproduce without. Actually adding DBUS_VERBOSE=1
seems to make the issue a bit easier to hit, so maybe it's a race which is easier to lose under load.
What happens is that "sometimes" on boot all Type=dbus services have failed to start (resp. the other case: when dist-upgrading, all running Type=dbus service units lose their bus connection and get torn down).
In the logs when this has happened I can observe the following:
laney@dev> egrep "(bus-api-system|soon enough|Connection terminated)" journal-sorted.log
Jun 17 09:23:59.399376 alan-hirsute-base-jzwgzzozgdyqauuqajbr systemd[1]: Bus bus-api-system: changing state UNSET → OPENING
Jun 17 09:23:59.399391 alan-hirsute-base-jzwgzzozgdyqauuqajbr systemd[1]: sd-bus: starting bus bus-api-system by connecting to /run/dbus/system_bus_socket...
Jun 17 09:23:59.399429 alan-hirsute-base-jzwgzzozgdyqauuqajbr systemd[1]: Bus bus-api-system: changing state OPENING → AUTHENTICATING
Jun 17 09:23:59.808194 alan-hirsute-base-jzwgzzozgdyqauuqajbr systemd[1]: Bus bus-api-system: changing state AUTHENTICATING → HELLO
Jun 17 09:24:09.255136 alan-hirsute-base-jzwgzzozgdyqauuqajbr systemd[1]: Bus bus-api-system: changing state HELLO → RUNNING
Jun 17 09:25:06.580618 alan-hirsute-base-jzwgzzozgdyqauuqajbr dbus-daemon[610]: [system] Connection has not authenticated soon enough, closing it (auth_timeout=30000ms, elapsed: 53691ms)
Jun 17 09:25:09.221680 alan-hirsute-base-jzwgzzozgdyqauuqajbr systemd[1]: Bus bus-api-system: changing state RUNNING → CLOSING
Jun 17 09:25:09.221741 alan-hirsute-base-jzwgzzozgdyqauuqajbr systemd[1]: polkit.service: Unexpected error response from GetNameOwner(): Connection terminated
Jun 17 09:25:09.221799 alan-hirsute-base-jzwgzzozgdyqauuqajbr systemd[1]: Bus bus-api-system: changing state CLOSING → CLOSED
# here it starts working AFAICS
Jun 17 09:26:14.221908 alan-hirsute-base-jzwgzzozgdyqauuqajbr systemd[1]: Bus bus-api-system: changing state UNSET → OPENING
Jun 17 09:26:14.221921 alan-hirsute-base-jzwgzzozgdyqauuqajbr systemd[1]: sd-bus: starting bus bus-api-system by connecting to /run/dbus/system_bus_socket...
Jun 17 09:26:14.221978 alan-hirsute-base-jzwgzzozgdyqauuqajbr systemd[1]: Bus bus-api-system: changing state OPENING → AUTHENTICATING
Jun 17 09:26:14.229261 alan-hirsute-base-jzwgzzozgdyqauuqajbr systemd[1]: Bus bus-api-system: changing state AUTHENTICATING → HELLO
Jun 17 09:26:14.245763 alan-hirsute-base-jzwgzzozgdyqauuqajbr systemd[1]: Bus bus-api-system: changing state HELLO → RUNNING
I've attached the full log here. I can't see from the super verbose dbus-daemon logging whether systemd is wrong when it thinks it authenticated or not. Maybe a maintainer could peruse and advise if this seems to be systemd or dbus having a problem?
(I've had to re-sort the attached log by the way; while dbus-daemon and systemd's messages were correctly ordered with respect to themselves, they were interleaved wrong in the journal output, although it seems to have correct timestamps, so not sure what that's about really.)