libnm segfaults randomly on simple nmcli commands using NetworkManager 1.22 or newer
Hi all, I'm currently investigating an issue that I'm seeing using a custom Linux distro based on Yocto 3.2 running on embedded systems based on NXP's i.MX 8 family (more specifically, i.MX8X, i.MX8M Nano and i.MX8M Mini). Yocto 3.2 uses the following recipes for NetworkManager and glib (both are patched versions of official releases):
- NetworkManager 1.22.14: https://github.com/openembedded/meta-openembedded/blob/gatesgarth/meta-networking/recipes-connectivity/networkmanager/networkmanager_1.22.14.bb
- glib 2.64.5: https://git.yoctoproject.org/cgit/cgit.cgi/poky/tree/meta/recipes-core/glib-2.0/glib-2.0_2.64.5.bb?h=gatesgarth
The issue is that, 99% of the time, nmcli commands issued on the device work just fine, but sometimes the command will print its expected output and generate a segfault. For example:
root@ccimx8mn-dvk:~# nmcli dev wifi list
NAME UUID TYPE DEVICE
Cisco_2600_Open_Open 22675d12-a166-4e30-93b8-05e136a99f95 wifi wlan0
eth0 dce47dc1-937e-360e-bcb2-a5745f1e170b ethernet eth0
Cisco_2600_EAP_TLS 845b54d4-2974-4dab-9162-9be2e1f58fc6 wifi --
Cisco_2600_WEP64_HEX1 790a600d-ab1c-4f60-ad78-0c72199ee996 wifi --
Cisco_2600_WPA2_AES 7df6ba97-3fb9-425d-b5d0-1e7af8b38517 wifi --
cellular c7b0df65-2c06-3b5d-82cd-5c09be5fecf6 gsm --
eth1 7296aec4-0cf2-3172-a2a8-79bb745d5a9a ethernet --
wlan0 3153f68e-dd89-3443-ab0a-88cae2fc88fc wifi --
Segmentation fault
root@ccimx8mn-dvk:~# echo $?
139
The bug seems "harmless", as nmcli appears to do what it's expected to do, but because the segfault makes the process exit with a non-zero status (139), it causes spurious failures in automated tests that involve network interfaces.
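For context, 139 is how common shells (bash, dash, busybox ash) report a process killed by signal 11, which is SIGSEGV on Linux: 128 plus the signal number. Below is a minimal sketch of how a test script could tell a crash apart from a regular nmcli error. `classify_status` is a hypothetical helper name of my own, not part of nmcli, and the sketch assumes nmcli's own error statuses stay below 128.

```shell
# classify_status is a hypothetical helper, not part of nmcli.
# Shells report "killed by signal N" as exit status 128 + N,
# so 139 corresponds to signal 11 (SIGSEGV on Linux).
classify_status() {
    rc=$1
    if [ "$rc" -eq 0 ]; then
        echo "ok"
    elif [ "$rc" -gt 128 ]; then
        echo "signal $((rc - 128))"
    else
        echo "error $rc"
    fi
}

# Usage on the device (not executed here):
#   nmcli dev wifi list; classify_status $?
```

This only labels the failure so the test logs are easier to read; it does not make the crash go away.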
Some notes about the issue:
- The nmcli command in question doesn't seem to matter, as I've seen the segfault happen when running "nmcli dev list", "nmcli con list" and just "nmcli".
- When running any of these nmcli commands continuously every second, the segfault can happen in less than a minute, after a couple of minutes, after 10, after 30... There is no apparent pattern to when the segfault occurs. The bug still happens if you wait a long time between nmcli calls, or if you run other unrelated commands between them.
- After a lot of tests on my end, I noticed that the issue is only present in NetworkManager versions 1.22 and newer. Specifically, the "guilty" commit seems to be ce0e898f, which sits between 1.21 and 1.22-rc1 and refactors DBus caching in libnm. When building an intermediate revision of NetworkManager between 1.21 and 1.22-rc1, the issue goes away entirely once this specific commit is reverted.
- NetworkManager versions that don't have this issue (< 1.22) sometimes print warnings like the ones below, about as often and as unpredictably as the segfault occurs. This makes me believe there is an underlying issue that causes warnings in these versions and segfaults in newer ones. Note that the warning messages seem to be related to the commit mentioned in the previous bullet point. These warnings are also mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1461572:
(process:7112): GLib-GIO-WARNING **: 17:34:10.739: ../glib-2.64.5/gio/gdbusobjectmanagerclient.c:1646: Processing InterfaceRemoved signal for path /org/freedesktop/NetworkManager/DHCP4Config/5 but no object proxy exists
(process:7112): GLib-GIO-WARNING **: 17:34:10.747: ../glib-2.64.5/gio/gdbusobjectmanagerclient.c:1646: Processing InterfaceRemoved signal for path /org/freedesktop/NetworkManager/IP6Config/9 but no object proxy exists
- Strangely enough, the issue isn't reproducible when using the same Yocto 3.2 distro on 32-bit platforms, only on 64-bit ones. It's worth noting that the distro packages aren't 100% the same in either case, but the only noticeable difference for now is the architecture.
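The reproduction described in the notes above can be sketched as a small loop that reruns a command until it dies from a signal. This is only a sketch: `repro_loop` is a hypothetical helper, and its argument order (maximum iterations, delay in seconds, then the command to run) is my own choice rather than anything provided by NetworkManager.

```shell
# repro_loop MAX DELAY CMD [ARGS...]
# Hypothetical helper: runs CMD up to MAX times, sleeping DELAY seconds
# between runs, and stops at the first run that is killed by a signal
# (exit status above 128 in bash, dash and busybox ash).
repro_loop() {
    max=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$max" ]; do
        "$@" >/dev/null 2>&1      # command output is discarded
        rc=$?
        if [ "$rc" -gt 128 ]; then
            echo "iteration $i: exit status $rc (signal $((rc - 128)))"
            return 1
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "no crash in $max iterations"
    return 0
}

# Usage on the device (not executed here):
#   repro_loop 3600 1 nmcli dev wifi list
```

The loop returns 1 as soon as a crash is seen, so it can be left running unattended and checked afterwards.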
Is there a reasonable explanation for this behavior? Could it be something else in my Yocto distro that is causing these random segfaults? Is the DBus caching commit responsible for this, or is it simply exposing something that was there to begin with?
Let me know if you need any more information on the NetworkManager configuration used in Yocto or on other related packages. Many thanks in advance, Gabriel