qmi-proxy lockup, transaction timeouts
Hi, I have observed some lockups with qmi-proxy and I think I narrowed down the issue.
Replicatable setup:
- Start an instance of qmi-proxy.
- Run multiple concurrent qmicli commands with -p for example: "qmicli -p -d /dev/cdc-wdm0 --device-open-version-info --dms-noop & qmicli -p -d /dev/cdc-wdm0 --device-open-version-info --dms-noop & qmicli -p -d /dev/cdc-wdm0 --device-open-version-info --dms-noop" The following or similar error should appear: "error: couldn't create client for the 'dms' service: CID allocation failed in the CTL client: Transaction timed out" If the error does not appear kill the running qmi-proxy and repeat the steps.
Probable cause: There is a race condition that allows the same qmi-devices to be open multiple times if proxy requests are made in quick succession. This race condition is known and handled: https://gitlab.freedesktop.org/mobile-broadband/libqmi/-/blob/main/src/libqmi-glib/qmi-proxy.c#L371
Sadly this handling does not seem to be enough, I have observed qmi responses to arrive on the transiently opened device. This causes breakdown of the transaction handling (all requests get a timeout) and eventual hangs of qmi-proxy where it does not respond to exit signals anymore (I am not sure how, since I am not familiar with the libqmi code).
I have made an strace of qmi-proxy and can attach a full version for further debugging (does gitlab have gists?), but the behaviour described above is visible: Abbreviated trace (...) first number is line number second PID:
1297 5436 openat(AT_FDCWD, "/dev/cdc-wdm0", O_RDWR|O_EXCL|O_NOCTTY|O_NONBLOCK|O_LARGEFILE) = 8
1433 5436 openat(AT_FDCWD, "/dev/cdc-wdm0", O_RDWR|O_EXCL|O_NOCTTY|O_NONBLOCK|O_LARGEFILE) = 10
1577 5436 openat(AT_FDCWD, "/dev/cdc-wdm0", O_RDWR|O_EXCL|O_NOCTTY|O_NONBLOCK|O_LARGEFILE) = 12
1676 5436 write(8, "\1\v\0\0\0\0\0\1!\0\0\0", 12) = 12
1697 5436 write(8, "\1\v\0\0\0\0\0\2!\0\0\0", 12) = 12
1760 5436 read(8, "\1\204\0\200\0\0\1\1!\0y" ... , 2048) = 133
1784 5436 read(12, "\1\204\0\200\0\0\1\2!\0y" ... , 2048) = 133
1830 5436 write(8, "\1\v\0\0\0\0\0\3!\0\0\0", 12) = 12
1850 5436 write(8, "\1\v\0\0\0\0\0\4!\0\0\0", 12) = 12
1880 5436 read(8, "\1\204\0\200\0\0\1\3!\0y" ... , 2048) = 133
1890 5436 write(1, ... "[30 Apr 2024, 12:05:52] [Debug] [/dev/cdc-wdm0] No transaction matched in received message\n, 4096) = 4096
In my opinion this is a fairly severe bug since it defeats the entire existence of qmi-proxy, it cannot be reliably used to concurrently use a qmi device. A proper fix would probably prevent the same device being open multiple times in the first place.