libdbus calls malloc() after fork(), causing clients to deadlock
Submitted by Primiano Tucci
Assigned to D-Bus Maintainers
Link to original bug (#100843)
Description
TL;DR: dbus-autolaunch causes random hangs if the hosting process uses an allocator that has no protection against fork() in a multithreaded process. This is because a malloc() happens after fork() in _dbus_close_all(), via opendir().
Related chromium bugs: https://crbug.com/695643 https://crbug.com/715658
We had a number of bug reports in chromium where chrome just hangs during startup on Linux. I narrowed down the cause to libdbus. The scenario is the following:
- Somebody starts chrome in a session that doesn't have dbus running. This is very common in test harnesses that run via xvfb. Doing that triggers dbus-autolaunch.
- The following happens in the caller process (chrome, in our case):
    _dbus_transport_open()
    _dbus_transport_open_autolaunch()
    _dbus_transport_new_for_autolaunch()
    _dbus_get_autolaunch_address()
    _read_subprocess_line_argv()
    fork()
    wait() (on the pipe)
- While in the child process:
    fork()
    _dbus_close_all()
    opendir("/proc/self/fd")
    malloc()
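Calling malloc() in the fork()ed child of a multithreaded process is bad: if another thread held the allocator's lock at the time of the fork(), that lock is never released in the child, so unless the allocator has an atfork handler the malloc() will hang at random. More details in crbug.com/695643.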
All this seems to come from an optimization in _dbus_close_all() for Linux [1]. You folks seem to already have a fallback there that does:
    maxfds = sysconf (_SC_OPEN_MAX);
    for (i = 3; i < maxfds; i++)
      close (i);
which is good and harmless.
The problem instead is when you opendir("/proc/self/fd"). Sadly, opendir() implies a malloc(). Doing a fork()+malloc() is just asking for trouble. Given that libdbus is a library and not a hermetic executable, IMHO it isn't sensible to expect that the hosting process has an allocator with a post-fork handler which handles this corner case.
Can you just always use the fallback and get rid of that /proc/self/fd optimization? Or achieve the same thing in some other way that doesn't involve a malloc() after fork()?
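One possible shape for "some other way" is sketched below, as an assumption rather than a proposal for the actual patch: read /proc/self/fd with the raw getdents64 system call into a stack buffer instead of going through opendir(). It is Linux-only. The function names close_fds_via_proc() and parse_fd() are hypothetical, and the struct layout is copied from the getdents64(2) man page. If /proc is unavailable it returns -1 so that a caller could fall back to the plain close() loop.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout returned by getdents64(2), as documented in the man page. */
struct linux_dirent64 {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;
    unsigned char  d_type;
    char           d_name[];
};

/* Parse a decimal directory entry name; returns -1 for ".", ".." and
 * anything else that is not a plain number. No libc parsing helpers used. */
static int parse_fd(const char *s)
{
    int n = 0;

    if (*s == '\0')
        return -1;
    for (; *s != '\0'; s++) {
        if (*s < '0' || *s > '9')
            return -1;
        n = n * 10 + (*s - '0');
    }
    return n;
}

/* Hypothetical helper: close every open descriptor >= lowest by listing
 * /proc/self/fd without opendir(), and therefore without malloc().
 * Returns 0 on success, -1 if the directory could not be read. */
int close_fds_via_proc(int lowest)
{
    uint64_t raw[512];                 /* 4 KiB on the stack, 8-byte aligned */
    char *buf = (char *)raw;
    int dirfd = open("/proc/self/fd", O_RDONLY | O_DIRECTORY | O_CLOEXEC);

    if (dirfd < 0)
        return -1;

    for (;;) {
        long n = syscall(SYS_getdents64, dirfd, buf, sizeof raw);

        if (n < 0) {
            close(dirfd);
            return -1;
        }
        if (n == 0)
            break;                     /* end of directory */

        for (long off = 0; off < n; ) {
            struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + off);
            int fd = parse_fd(d->d_name);

            /* Skip the descriptor we are using to read the directory. */
            if (fd >= lowest && fd != dirfd)
                close(fd);
            off += d->d_reclen;
        }
    }

    close(dirfd);
    return 0;
}
```

In a fork()ed child this would be called as close_fds_via_proc(3), with the sysconf(_SC_OPEN_MAX) loop as the fallback when it returns -1.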
For the chrome-specific bug we'll be looking into disabling dbus autolaunch and reducing our dependencies on it, but in general I feel this could affect and surprise quite a lot of other dbus users (fun fact: the root bug in chrome was caused by somebody calling gconf_client_get_bool).
Regards, Primiano
Version: git master