dbus-daemon crashes at shutdown when there are monitoring connections to clear
We're getting a core dump from dbus-daemon on shutdown whenever there are monitoring connections to clear:
Core was generated by `/usr/bin/dbus-daemon --fork --print-pid 6 --print-address 8 --session'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007fbaf3f009c7 in _dbus_list_unlink (list=0x55ee351e7800, link=link@entry=0x55ee351f62a0) at ../../dbus/dbus-list.c:510
510 link->next->prev = link->prev;
(gdb) bt
#0 0x00007fbaf3f009c7 in _dbus_list_unlink (list=0x55ee351e7800, link=link@entry=0x55ee351f62a0) at ../../dbus/dbus-list.c:510
#1 0x00007fbaf3f00a29 in _dbus_list_remove_link (list=<optimized out>, link=0x55ee351f62a0) at ../../dbus/dbus-list.c:530
#2 0x000055ee34f1eb35 in bus_connection_disconnected (connection=connection@entry=0x55ee351f31f0) at ../../bus/connection.c:305
#3 0x000055ee34f1ec8c in bus_connections_unref (connections=0x55ee351e77b0) at ../../bus/connection.c:555
#4 0x000055ee34f17303 in bus_context_unref (context=0x55ee351e73f0) at ../../bus/bus.c:1131
#5 0x000055ee34f13be4 in main (argc=<optimized out>, argv=<optimized out>) at ../../bus/main.c:687
(gdb) p link
$1 = (DBusList *) 0x55ee351f62a0
(gdb) p *link
$2 = {prev = 0x55ee351eb210, next = 0x0, data = 0x55ee3528c1e0}
(gdb) p *list
$16 = (DBusList *) 0x0
(gdb) fr 2
#2 0x000055ee34f1eb35 in bus_connection_disconnected (connection=connection@entry=0x55ee351f31f0) at ../../bus/connection.c:305
305 _dbus_list_remove_link (&d->connections->monitors, d->link_in_monitors);
(gdb) p *d->connections
$17 = {refcount = 0, completed = 0x55ee351f6198, n_completed = 35, incomplete = 0x0, n_incomplete = 0, context = 0x55ee351e73f0, completed_by_user = 0x55ee351eec80, expire_timeout = 0x55ee351f92d0, stamp = 5243451,
pending_replies = 0x55ee351f90b0, monitors = 0x0, monitor_matchmaker = 0x55ee351fd210, total_match_rules = 298, peak_match_rules = 625, peak_match_rules_per_conn = 172, total_bus_names = 79, peak_bus_names = 137,
peak_bus_names_per_conn = 16}
(gdb) p *d
$18 = {connections = 0x55ee351e77b0, link_in_connection_list = 0x55ee351f6198, connection = 0x55ee351f31f0, services_owned = 0x0, n_services_owned = 0, match_rules = 0x0, n_match_rules = 0, name = 0x55ee351f3e10 ":1.23",
transaction_messages = 0x0, oom_message = 0x55ee351f2020, oom_preallocated = 0x55ee351eaea0, policy = 0x55ee351f3b00, cached_loginfo_string = 0x55ee351f07c0 "uid=20709 pid=12507 comm=\"/usr/bin/dbus-monitor --session \"",
selinux_id = 0x0, apparmor_confinement = 0x0, connection_tv_sec = 196347, connection_tv_usec = 546267, stamp = 5243445, peak_match_rules = 1, peak_bus_names = 1, n_pending_unix_fds = 0, pending_unix_fds_timeout = 0x0,
link_in_monitors = 0x55ee351f62a0}
(gdb) p *d->link_in_monitors
$19 = {prev = 0x55ee351eb210, next = 0x0, data = 0x55ee3528c1e0}
We're actually using dbus-1.10.24-13.el7_6.x86_64, but I checked here in the master branch and found that the bug still exists. All line numbers below are from the master branch.
This is happening because bus_connections_unref()
drops all monitors first by calling _dbus_list_clear()
(line 547 below) before then calling bus_connection_disconnected()
(line 558) which attempts to remove links from monitors:
bus/connection.c
546: /* drop all monitors */
547- _dbus_list_clear (&connections->monitors);
548-
549- /* drop all real connections */
550- while (connections->completed != NULL)
551- {
552- DBusConnection *connection;
553-
554- connection = connections->completed->data;
555-
556- dbus_connection_ref (connection);
557- dbus_connection_close (connection);
558- bus_connection_disconnected (connection);
559- dbus_connection_unref (connection);
560- }
When bus_connection_disconnected()
is called, d->connections->monitors
is NULL but d->links_in_monitors
is still pointing to an item in the list:
(gdb) p d->connections->monitors
$22 = (DBusList *) 0x0
(gdb) p d->link_in_monitors
$23 = (DBusList *) 0x55ee351f62a0
As a result of d->link_in_monitors
containing a pointer, bus_connection_disconnected()
calls _dbus_list_remove_link()
:
301 if (d->link_in_monitors != NULL)
302 {
...
308 _dbus_list_remove_link (&d->connections->monitors, d->link_in_monitors);
This invokes _dbus_list_unlink()
which presumes that next
can never be NULL:
dbus/dbus-list.c
500:_dbus_list_unlink (DBusList **list,
501- DBusList *link)
502-{
503- if (link->next == link)
504- {
505- /* one-element list */
506- *list = NULL;
507- }
508- else
509- {
510- link->prev->next = link->next;
511- link->next->prev = link->prev;
512-
513- if (*list == link)
514- *list = link->next;
515- }
Unfortunately, _dbus_list_clear()
leaves the linked list in a state where the next
pointer can be NULL:
dbus/dbus-list.c
543:_dbus_list_clear (DBusList **list)
544-{
545- DBusList *link;
546-
547- link = *list;
548- while (link != NULL)
549- {
550- DBusList *next = _dbus_list_get_next_link (list, link);
551-
552- free_link (link);
553-
554- link = next;
555- }
556-
557- *list = NULL;
558-}
... because of the use of the _dbus_list_get_next_link()
macro, defined as:
dbus/dbus-list.h
119:#define _dbus_list_get_next_link(list, link) ((link)->next == *(list) ? NULL : (link)->next)
And so we get an attempt to de-reference a NULL pointer:
(gdb) fr 0
#0 0x00007fbaf3f009c7 in _dbus_list_unlink (list=0x55ee351e7800, link=link@entry=0x55ee351f62a0) at ../../dbus/dbus-list.c:510
510 link->next->prev = link->prev;
(gdb) p link->next
$20 = (DBusList *) 0x0
(gdb) disassem
Dump of assembler code for function _dbus_list_unlink:
<snip>
=> 0x00007fbaf3f009c7 <+23>: mov %rdx,(%rax)
(gdb) info reg
rax 0x0 0
<snip>
... resulting in a core dump.
I suggest _dbus_list_clear()
should be fixed so that it can't leave the next
pointer in a NULL state? It's used all over the place, I'm wondering what other issues this code could be seeding. That and/or bus_connection_disconnected()
shouldn't leave d->link_in_monitors
dangling when it drops all monitors.