deadlock in timerfd read
Under some rare circumstances, it looks like timerfd may cause a deadlock.
This was originally reported at https://jira.automotivelinux.org/browse/SPEC-3957
Backtrace:
Thread 2 (Thread 0x7f53d8545700 (LWP 1961)):
#0 __GI___libc_read (nbytes=8, buf=0x7f53d8544c10, fd=55) at ../sysdeps/unix/sysv/linux/read.c:26
#1 __GI___libc_read (fd=55, buf=0x7f53d8544c10, nbytes=nbytes@entry=8) at ../sysdeps/unix/sysv/linux/read.c:24
#2 0x00007f53d86641d3 in read (__nbytes=8, __buf=<optimized out>, __fd=<optimized out>) at /usr/include/bits/unistd.h:44
#3 impl_timerfd_read (object=<optimized out>, fd=<optimized out>, expirations=<optimized out>) at ../git/spa/plugins/support/system.c:202
#4 0x00007f53d8658816 in on_timeout (source=<optimized out>) at ../git/spa/plugins/support/null-audio-sink.c:223
#5 0x00007f53d86563ab in loop_iterate (object=0x55c2d09a6b48, timeout=<optimized out>) at ../git/spa/plugins/support/loop.c:309
#6 0x00007f53d888f550 in do_loop (user_data=0x55c2d09a69f0) at ../git/src/pipewire/data-loop.c:79
#7 0x00007f53d8677e24 in start_thread (arg=<optimized out>) at pthread_create.c:477
#8 0x00007f53d878cdcf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 1 (Thread 0x7f53d866db80 (LWP 1958)):
#0 __GI___libc_read (nbytes=8, buf=0x7ffe28e425a0, fd=12) at ../sysdeps/unix/sysv/linux/read.c:26
#1 __GI___libc_read (fd=12, buf=0x7ffe28e425a0, nbytes=nbytes@entry=8) at ../sysdeps/unix/sysv/linux/read.c:24
#2 0x00007f53d8663b73 in read (__nbytes=8, __buf=<optimized out>, __fd=<optimized out>) at /usr/include/bits/unistd.h:44
#3 impl_eventfd_read (object=<optimized out>, fd=<optimized out>, count=<optimized out>) at ../git/spa/plugins/support/system.c:232
#4 0x00007f53d86569d4 in loop_invoke (object=0x55c2d09a6b48, func=<optimized out>, seq=4294967295, data=<optimized out>, size=<optimized out>, block=<optimized out>, user_data=<optimized out>) at ../git/spa/plugins/support/loop.c:222
#5 0x00007f53d88a53a0 in pw_impl_link_deactivate (this=0x55c2d0b79ca0) at ../git/src/pipewire/impl-link.c:725
#6 pw_impl_link_deactivate (this=this@entry=0x55c2d0b79ca0) at ../git/src/pipewire/impl-link.c:716
#7 0x00007f53d88acae8 in node_deactivate (this=<optimized out>) at ../git/src/pipewire/impl-node.c:98
#8 pause_node (this=<optimized out>) at ../git/src/pipewire/impl-node.c:196
#9 pause_node (this=0x55c2d0a286e0) at ../git/src/pipewire/impl-node.c:185
#10 0x00007f53d88b4b0d in pw_impl_node_set_state (node=0x55c2d0a286e0, state=PW_NODE_STATE_IDLE) at ../git/src/pipewire/impl-node.c:2093
#11 0x00007f53d8888c30 in ensure_state (node=node@entry=0x55c2d0a286e0, running=running@entry=false) at ../git/src/pipewire/context.c:877
#12 0x00007f53d888c55b in pw_context_recalc_graph (context=<optimized out>, reason=reason@entry=0x7f53d88e5e06 "node deactivate") at ../git/src/pipewire/context.c:1084
#13 0x00007f53d88b4efd in pw_impl_node_set_active (node=0x55c2d0a48880, active=<optimized out>) at ../git/src/pipewire/impl-node.c:2137
#14 0x00007f53d7cb0ded in client_node_demarshal_set_active (object=0x55c2d0b13cf0, msg=<optimized out>) at ../git/src/modules/module-client-node/protocol-native.c:1027
#15 0x00007f53d7d0c165 in process_messages (data=0x55c2d0d4a9c0) at ../git/src/modules/module-protocol-native.c:238
#16 0x00007f53d7d0c421 in connection_data (data=0x55c2d0d4a9c0, fd=<optimized out>, mask=1) at ../git/src/modules/module-protocol-native.c:309
#17 0x00007f53d86563ab in loop_iterate (object=0x55c2d099b678, timeout=<optimized out>) at ../git/spa/plugins/support/loop.c:309
#18 0x00007f53d88a865b in pw_main_loop_run (loop=0x55c2d099b530) at ../git/src/pipewire/main-loop.c:154
#19 0x000055c2d09903bf in main (argc=<optimized out>, argv=<optimized out>) at ../git/src/daemon/pipewire.c:118
Tail from strace log:
1958 timerfd_settime(55, TFD_TIMER_ABSTIME, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=0, tv_nsec=0}}, <unfinished ...>
1961 <... epoll_wait resumed>[{EPOLLIN, {u32=3500287432, u64=94295212303816}}, {EPOLLIN, {u32=3500304056, u64=94295212320440}}, {EPOLLIN, {u32=3500320680, u64=94295212337064}}], 32, -1) = 3
1958 <... timerfd_settime resumed>NULL) = 0
1958 write(61, "\1\0\0\0\0\0\0\0", 8 <unfinished ...>
1961 read(55, <unfinished ...>
1958 <... write resumed>) = 8
1958 write(11, "\1\0\0\0\0\0\0\0", 8) = 8
1958 read(12, <detached ...>
It looks like thread 1958 cancels the timerfd while it is trying to stop a null sink. At around the same time, thread 1961 calls the on_timeout
callback and tries to read the timerfd. Because the timerfd has been cancelled, read() blocks forever and the data thread stalls.