decodebin2 sometimes deadlocks on EOS from an HLS stream
How to reproduce the error:
-
run next HLS stream
gst-play-1.0 "https://content-dtci.uplynk.com/channel/3324f2467c414329b3b0cc5cd987b6be.m3u8?ad.tfcd=0&ad.is_lat=0&ad.rdid=VENUE_ID&ad.pp=datg-live-vdms&ad.npa=0&ad.scor=TIMESTAMP&ad.cust_params=vdm%3Dlive%26chan%3Dabcnews%26plt%3Dott%26refDomain%3Dhttps%3A%2F%2Fabcnews.go.com%26vps%3D640x480%26ppid%3DVENUE_ID%26beacTyp%3Dssai%26sp%3Dvdms%26ait%3Dssai%26var%3D16x9&ad.correlator=TIMESTAMP&ad.ppid=VENUE_ID&ad.cmsid=2494279&ad.networkID=21783347309&ad.vast3=1&ad=abcnews_live&ad.vid=CHANNEL_ID&ad.description_url=https%3A%2F%2Fabcnews.go.com&ad.idtype=rida&ad.adUnit=/abc-news/rockbot/ott&ad.us_privacy=0"
-
it may require to wait for a while, around 30 minutes. An error occurs after the advertisement finishes playing (but not always). Playback just stops and doesn't recover anymore.
Error analysis
connecting with gdb shows next deadlock:
Thread 18 is waiting for a "cleanup thread to finish", and blocks the pad 0x7f05c0017a00 (frame 27,28)
Thread 18 (Thread 0x7f063d7fa700 (LWP 2045419)):
#0 __pthread_clockjoin_ex (threadid=139662538569472, thread_return=0x0, clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:145
#1 0x00007f06489fd54b in () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2 0x00007f06489d9faf in g_thread_join () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3 0x00007f0647788a4a in gst_decode_chain_start_free_hidden_groups_thread (chain=0x7f0640005690) at ../subprojects/gst-plugins-base/gst/playback/gstdecodebin2.c:3743
#4 0x00007f064778a7bd in drain_and_switch_chains (chain=0x7f0640005690, drainpad=0x0, last_group=0x7f063d7f8420, drained=0x7f063d7f8428, switched=0x7f063d7f8424) at ../subprojects/gst-plugins-base/gst/playback/gstdecodebin2.c:4223
#5 0x00007f064778c9b5 in gst_decode_bin_expose (dbin=0x7f0638040090) at ../subprojects/gst-plugins-base/gst/playback/gstdecodebin2.c:4730
--Type <RET> for more, q to quit, c to continue without paging--
#6 0x00007f064778aceb in gst_decode_pad_handle_eos (pad=0x7f0638040350) at ../subprojects/gst-plugins-base/gst/playback/gstdecodebin2.c:4298
#7 0x00007f064778dc0a in source_pad_event_probe (pad=0x7f0638040350, info=0x7f063d7f8770, user_data=0x7f0638040350) at ../subprojects/gst-plugins-base/gst/playback/gstdecodebin2.c:5034
#8 0x00007f0648c4b3f8 in probe_hook_marshal (hook=0x7f0600025ac0, data=0x7f063d7f8640) at ../subprojects/gstreamer/gst/gstpad.c:3664
#9 0x00007f064899f996 in g_hook_list_marshal () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#10 0x00007f0648c4bb23 in do_probe_callbacks (pad=0x7f0638040350, info=0x7f063d7f8770, defaultval=GST_FLOW_OK) at ../subprojects/gstreamer/gst/gstpad.c:3848
#11 0x00007f0648c5252a in gst_pad_push_event_unchecked (pad=0x7f0638040350, event=0x7f05a40fd9c0, type=GST_PAD_PROBE_TYPE_EVENT_DOWNSTREAM) at ../subprojects/gstreamer/gst/gstpad.c:5509
#12 0x00007f0648c4c545 in push_sticky (pad=0x7f0638040350, ev=0x7f063d7f8860, user_data=0x7f063d7f88b0) at ../subprojects/gstreamer/gst/gstpad.c:4047
#13 0x00007f0648c41a78 in events_foreach (pad=0x7f0638040350, func=0x7f0648c4c42c <push_sticky>, user_data=0x7f063d7f88b0) at ../subprojects/gstreamer/gst/gstpad.c:608
#14 0x00007f0648c4c90e in check_sticky (pad=0x7f0638040350, event=0x7f05a40fd9c0) at ../subprojects/gstreamer/gst/gstpad.c:4106
#15 0x00007f0648c52f79 in gst_pad_push_event (pad=0x7f0638040350, event=0x7f05a40fd9c0) at ../subprojects/gstreamer/gst/gstpad.c:5675
#16 0x00007f0648c49cc1 in event_forward_func (pad=0x7f0638040350, data=0x7f063d7f8a70) at ../subprojects/gstreamer/gst/gstpad.c:3125
#17 0x00007f0648c49aab in gst_pad_forward (pad=0x7f060c010360, forward=0x7f0648c49b96 <event_forward_func>, user_data=0x7f063d7f8a70) at ../subprojects/gstreamer/gst/gstpad.c:3079
#18 0x00007f0648c49e80 in gst_pad_event_default (pad=0x7f060c010360, parent=0x7f0638040350, event=0x7f05a40fd9c0) at ../subprojects/gstreamer/gst/gstpad.c:3176
#19 0x00007f0648c53e91 in gst_pad_send_event_unchecked (pad=0x7f060c010360, event=0x7f05a40fd9c0, type=GST_PAD_PROBE_TYPE_EVENT_DOWNSTREAM) at ../subprojects/gstreamer/gst/gstpad.c:5900
#20 0x00007f0648c5270e in gst_pad_push_event_unchecked (pad=0x7f05c0017310, event=0x7f05a40fd9c0, type=GST_PAD_PROBE_TYPE_EVENT_DOWNSTREAM) at ../subprojects/gstreamer/gst/gstpad.c:5544
#21 0x00007f0648c4c545 in push_sticky (pad=0x7f05c0017310, ev=0x7f063d7f8ca0, user_data=0x7f063d7f8cf0) at ../subprojects/gstreamer/gst/gstpad.c:4047
#22 0x00007f0648c41a78 in events_foreach (pad=0x7f05c0017310, func=0x7f0648c4c42c <push_sticky>, user_data=0x7f063d7f8cf0) at ../subprojects/gstreamer/gst/gstpad.c:608
#23 0x00007f0648c4c90e in check_sticky (pad=0x7f05c0017310, event=0x7f05a40fd9c0) at ../subprojects/gstreamer/gst/gstpad.c:4106
#24 0x00007f0648c52f79 in gst_pad_push_event (pad=0x7f05c0017310, event=0x7f05a40fd9c0) at ../subprojects/gstreamer/gst/gstpad.c:5675
#25 0x00007f0648d70421 in gst_video_decoder_push_event (decoder=0x7f0620011a00, event=0x7f05a40fd9c0) at ../subprojects/gst-plugins-base/gst-libs/gst/video/gstvideodecoder.c:1119
#26 0x00007f0648d723e9 in gst_video_decoder_sink_event_default (decoder=0x7f0620011a00, event=0x7f05a40fd9c0) at ../subprojects/gst-plugins-base/gst-libs/gst/video/gstvideodecoder.c:1655
#27 0x00007f0648d725b5 in gst_video_decoder_sink_event (pad=0x7f05c0017a00, parent=0x7f0620011a00, event=0x7f05a40fd9c0) at ../subprojects/gst-plugins-base/gst-libs/gst/video/gstvideodecoder.c:1691
#28 0x00007f0648c53e91 in gst_pad_send_event_unchecked (pad=0x7f05c0017a00, event=0x7f05a40fd9c0, type=GST_PAD_PROBE_TYPE_EVENT_DOWNSTREAM) at ../subprojects/gstreamer/gst/gstpad.c:5900
Meanwhile the cleanup thread is blocked trying to deactivate this pad 0x7f05c0017a00
Thread 34 (Thread 0x7f05b7fff700 (LWP 2045448)):
#0 __lll_lock_wait (futex=futex@entry=0x7f0620004140, private=0) at lowlevellock.c:52
#1 0x00007f0648507131 in __GI___pthread_mutex_lock (mutex=0x7f0620004140) at ../nptl/pthread_mutex_lock.c:115
#2 0x00007f0648c42f0f in post_activate (pad=0x7f05c0017a00, new_mode=GST_PAD_MODE_NONE) at ../subprojects/gstreamer/gst/gstpad.c:1045
#3 0x00007f0648c4376e in activate_mode_internal (pad=0x7f05c0017a00, parent=0x7f0620011a00, mode=GST_PAD_MODE_PUSH, active=0) at ../subprojects/gstreamer/gst/gstpad.c:1223
#4 0x00007f0648c432df in gst_pad_set_active (pad=0x7f05c0017a00, active=0) at ../subprojects/gstreamer/gst/gstpad.c:1114
#5 0x00007f0648c1fdb0 in activate_pads (vpad=0x7f05b7ffe960, ret=0x7f05b7ffe9c0, active=0x7f05b7ffe9f4) at ../subprojects/gstreamer/gst/gstelement.c:3171
#6 0x00007f0648c3617b in gst_iterator_fold (it=0x7f0640353db0, func=0x7f0648c1fd6d <activate_pads>, ret=0x7f05b7ffe9c0, user_data=0x7f05b7ffe9f4) at ../subprojects/gstreamer/gst/gstiterator.c:617
#7 0x00007f0648c1fe57 in iterator_activate_fold_with_resync (iter=0x7f0640353db0, func=0x7f0648c1fd6d <activate_pads>, user_data=0x7f05b7ffe9f4) at ../subprojects/gstreamer/gst/gstelement.c:3195
#8 0x00007f0648c1ff9d in gst_element_pads_activate (element=0x7f0620011a00, active=0) at ../subprojects/gstreamer/gst/gstelement.c:3240
#9 0x00007f0648c2029c in gst_element_change_state_func (element=0x7f0620011a00, transition=GST_STATE_CHANGE_PAUSED_TO_READY) at ../subprojects/gstreamer/gst/gstelement.c:3305
#10 0x00007f0648d77c04 in gst_video_decoder_change_state (element=0x7f0620011a00, transition=GST_STATE_CHANGE_PAUSED_TO_READY) at ../subprojects/gst-plugins-base/gst-libs/gst/video/gstvideodecoder.c:2878
#11 0x00007f0648c1f8a8 in gst_element_change_state (element=0x7f0620011a00, transition=GST_STATE_CHANGE_PAUSED_TO_READY) at ../subprojects/gstreamer/gst/gstelement.c:3083
#12 0x00007f0648c1eddc in gst_element_continue_state (element=0x7f0620011a00, ret=GST_STATE_CHANGE_SUCCESS) at ../subprojects/gstreamer/gst/gstelement.c:2791
#13 0x00007f0648c1fbd2 in gst_element_change_state (element=0x7f0620011a00, transition=GST_STATE_CHANGE_PLAYING_TO_PAUSED) at ../subprojects/gstreamer/gst/gstelement.c:3122
#14 0x00007f0648c1f632 in gst_element_set_state_func (element=0x7f0620011a00, state=GST_STATE_NULL) at ../subprojects/gstreamer/gst/gstelement.c:3037
#15 0x00007f0648c1f22d in gst_element_set_state (element=0x7f0620011a00, state=GST_STATE_NULL) at ../subprojects/gstreamer/gst/gstelement.c:2938
#16 0x00007f06477881ea in gst_decode_chain_free_internal (chain=0x7f05c036a1b0, hide=0) at ../subprojects/gst-plugins-base/gst/playback/gstdecodebin2.c:3509
#17 0x00007f0647788738 in gst_decode_group_free_internal (group=0x7f0614002400, hide=0) at ../subprojects/gst-plugins-base/gst/playback/gstdecodebin2.c:3627
#18 0x00007f0647787aff in gst_decode_chain_free_internal (chain=0x7f06081c46c0, hide=0) at ../subprojects/gst-plugins-base/gst/playback/gstdecodebin2.c:3386
#19 0x00007f0647788738 in gst_decode_group_free_internal (group=0x7f0600422450, hide=0) at ../subprojects/gst-plugins-base/gst/playback/gstdecodebin2.c:3627
#20 0x00007f064778894e in gst_decode_group_free (group=0x7f0600422450) at ../subprojects/gst-plugins-base/gst/playback/gstdecodebin2.c:3681
#21 0x00007f06477889a6 in gst_decode_chain_free_hidden_groups (old_groups=0x7f063800f780 = {...}) at ../subprojects/gst-plugins-base/gst/playback/gstdecodebin2.c:3723
#22 0x00007f06489d9ad1 in () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#23 0x00007f0648504609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#24 0x00007f06486dc133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
So,
- thread 18 blocks the pad 0x7f05c0017a00 (src pad of avdec_h264) and waits for the thread 34 to finish.
- thread 34 blocks trying to deactivate this pad 0x7f05c0017a00.
Hipothesis about the root cause
I was tracing the error and from what I've seen, it seems that handling EOS have walked an unexpected path:
- First gst_decode_pad_handle_eos() steps into drain_and_switch_chains() that switches the active decoding group to the next one and at this moment things seem to go well, the decoder we are blocking is now a part of an "old grop", that have been hidden and is going to be freed in another thread.
- Then it steps into gst_decode_bin_expose(), because it thinks that the new active group has "deadend" chains inside. The chains contain just a tsdemux and parsers inside. So, gst_decode_bin_expose() starts to hide and free the "new"(not anymore) active group too, and at this moment deadlocks: it has to wait for a cleanup thread that it blocks.
So, as I suppose there's unexpected code path:
- gst_decode_pad_handle_eos() --> drain_and_switch_chains() , it launches a cleanup thread for the group that was active (and blocks the thread therefore). This part is ok.
- gst_decode_pad_handle_eos() --> gst_decode_bin_expose(). Here the code decides to free the next group too. And this can't not to lead to the deadlock, because it has to wait for the cleanup thread first.