multiqueue: SegFault during flushing
I recently started testing GStreamer 1.20.2 and have noticed occasional segfaults when stopping the pipeline.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0xb6eb7420 in g_mutex_lock (mutex=mutex@entry=0xc06588) at ../glib-2.72.1/glib/gthread-posix.c:1516
1516 if G_UNLIKELY (g_atomic_int_add (&mutex->i[0], 1) != 0)
#0 0xb6eb7420 in g_mutex_lock (mutex=mutex@entry=0xc06588) at ../glib-2.72.1/glib/gthread-posix.c:1516
#1 0xadaa62d4 in gst_multi_queue_loop (pad=<optimized out>) at ../gstreamer-1.20.2/plugins/elements/gstmultiqueue.c:2370
#2 0xb6cfc40c in gst_task_func (task=0xab207828) at ../gstreamer-1.20.2/gst/gsttask.c:384
#3 0xb6e8924c in g_thread_pool_thread_proxy (data=<optimized out>) at ../glib-2.72.1/glib/gthreadpool.c:354
#4 0xb6e88638 in g_thread_proxy (data=0xb1e96cc8) at ../glib-2.72.1/glib/gthread.c:827
#5 0xb6112874 in start_thread (arg=0xa9333230) at pthread_create.c:435
#6 0xb6193a7c in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:75 from /lib/libc.so.6
It looks like a mis-compile with GCC 11.3.0 (armv7): frame 1 still has the correct info, but the mutex pointer passed to g_mutex_lock (0xc06588) does not match &mq->qlock (0xacd307c0).
(gdb) frame 1
#1 0xadaa62d4 in gst_multi_queue_loop (pad=<optimized out>) at ../gstreamer-1.20.2/plugins/elements/gstmultiqueue.c:2370
2370 GST_MULTI_QUEUE_MUTEX_LOCK (mq);
(gdb) print *mq
$1 = {element = {object = {object = {g_type_instance = {g_class = 0x2200a28}, ref_count = 7, qdata = 0x9ee01fc8}, lock = {p = 0x0, i = {0, 0}}, name = 0xb1eb9fe8 "multiqueue186", parent = 0x22fb298,
flags = 0, control_bindings = 0x0, control_rate = 100000000, last_sync = 18446744073709551615, _gst_reserved = 0x0}, state_lock = {p = 0xb1e987c0, i = {0, 0}}, state_cond = {p = 0x0, i = {6, 0}},
state_cookie = 3, target_state = GST_STATE_PAUSED, current_state = GST_STATE_PAUSED, next_state = GST_STATE_VOID_PENDING, pending_state = GST_STATE_VOID_PENDING, last_return = GST_STATE_CHANGE_SUCCESS,
bus = 0x1bffe48, clock = 0xb21a0d98, base_time = 78487429954947, start_time = 82301248839, numpads = 4, pads = 0xb21c6980, numsrcpads = 2, srcpads = 0xb21c6310, numsinkpads = 2, sinkpads = 0xb21c6990,
pads_cookie = 4, contexts = 0x0, _gst_reserved = {0x0, 0x0, 0x0}}, sync_by_running_time = 0, use_interleave = 0, min_interleave_time = 250000000, nbqueues = 2, queues = 0xb21c6340, queues_cookie = 2,
max_size = {visible = 5, bytes = 83886080, time = 0}, extra_size = {visible = 5, bytes = 10485760, time = 3000000000}, use_buffering = 0, low_watermark = 10000, high_watermark = 990000, buffering = 0,
buffering_percent = 0, counter = 5546, highid = 5535, high_time = 82457366666, qlock = {p = 0x0, i = {0, 0}}, numwaiting = 0, buffering_percent_changed = 0, buffering_post_lock = {p = 0x0, i = {0, 0}},
interleave = 0, last_interleave_update = 0, unlinked_cache_time = 250000000}
(gdb) print &mq->qlock
$2 = (GMutex *) 0xacd307c0
Looking at the disassembly, I think the compiler gets confused by:
if (!mq || !srcpad)
goto out_flushing;
I can't see how that would work, since one of the first things out_flushing
does is take the queue lock, which needs a valid mq. That code path is never actually triggered, though, since I see
0:12:40.590107207 25897 0xb3029eb0 LOG multiqueue gstmultiqueue.c:2339:gst_multi_queue_loop:<multiqueue25> sq:1 AFTER PUSHING sq->srcresult: flushing (is_eos:0)
as the last line in the log, but applying the following diff seems to unconfuse the compiler and my test runs seem stable now.
--- gstmultiqueue.c.orig 2022-06-04 05:31:26.894684483 -0400
+++ gstmultiqueue.c 2022-06-04 05:32:20.221228339 -0400
@@ -2101,7 +2101,7 @@
srcpad = g_weak_ref_get (&sq->srcpad);
if (!mq || !srcpad)
- goto out_flushing;
+ goto done;
next:
GST_DEBUG_OBJECT (mq, "SingleQueue %d : trying to pop an object", sq->id);
I'm not sure whether that's the correct fix, or whether there should be some other error handling there.
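To illustrate the control flow I mean, here is a minimal standalone sketch. It is not the actual GStreamer code: fake_mq, fake_lock, fake_unlock and loop_sketch are made-up names, the lock is just a flag, and reference cleanup is not modelled. It only shows that when the early NULL check jumps to a label placed after the locking section, the lock is never reached with a NULL mq.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the multiqueue: only models the queue lock. */
struct fake_mq {
  int locked;      /* 1 while the "qlock" is held */
  int lock_calls;  /* number of times the lock was taken */
};

/* Stand-in for taking the queue lock; needs a valid mq. */
static void fake_lock (struct fake_mq *mq)
{
  assert (mq != NULL && !mq->locked);
  mq->locked = 1;
  mq->lock_calls++;
}

/* Stand-in for releasing the queue lock. */
static void fake_unlock (struct fake_mq *mq)
{
  assert (mq != NULL && mq->locked);
  mq->locked = 0;
}

/* Returns 0 after a normal flushing exit, -1 if a reference was missing. */
static int loop_sketch (struct fake_mq *mq, void *srcpad)
{
  int ret = 0;

  if (!mq || !srcpad) {
    ret = -1;
    goto done;          /* skip the section that dereferences mq */
  }

  /* ... popping and pushing would happen here ... */

  /* Equivalent of out_flushing: bookkeeping done under the queue lock. */
  fake_lock (mq);
  fake_unlock (mq);

done:
  return ret;
}
```

Calling loop_sketch (NULL, pad) returns -1 without touching the lock, while a call with two valid pointers takes and releases it exactly once. Whether the real done label performs all the cleanup needed on this path is the open question above.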