Skip to content

qtdemux: Fix race on pad reconnection

While working on gst-plugins-bad!907 (merged) I wondered why qtdemux did not expose the same bug, since both use a flowcombiner and qtdemux does not seem to reset it in response to RECONFIGURE events.

By adding some logging I realized the order in which the frames were chained by qtdemux happened to be perfect to avoid all pads reaching not-linked and reproduce the bug... but should one more frame be chained (from the other srcpad, since they are alternating) the bug would reproduce.

In practice, I was never able to reproduce that race in an unpatched qtdemux: even with rr record --chaos it was too unlikely. But that doesn't prove the race is not there and that it couldn't happen... so with more analysis I wrote a table with a scheduling that would ensure the race was reproduced. Then I made a patch to demonstrate it: by adding strategic waits using a step counter, a mutex and a condition variable, the threads run in an order that reproduces the bug consistently.

In this race there are three relevant threads:

  • The qtdemux streaming thread (qtdemux:sink). qtdemux is working in pull mode here.
  • The thread sending the select-streams event (let's consider it to be the main thread here for simplicity, main).
  • The streaming thread of the intially active multiqueue srcpad downstream (multiqueue:src_0).

The following is an order of actions that reproduces the issue:

                            decodebin3                           |
 +-----------------------------+     +-----------------------+   |
 |                     +-------+     +--------+  +-----------+      +---------------------+
 |                     | src_0 |---->| sink_0 |  | src_0 [T] |--+   |                     |
 +----------+          +-------+     +--------+  +-----------+  |   +------+              |
 | sink [T] | qtdemux          |     |       multiqueue      |  +-->| sink | audioconcat  |
 +----------+          +-------+     +--------+  +-----------+      +------+              |
 |                     | src_1 |---->| sink_1 |  | src_1 [T] |      |                     |
 |                     +-------+     +--------+  +-----------+      +---------------------+
 +-----------------------------+     +-----------------------+   |
                                                                 |
  1. qtdemux:sink completely chains a buffer through src_0, returning GST_FLOW_OK at multiqueue:sink_0.
  2. qtdemux:sink completely chains a buffer through src_1, returning GST_FLOW_NOT_LINKED at multiqueue:sink_1. Because of the flow combiner in qtdemux, qtdemux continues.
  3. multiqueue:src_0 picks the buffer and calls the chain function of it's peer to push it downstream.
  4. Before that chain ends, main sends a select-stream event upstream to switch the enabled audio track. In consequence, decodebin3 adds an IDLE probe to multiqueue:src_0.
  5. multiqueue:src_0 returns from the chain function, which immediately triggers the IDLE probe function, which unlinks downstream from src_0 and links it with src_1.
  6. qtdemux:sink completely chains a buffer through src_0... now returning GST_FLOW_NOT_LINKED. Both pads are not linked, the flow combiner state becomes GST_FLOW_NOT_LINKED too and an error is propagated.

Things to note:

multiqueue's srcpads return the latest chain return of the matching multiqueue sinkpad. This is necessary because otherwise the queue would be blocking, which would defeat its purpose.

There are things that could prevent this from happening, but don't:

  • When the pads are relinked, a RECONFIGURE event is sent upstream. multiqueue receives it and resets srcresult to GST_FLOW_OK, and also signals a condition variable to awake the srcpad streaming thread.This makes multiqueue sinkpads return GST_FLOW_OK... until the awaken srcpad sets srcresult to GST_FLOW_NOT_LINKED again pretty quickly. In every run this happened before qtdemux chained the next buffer.

  • qtdemux also receives the RECONFIGURE event, but doesn't react to it. If qtdemux reacted to it by resetting the flowcombiner like in the case of testsrcbin, the problem would be fixed because by the time flow is possible through the pads (when the relink is done and the pads are unlocked), one of them is guaranteed to return GST_FLOW_OK.

Repro:

Checkout the branch qtdemux-switch-track-race in https://gitlab.freedesktop.org/ntrrgc/gst-plugins-base and https://gitlab.freedesktop.org/ntrrgc/gst-plugins-good.

Then play an MP4 file with two audios in playbin3, like this one: double-audio-long.mp4

gst-play-1.0 --use-playbin3 double-audio-long.mp4

Fix:

This MR. Rebase the provided patch on top of the repro to see how the track switch works even in the bad scheduling case.

Edited by Alicia Boya García

Merge request reports