rtpjitterbuffer: Indefinite stuttering if the sender is delayed pushing first packet

I've occasionally been running into a problem where the receiver stutters indefinitely due to a bug in how timers are handled. Here's the setup:

Sender: Live pipeline is using a network clock, rtpbin's ntp-time-source is "clock-time", rtp-profile is "avpf"

Receiver: Pipeline is using the same network clock (ntp-sync TRUE, buffer-mode "synced", latency 500ms, rtp-profile "avpf") and is using a fixed base time to match the sender's base time

The sender will start the pipeline, immediately capture the base time used and send it to the receiver (through a separate channel). This works fine, most of the time, but I noticed that if the sender is somehow delayed pulling down data to push into rtpbin, this causes a major problem on the receiving side.

Let's take this example:

Sender and Receiver sync up to the common network clock provider and wait until clock is synced
Sender sets pipeline to PLAYING; once the state change completes, the base time is 12:00:00.000.
Sender sends captured base time to receiver
Receiver creates pipeline using the same (already synchronized) network clock source and fixes the base time to what Sender is using.
Sender is being delayed, working on pulling down data to push out
Sender has the first buffer at running time 00:00:05.000 and pushes it out
Receiver gets the first buffer shortly afterwards, dts (relative to base time) is 00:00:05.035
Receiver calls rtp_jitter_buffer_calculate_pts, which also calls rtp_jitter_buffer_resync, which sets priv->jbuf->base_time to 00:00:05.035. rtp_jitter_buffer_calculate_pts then calls calculate_skew to calculate a pts, let's say this is 00:00:05.045. The jitterbuffer then sets the TIMER_TYPE_DEADLINE timer with this pts. At this point, the deadline timer is set to timeout at 00:00:05.045.
Receiver's jitterbuffer ts-offset at this point is still 00:00:00.000 because it hasn't been synced
Before the deadline timer expires, a sync packet is processed. rtpbin calls gst_rtp_bin_associate and calculates the running time. As part of the formula, it adds in priv->jbuf->base_time (plus skew), which means this value ends up in the ts-offset that it subsequently sets on the jitterbuffer. In this example it adds in 00:00:05.045. As soon as this happens, the timer thread's timeout values get an additional 5 seconds added onto them (by apply_offset). This means the DEADLINE timer will now expire 5 seconds later, way past the latency. The end result is that everything will be pushed out way too late, and it will never recover.
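To make the arithmetic in the steps above concrete, here is a small toy model in Python. It is not the rtpjitterbuffer code: the `apply_offset` function below only mimics the addition described above, the helper `fmt` is made up for display, and the ts-offset value after the sync packet is assumed to be exactly base time plus skew (00:00:05.045) as in the example.

```python
# Toy model of the example timeline. All values are milliseconds of running
# time. This only reproduces the arithmetic described in the steps above; it
# is not the actual jitterbuffer implementation.

def fmt(ms):
    """Format milliseconds as HH:MM:SS.mmm (helper for display only)."""
    s, ms = divmod(ms, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"

def apply_offset(timestamp_ms, ts_offset_ms):
    """Mimic the described behaviour: the ts-offset is added to the timer timestamp."""
    return timestamp_ms + ts_offset_ms

LATENCY_MS = 500      # receiver latency from the setup
base_time_ms = 5035   # dts of the first buffer, stored as jbuf base_time
pts_ms = 5045         # pts from calculate_skew; already includes base_time

# Before the sync packet: ts-offset is still zero.
deadline_before = apply_offset(pts_ms, 0)

# After the sync packet: assumed ts-offset = base_time + skew, as in the example.
ts_offset_after_sync_ms = 5045
deadline_after = apply_offset(pts_ms, ts_offset_after_sync_ms)

print("deadline before sync:", fmt(deadline_before))
print("deadline after sync: ", fmt(deadline_after))
print("shift exceeds latency:", deadline_after - deadline_before > LATENCY_MS)
```

In this model the deadline moves by the full ts-offset, so the base time is effectively counted twice and the shift is about ten times the configured latency.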

In most cases, priv->jbuf->base_time is very small and it goes unnoticed, but by simply adding a sleep into the sender's thread prior to pushing the first buffer in, I can reliably get it into this bad state.

So, while I think I know what the problem is, I'm not sure how to properly fix this. I believe the problem is that the ts-offset that gst_rtp_bin_associate sets includes priv->jbuf->base_time. As soon as that ts-offset is applied, the timer's timeout value more than doubles, because apply_offset adds the ts-offset to a timestamp that already includes priv->jbuf->base_time. It almost feels like there needs to be a ts-offset that doesn't include this base time, and a separate value, ts-base-offset, that would be set to priv->jbuf->base_time. What complicates things is that rtp_jitter_buffer_resync may cause priv->jbuf->base_time to change, but that shouldn't affect timers already set up.
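One possible reading of that idea, sketched as a toy model in Python rather than as a patch: the names `Offsets`, `split_sync_offset` and `apply_offset_split` are hypothetical, nothing like `ts-base-offset` exists in the element today, and the numbers are taken from the example (base time 00:00:05.035, total offset 00:00:05.045). The sketch also ignores the resync complication mentioned above.

```python
from dataclasses import dataclass

@dataclass
class Offsets:
    """Hypothetical split of the offset that rtpbin sets on the jitterbuffer."""
    ts_offset_ms: int       # remaining correction, without base_time
    ts_base_offset_ms: int  # would mirror priv->jbuf->base_time

def split_sync_offset(total_offset_ms, base_time_ms):
    """Separate the base_time component from the total offset (assumed layout)."""
    return Offsets(ts_offset_ms=total_offset_ms - base_time_ms,
                   ts_base_offset_ms=base_time_ms)

def apply_offset_split(timestamp_ms, offsets):
    """Shift a timer timestamp by the non-base part only.

    The timestamp already includes base_time, so ts_base_offset_ms is
    deliberately not added a second time.
    """
    return timestamp_ms + offsets.ts_offset_ms

pts_ms = 5045  # deadline timer timestamp from the example
offsets = split_sync_offset(total_offset_ms=5045, base_time_ms=5035)

new_deadline = apply_offset_split(pts_ms, offsets)
print("shift with split offsets (ms):", new_deadline - pts_ms)
```

With the base time kept out of the value that gets added to existing timers, the deadline in this model moves by 10 ms instead of roughly 5 seconds. Whether timers would additionally need to remember the base time they were created with, to survive a later rtp_jitter_buffer_resync, is left open here.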

Any thoughts?

Edited May 09, 2019 by Thomas Bluemel