rtpjitterbuffer: Indefinite stuttering if the sender is delayed in pushing the first packet
I've occasionally been running into a problem where the receiver stutters indefinitely because of a bug in how timers are handled. Here's the setup:
- Sender: live pipeline using a network clock; rtpbin's `ntp-time-source` is "clock-time", `rtp-profile` is "avpf".
- Receiver: pipeline using the same network clock (`ntp-sync` TRUE, `buffer-mode` "synced", `latency` 500 ms, `rtp-profile` "avpf"), with a fixed base time that matches the sender's base time.
The sender starts the pipeline, immediately captures the base time it uses, and sends it to the receiver (through a separate channel). This works fine most of the time, but I noticed that if the sender is somehow delayed in pulling down data to push into rtpbin, it causes a major problem on the receiving side.
Let's take this example:
- Sender and Receiver sync up to the common network clock provider and wait until the clock is synced.
- Sender sets the pipeline to PLAYING; once that completes, the base time is 12:00:00.000.
- Sender sends the captured base time to Receiver.
- Receiver creates its pipeline using the same (already synchronized) network clock source and fixes the base time to the one Sender is using.
- Sender is delayed while pulling down data to push out.
- Sender has the first buffer at running time 00:00:05.000 and pushes it out.
- Receiver gets the first buffer shortly afterwards; its DTS (relative to the base time) is 00:00:05.035.
- Receiver calls `rtp_jitter_buffer_calculate_pts`, which also calls `rtp_jitter_buffer_resync`, which sets `priv->jbuf->base_time` to 00:00:05.035. `rtp_jitter_buffer_calculate_pts` then calls `calculate_skew` to calculate a PTS; let's say this is 00:00:05.045. The jitterbuffer then sets the `TIMER_TYPE_DEADLINE` timer with this PTS, so at this point the deadline timer is set to time out at 00:00:05.045.
- Receiver's jitterbuffer `ts-offset` is still 00:00:00.000 at this point because it hasn't been synced yet.
- Before the deadline timer expires, a sync packet is processed. rtpbin calls `gst_rtp_bin_associate` and calculates the running time. As part of that formula it adds in `priv->jbuf->base_time` (plus skew), which means this value ends up in the `ts-offset` that rtpbin subsequently sets on the jitterbuffer. In this example it adds in 00:00:05.045. As soon as this happens, the timer thread's timeout values get an additional ~5 seconds added onto them (by `apply_offset`). This means the DEADLINE timer now expires about 5 seconds later, at roughly 00:00:10.090, way past the 500 ms latency. The end result is that everything is pushed out far too late, and it never recovers.
In most cases `priv->jbuf->base_time` is very small and the problem goes unnoticed, but by simply adding a `sleep` to the sender's thread before pushing the first buffer, I can reliably get it into this bad state.
So, while I think I know what the problem is, I'm not sure how to fix it properly. I believe the problem is that the `ts-offset` that `gst_rtp_bin_associate` sets includes `priv->jbuf->base_time`. As soon as that offset is set, it roughly doubles the timer's timeout value, because `apply_offset` adds the `ts-offset` to a timestamp that already includes `priv->jbuf->base_time`. It almost feels like there needs to be a `ts-offset` that doesn't include this base time, plus a separate value, `ts-base-offset`, that would be set to `priv->jbuf->base_time`. What complicates things is that `rtp_jitter_buffer_resync` may change `priv->jbuf->base_time`, but that shouldn't affect timers that are already set up.
Any thoughts?