rtpjitterbuffer: Indefinite stuttering if the sender is delayed pushing first packet
I occasionally run into a problem where the receiver stutters indefinitely because of a bug in how timers are handled. Here's the setup:
Sender: Live pipeline using a network clock; rtpbin's ntp-time-source is "clock-time", rtp-profile is "avpf".
Receiver: Pipeline using the same network clock (ntp-sync TRUE, buffer-mode "synced", latency 500ms, rtp-profile "avpf"), with a fixed base time that matches the sender's base time.
The sender will start the pipeline, immediately capture the base time used and send it to the receiver (through a separate channel). This works fine, most of the time, but I noticed that if the sender is somehow delayed pulling down data to push into rtpbin, this causes a major problem on the receiving side.
Let's take this example:
- Sender and Receiver sync up to the common network clock provider and wait until clock is synced
- Sender sets pipeline to PLAYING. Once the state change completes, the base time is 12:00:00.000.
- Sender sends captured base time to receiver
- Receiver creates pipeline using the same (already synchronized) network clock source and fixes the base time to what Sender is using.
- Sender is being delayed, working on pulling down data to push out
- Sender has the first buffer at running time 00:00:05.000 and pushes it out
- Receiver gets the first buffer shortly later, dts (relative to base time) is 00:00:05.035
- Receiver calls rtp_jitter_buffer_calculate_pts, which also calls rtp_jitter_buffer_resync and uses calculate_skew to calculate a pts, let's say 00:00:05.045. The jitterbuffer then sets the TIMER_TYPE_DEADLINE timer with this pts, so at this point the deadline timer is set to time out at 00:00:05.045.
- Receiver's jitterbuffer ts-offset at this point is still 00:00:00.00 because it hasn't been synced
- Before the deadline timer expires, a sync packet is processed. rtpbin calls gst_rtp_bin_associate, which calculates the running time. As part of the formula, it adds in priv->jbuf->base_time (plus skew), which means this value ends up in the ts-offset that it subsequently sets on the jitterbuffer. In this example it adds in 00:00:05.045. As soon as this happens, the timeout values of the timer thread's timers get an additional 5 seconds added onto them (by apply_offset). This means the DEADLINE timer will now expire 5 seconds later, way past the latency. The end result is that everything is pushed out way too late, and it never recovers.
In most cases, priv->jbuf->base_time is very small and the problem goes unnoticed, but by simply adding a sleep into the sender's thread prior to pushing the first buffer in, I can reliably get it into this bad state.
So, while I think I know what the problem is, I'm not sure how to properly fix it. I believe the problem is that the ts-offset that gst_rtp_bin_associate sets includes priv->jbuf->base_time. As soon as it is set, the timer's timeout value more than doubles, because apply_offset adds the ts-offset to the timestamp, which already includes priv->jbuf->base_time. It almost feels like there needs to be a ts-offset that doesn't include this base time, and a separate value, ts-base-offset, that would be set to priv->jbuf->base_time. What complicates things is that rtp_jitter_buffer_resync may cause priv->jbuf->base_time to change, but that shouldn't affect timers already set up.