Discontinuity in the interpolated delay after corking, flushing and uncorking.
Submitted by Niklas Haas
Assigned to pul..@..op.org
Created attachment 126489 regular playback
After a long session of debugging in #pulseaudio, I got no further than this result, so I'm submitting it here for reference and to get more opinions:
Basically, I'm running into the issue where the reported delay (as measured by pa_stream_get_latency before and after writing data) suffers from a relatively big discontinuity (about 50-100ms) when seeking in mpv. Seeking in mpv is implemented by resetting the audio device, which in mpv's terms means:
- corking the audio stream (pa_stream_cork true)
- flushing the audio device (pa_stream_flush)
- uncorking the audio stream (pa_stream_cork false)
- continuing playback
I have attached a number of plots of mpv's internal timing data to help illustrate the problem. The first attachment (“regular playback”) establishes a baseline reading (10 seconds of uninterrupted playback). Some notes:
The green line (ao-delay) is the reported latency as measured directly by pa_stream_get_latency.
The blue line (ao-dev) is the difference between the latency and where mpv thinks it should be (assuming the audio device plays at a perfectly even rate). Ideally, this line should be exactly 0, and more importantly, this line should be as stable as possible (no jumps, no jittering and no spikes), because it is compared against the video stream's timing to detect A/V desynchronization.
While they're not as interesting for this test, the spikes at the top indicate events (rather than values). The curve going up represents an event starting, and the curve going down indicates an event stopping. It's somewhat harder to read the legend, so I'll repeat it here: green = ao-fill (inside “pa_stream_write”), blue = audio (“decoding more audio”), black = audio wait (“audio thread sleeping”), yellow = sleep (“playback thread sleeping”)
The latency values are measured with PA_STREAM_INTERPOLATE_TIMING | PA_STREAM_AUTO_TIMING_UPDATE | PA_STREAM_NOT_MONOTONIC, and the target latency (tlength) is 2000ms. Decreasing the tlength to a lower value decreases the magnitude of the discontinuity.
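For reference, the ao-dev value described in these notes can be sketched roughly as follows. This is only an illustration, not mpv's actual code: the ao_clock struct and the function names are hypothetical. The sign convention is an assumption chosen to match the plots, where a higher reported delay pushes the apparent position (and so ao-dev) downwards.

```c
#include <stdint.h>

/* Hypothetical bookkeeping since the last reset; not mpv's actual data structure. */
typedef struct {
    int64_t start_us;    /* wall-clock time at which playback (re)started */
    int64_t written_us;  /* duration of audio written to the device since then */
} ao_clock;

/* Position the device appears to have reached, according to the reported delay. */
static int64_t apparent_pos_us(const ao_clock *c, int64_t reported_delay_us)
{
    return c->written_us - reported_delay_us;
}

/* Deviation of the apparent position from an ideal device that plays at
 * exactly real-time rate. 0 means the reported delay is what was expected. */
static int64_t ao_dev_us(const ao_clock *c, int64_t now_us, int64_t reported_delay_us)
{
    int64_t ideal_pos_us = now_us - c->start_us;
    return apparent_pos_us(c, reported_delay_us) - ideal_pos_us;
}
```

For example, with 2050ms written and 1000ms elapsed, a reported delay of 1050ms gives a deviation of 0, while a reported delay of 1100ms gives -50ms.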
The second attachment (“seeking”) indicates what happens when triggering a seek in mpv. I have triggered two seeks, one at about 3.3s and one at about 6.6s. We can observe:
The reported latency (ao-delay) suffers from an upwards discontinuity of about 50ms: it jumps from the baseline latency of about 1050ms to a value near 1100ms, and then gradually decreases back down to 1000ms over the course of the next 1-2 seconds.
Similarly, the “apparent” position of mpv suffers from a downwards discontinuity from 0ms to about -35ms, which increases to +20ms over the course of the next 1-2 seconds.
The third attachment (“seeking zoom”) is a detail view of the discontinuity as it happens, in case it helps.
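To put numbers on jumps like these without reading them off a plot, one could scan the logged ao-delay samples for large steps between consecutive readings. The following is a minimal sketch under that assumption. The function name and the idea of a fixed step threshold are mine, not something mpv or PulseAudio provides.

```c
#include <stddef.h>
#include <stdint.h>

/* Scan consecutive delay samples and record the indices where the reported
 * delay changes by at least min_jump_us in a single step.
 * Returns the total number of jumps found; at most out_cap indices are stored. */
static size_t find_jumps(const int64_t *delay_us, size_t n, int64_t min_jump_us,
                         size_t *out_idx, size_t out_cap)
{
    size_t found = 0;
    for (size_t i = 1; i < n; i++) {
        int64_t step = delay_us[i] - delay_us[i - 1];
        if (step < 0)
            step = -step;               /* direction does not matter here */
        if (step >= min_jump_us) {
            if (found < out_cap)
                out_idx[found] = i;     /* index of the sample after the jump */
            found++;
        }
    }
    return found;
}
```

With samples of 1050, 1048, 1100, 1095 and 1090ms and a threshold of 30ms, this reports a single jump at index 2, while the slow drift afterwards stays below the threshold.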
In case you're wondering why these discontinuities affect playback in mpv, I've included a fourth attachment demonstrating the same effects during regular playback. I've included two graphs - one with a smaller tlength (~200ms iirc) and one with a larger tlength (2000ms). They're a bit more confusing because they also include video stats, but basically the effect that happens is as follows:
- The user triggers a seek
- The apparent audio position (ao-dev, green line at the bottom) shoots downwards (as observed earlier)
- The measured A-V difference (avdiff, black line) spikes downwards as a result (by about 100ms in the tlength=2000ms case)
- This exceeds mpv's thresholds for acceptable audio delay and triggers the video output to drop several frames to resynchronize audio and video (blue square at the top), thus restoring avdiff to a value around 0.
- The apparent audio position rapidly approaches its “true” value again over the course of the next 1-2 seconds (again, as observed earlier)
- This causes the measured A-V difference (black line) to grow upwards again very rapidly within these 1-2 seconds, triggering several frame drops along the way (every time it exceeds the acceptable threshold) - these are the clearly visible spikes and corresponding blue frames in the few seconds after the seek.
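The feedback loop in the steps above can be illustrated with a very simplified model. It is a sketch under stated assumptions, not mpv's actual sync logic: the threshold value is hypothetical, and each resynchronization is assumed to shift video so that avdiff returns to exactly 0 at that moment.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Count how many resynchronizations a series of ao-dev errors would trigger.
 * Assumption: avdiff is the current ao-dev error minus whatever error earlier
 * resyncs have already absorbed, and a resync happens whenever |avdiff|
 * exceeds threshold_us. */
static int count_resyncs(const int64_t *ao_dev_us, size_t n, int64_t threshold_us)
{
    int64_t compensated_us = 0;  /* error absorbed by previous resyncs */
    int resyncs = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t avdiff_us = ao_dev_us[i] - compensated_us;
        if (llabs((long long)avdiff_us) > threshold_us) {
            compensated_us = ao_dev_us[i];  /* video shifted, avdiff is 0 again */
            resyncs++;
        }
    }
    return resyncs;
}
```

Feeding it an error that jumps to -100ms and then recovers in 25ms steps (0, -100, -75, -50, -25, 0) with a hypothetical 40ms threshold yields three resyncs: one for the initial jump and two more while the error drains away. This matches the pattern described above, where a single transient causes repeated corrections.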
The only known work-around in mpv is to decrease the pulse buffer size (tlength) to a smaller value, which makes the issue less severe (as seen on the left) but doesn't fully solve it.
It would be great if this could get fixed upstream in PulseAudio so we don't have to hack around it in mpv.