
systemclock: Use clock_nanosleep for higher accuracy

Edward Hervey requested to merge bilboed/gstreamer:clock_nanosleep into master

The various wait implementations have a latency ranging from 50 to 500+ microseconds. While this is not a major issue when dealing with a low number of waits per second (for example video), it does introduce a non-negligible jitter in the synchronization of higher packet-rate systems.

The clock_nanosleep syscall does offer a lower-latency waiting system but is unfortunately blocking, so we don't want to use it in all scenarios or for too long.

This patch makes GstSystemClock use clock_nanosleep (if available) as follows (a sketch of the approach is shown below the list):

  • Any wait below 500us uses it

  • Any wait below 2ms will first use the regular waiting system and then clock_nanosleep

    modified: gst/gstsystemclock.c
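
For illustration, here is a minimal standalone sketch of that scheme. This is not the actual gstsystemclock.c change; the function names, the thresholds as constants and the stubbed-out regular wait are assumptions made for the example only.

```c
/* Sketch of the hybrid wait: long waits keep using the regular waiting
 * system, medium waits use it for everything except the last ~500 us,
 * and only the final stretch is a blocking, absolute clock_nanosleep.
 * Names and structure are illustrative, not taken from the patch. */
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NANOSLEEP_THRESHOLD_NS  (500 * 1000)        /* 500 us */
#define HYBRID_THRESHOLD_NS     (2 * 1000 * 1000)   /* 2 ms */

static int64_t
monotonic_now_ns (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (int64_t) ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* Stand-in for the regular (interruptible) waiting system; in
 * GstSystemClock this is a GCond-based timed wait that can be woken up. */
static void
regular_wait_ns (int64_t duration_ns)
{
  struct timespec ts = {
    .tv_sec = duration_ns / 1000000000,
    .tv_nsec = duration_ns % 1000000000
  };
  nanosleep (&ts, NULL);
}

static void
wait_until_ns (int64_t target_ns)
{
  int64_t remaining = target_ns - monotonic_now_ns ();

  if (remaining <= 0)
    return;

  if (remaining > HYBRID_THRESHOLD_NS) {
    /* Long waits: keep using only the regular waiting system */
    regular_wait_ns (remaining);
    return;
  }

  if (remaining > NANOSLEEP_THRESHOLD_NS) {
    /* Medium waits: regular wait for everything except the last ~500 us */
    regular_wait_ns (remaining - NANOSLEEP_THRESHOLD_NS);
  }

  /* Final stretch: blocking, absolute clock_nanosleep until the target,
   * retried if interrupted by a signal */
  struct timespec ts = {
    .tv_sec = target_ns / 1000000000,
    .tv_nsec = target_ns % 1000000000
  };
  while (clock_nanosleep (CLOCK_MONOTONIC, TIMER_ABSTIME, &ts, NULL) == EINTR)
    ;
}
```

The point of keeping the regular waiting system for the bulk of the wait is that it stays interruptible (the clock entry can still be unscheduled); only the short final stretch is allowed to block.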

Impact

The following shows the impact (simulated using fakesrc with a datarate and fakesink sync=true; a rough sketch of such a pipeline is included after the note below).

  • Packets per second (pps) and the average time interval show the rate at which we synchronize on the clock; in this simulation that interval is also how far ahead of the target time each clock wait is requested.
  • The delay is how much later the clock wait returned (after the target synchronization time)
  • The final columns show how much the 99% delay represents against the total wait duration (for example 666us against a 10000us interval is 6.6% on the first line). A lower value provides a much more stable rate of synchronization (the inter-buffer interval remains constant). Also included is the comparison against the pre-patch delay (i.e. the 99% delay of the first line, which doesn't use clock_nanosleep at all).
| Packets per second | average time interval (us) | delay us (50%) | delay us (90%) | delay us (99%) | 99% jitter | old 99% jitter |
|---|---|---|---|---|---|---|
| 100 | 10000 | 462 | 537 | 666 | 6.6 % | |
| 300 | 3333 | 134 | 482 | 607 | 18.2 % | 20.0 % |
| 400 | 2500 | 133 | 140 | 520 | 20.8 % | 26.7 % |
| 500 | 2000 | 70 | 76 | 99 | 5.0 % | 33.3 % |
| 600 | 1666 | 69 | 74 | 92 | 5.5 % | 40.0 % |
| 1000 | 1000 | 66 | 70 | 75 | 7.5 % | 66.6 % |
| 2000 | 500 | 66 | 71 | 76 | 15.2 % | 133.2 % |
  • The 100pps case exhibits the legacy behaviour (clock_nanosleep is never used). As can be seen, we are looking at around 500us of latency.
  • The 300 and 400pps cases start to use it a bit, which can be seen in the average delay dropping to roughly a third, but 99% of the data is still up to 600us late.
  • Finally, starting from 500pps the new code kicks in and the median/max latency drops drastically.

Note: these numbers are from my system; the most important thing to take into account is the scale by which the median/max latency drops. As can be noticed, a minimum latency is still reached, which is due to the default scheduler being used. See the bottom of this document for an example of the impact of using a realtime scheduler.
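
For context, here is a rough reconstruction of the kind of simulation pipeline mentioned above (fakesrc with a datarate, fakesink sync=true). The exact property values (data rate, buffer size) are assumptions for illustration, not taken from the MR.

```c
/* Hypothetical reconstruction of the simulation setup: fakesrc pushes
 * fixed-size buffers timestamped according to its datarate, and fakesink
 * synchronizes each buffer against the clock. Values are illustrative
 * (1 MB/s in 2000-byte buffers, i.e. roughly 500 buffers per second). */
#include <gst/gst.h>

int
main (int argc, char **argv)
{
  GstElement *pipeline;
  GError *error = NULL;

  gst_init (&argc, &argv);

  pipeline = gst_parse_launch (
      "fakesrc datarate=1000000 sizetype=fixed sizemax=2000 "
      "! fakesink sync=true", &error);
  if (pipeline == NULL) {
    g_printerr ("Failed to create pipeline: %s\n", error->message);
    return 1;
  }

  gst_element_set_state (pipeline, GST_STATE_PLAYING);
  g_main_loop_run (g_main_loop_new (NULL, FALSE));

  return 0;
}
```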

[Figures: Clock_offset_post-render plots for 100, 300, 400, 500, 600, 1000 and 2000 pps]

Real use-case and realtime scheduler.

This is a test pipeline I am using which encodes live audiotestsrc and videotestsrc, muxes them to mpegts and sends the result over SRT. The queue thread just before srtsink is set to use SCHED_RR, while the other threads (audiotestsrc and videotestsrc is-live=true) use the regular scheduler:

| Thread | Packets per second | average time interval (us) | delay us (50%) | delay us (90%) | delay us (99%) | 99% jitter | old 99% jitter |
|---|---|---|---|---|---|---|---|
| 0x8d4760 (audiotestsrc) | 47 | 21300 | 73 | 85 | 93 | 0.4 % | 3.1 % |
| 0x8d4860 (videotestsrc) | 30 | 33333 | 72 | 84 | 95 | 0.3 % | 2.0 % |
| 0x8b58a0 (SCHED_RR srtsink) | 950 | 1052.8 | 4.8 | 6.7 | 10.6 | 1 % | 63.3 % |
  • Both audiotestsrc and videotestsrc produce data just ahead of the scheduled time, so they benefit from the change (an order-of-magnitude improvement).
  • Using the realtime scheduler increases the accuracy even further, allowing sub-1% synchronization jitter at high rates (a two-order-of-magnitude improvement). A generic sketch of how a streaming thread can be moved to SCHED_RR follows below.
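
For reference, one common way to move a specific streaming thread to SCHED_RR is a synchronous bus handler for STREAM_STATUS messages, since a sync handler runs in the thread that posts the message. This is a generic sketch and an assumption about how such a setup can be done, not the code used for the test above; it also requires CAP_SYS_NICE or root.

```c
/* Generic sketch (not the MR's test code): raise a streaming thread to
 * SCHED_RR when it starts. In practice you would also check the message's
 * owner element so that only the wanted thread (e.g. the queue before
 * srtsink) is boosted. */
#include <gst/gst.h>
#include <pthread.h>
#include <sched.h>

static GstBusSyncReply
on_stream_status (GstBus *bus, GstMessage *msg, gpointer user_data)
{
  if (GST_MESSAGE_TYPE (msg) == GST_MESSAGE_STREAM_STATUS) {
    GstStreamStatusType type;
    GstElement *owner;

    gst_message_parse_stream_status (msg, &type, &owner);
    if (type == GST_STREAM_STATUS_TYPE_ENTER) {
      /* Sync handlers run in the posting (streaming) thread, so we can
       * change the scheduling policy of pthread_self() here. */
      struct sched_param param = { .sched_priority = 1 };
      /* Needs CAP_SYS_NICE (or root); errors are ignored in this sketch */
      pthread_setschedparam (pthread_self (), SCHED_RR, &param);
    }
  }
  return GST_BUS_PASS;
}

/* Installed before setting the pipeline to PLAYING:
 *   gst_bus_set_sync_handler (gst_element_get_bus (pipeline),
 *       on_stream_status, NULL, NULL);
 */
```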

[Figure: Clock_offset_post-render-realtime]

