systemclock: Use clock_nanosleep for higher accuracy
The various wait implementations have a latency ranging from 50 to 500+ microseconds. While this is not a major issue when dealing with a low number of waits per second (for ex: video), it does introduce non-negligible jitter when synchronizing higher packet-rate systems.
The `clock_nanosleep` syscall does offer a lower-latency waiting system, but it is unfortunately blocking, so we don't want to use it in all scenarios nor for too long.
This patch makes GstSystemClock use `clock_nanosleep` (if available) as follows:

- Any wait below 500us uses it
- Any wait below 2ms first uses the regular waiting system and then `clock_nanosleep`
Impact
The following tables show the impact (simulated using `fakesrc` with `datarate` set and `fakesink sync=true`).
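For reference, a pipeline along these lines reproduces the simulation; the `sizemax`/`datarate` values here are hypothetical (chosen to give 500 buffers, i.e. 500 clock waits, per second), not the exact ones used for the measurements below:

```shell
# 1,000,000 bytes/s in fixed 2000-byte buffers = 500 buffers/s synced on the clock
gst-launch-1.0 fakesrc sizetype=fixed sizemax=2000 datarate=1000000 ! fakesink sync=true
```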
- Packets per second (pps) and the average time interval show the rate at which we sync on the clock, i.e. how often (and therefore how far ahead) each clock wait is requested.
- The delay is how much later the clock wait returned (after the target synchronization time).
- The "99% jitter" column shows what the 99% delay represents relative to the total wait duration. A lower value gives a much more stable rate of synchronization (your inter-buffer interval remains constant). For example, at 500 pps the interval is 2000 us and the 99% delay is 99 us, i.e. 5.0 % jitter. Also included is the comparison against the pre-patch delay (i.e. the first line, which doesn't use `clock_nanosleep` at all).
Packets per second | average time interval (us) | delay (us, 50%) | delay (us, 90%) | delay (us, 99%) | 99% jitter | old 99% jitter |
---|---|---|---|---|---|---|
100 | 10000 | 462 | 537 | 666 | 6.6 % | |
300 | 3333 | 134 | 482 | 607 | 18.2 % | 20.0 % |
400 | 2500 | 133 | 140 | 520 | 20.8 % | 26.7 % |
500 | 2000 | 70 | 76 | 99 | 5.0 % | 33.3 % |
600 | 1666 | 69 | 74 | 92 | 5.5 % | 40.0 % |
1000 | 1000 | 66 | 70 | 75 | 7.5 % | 66.6 % |
2000 | 500 | 66 | 71 | 76 | 15.2 % | 133.2 % |
- The 100 pps case exhibits the legacy behaviour (`clock_nanosleep` is never used). As can be seen, we are looking at around 500 us of latency.
- The 300 and 400 pps cases start to use it a bit, which can be seen by the median delay being divided by three, but 99% of the data is still up to 600 us late.
- Finally, starting from 500 pps the new code is fully activated and the median/max latency drops drastically.
Note: these numbers are from my system; the most important part to take into account is the scale by which the median/max latency drops. As can be noticed, a minimum latency is still reached. This is due to the default scheduler being used. See the bottom of this document for an example of the impact of using realtime schedulers.
(Delay distribution plots for 100, 300, 400, 500, 600, 1000 and 2000 pps.)
Real use-case and realtime scheduler
This is a test pipeline I am using which encodes live audiotestsrc and videotestsrc, muxes them to mpegts and sends the result over SRT. The queue thread just before srtsink is set to use SCHED_RR; the other threads (audiotestsrc and videotestsrc with is-live=true) are using the regular scheduler:
Thread | Packets per second | average time interval (us) | delay (us, 50%) | delay (us, 90%) | delay (us, 99%) | 99% jitter | old 99% jitter |
---|---|---|---|---|---|---|---|
0x8d4760 (audiotestsrc) | 47 | 21300 | 73 | 85 | 93 | 0.4 % | 3.1 % |
0x8d4860 (videotestsrc) | 30 | 33333 | 72 | 84 | 95 | 0.3 % | 2.0 % |
0x8b58a0 (SCHED_RR srtsink) | 950 | 1052.8 | 4.8 | 6.7 | 10.6 | 1 % | 63.3 % |
- Both the audiotestsrc and videotestsrc produce data just close to the scheduled time, so they benefit from the improvements (an order of magnitude).
- Usage of the realtime scheduler increases the accuracy even further, allowing sub-1 % jitter in synchronization at high rates (a two-orders-of-magnitude improvement).