
systemclock: Use clock_nanosleep for higher accuracy

Edward Hervey requested to merge bilboed/gstreamer:clock_nanosleep into master

The various wait implementations have a latency ranging from 50 to 500+ microseconds. While this is not a major issue when dealing with a low number of waits per second (for example video), it does introduce a non-negligible jitter in the synchronization of higher packet-rate systems.

The clock_nanosleep syscall does offer a lower-latency waiting system but is unfortunately blocking, so we don't want to use it in all scenarios or for too long.

This patch makes GstSystemClock use clock_nanosleep (if available) as follows (a sketch of the approach is shown below the list):

  • Any wait below 500us uses it

  • Any wait below 2ms will first use the regular waiting system and then clock_nanosleep

    modified: gst/gstsystemclock.c
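
For illustration, here is a minimal standalone sketch of that scheme. This is not the actual gstsystemclock.c change; the function names, the thresholds as constants and the stubbed-out regular wait are assumptions made for the example only.

```c
/* Sketch of the hybrid wait: long waits keep using the regular waiting
 * system, medium waits use it for everything except the last ~500 us,
 * and only the final stretch is a blocking, absolute clock_nanosleep.
 * Names and structure are illustrative, not taken from the patch. */
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NANOSLEEP_THRESHOLD_NS  (500 * 1000)        /* 500 us */
#define HYBRID_THRESHOLD_NS     (2 * 1000 * 1000)   /* 2 ms */

static int64_t
monotonic_now_ns (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (int64_t) ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* Stand-in for the regular (interruptible) waiting system; in
 * GstSystemClock this is a GCond-based timed wait that can be woken up. */
static void
regular_wait_ns (int64_t duration_ns)
{
  struct timespec ts = {
    .tv_sec = duration_ns / 1000000000,
    .tv_nsec = duration_ns % 1000000000
  };
  nanosleep (&ts, NULL);
}

static void
wait_until_ns (int64_t target_ns)
{
  int64_t remaining = target_ns - monotonic_now_ns ();

  if (remaining <= 0)
    return;

  if (remaining > HYBRID_THRESHOLD_NS) {
    /* Long waits: keep using only the regular waiting system */
    regular_wait_ns (remaining);
    return;
  }

  if (remaining > NANOSLEEP_THRESHOLD_NS) {
    /* Medium waits: regular wait for everything except the last ~500 us */
    regular_wait_ns (remaining - NANOSLEEP_THRESHOLD_NS);
  }

  /* Final stretch: blocking, absolute clock_nanosleep until the target,
   * retried if interrupted by a signal */
  struct timespec ts = {
    .tv_sec = target_ns / 1000000000,
    .tv_nsec = target_ns % 1000000000
  };
  while (clock_nanosleep (CLOCK_MONOTONIC, TIMER_ABSTIME, &ts, NULL) == EINTR)
    ;
}
```

The point of keeping the regular waiting system for the bulk of the wait is that it stays interruptible (the clock entry can still be unscheduled); only the short final stretch is allowed to block.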

Impact

The following shows the impact (simulated using fakesrc with a datarate and fakesink sync=true; a rough sketch of such a pipeline is included after the note below).

  • Packets per second (pps) and the average time interval show the rate at which we synchronize on the clock; in this simulation that interval is also how far ahead of the target time each clock wait is requested.
  • The delay is how much later the clock wait returned (after the target synchronization time)
  • The final columns show how much the 99% delay represents against the total wait duration (for example 666us against a 10000us interval is 6.6% on the first line). A lower value provides a much more stable rate of synchronization (the inter-buffer interval remains constant). Also included is the comparison against the pre-patch delay (i.e. the 99% delay of the first line, which doesn't use clock_nanosleep at all).
| Packets per second | average time interval (us) | delay us (50%) | delay us (90%) | delay us (99%) | 99% jitter | old 99% jitter |
|---|---|---|---|---|---|---|
| 100 | 10000 | 462 | 537 | 666 | 6.6 % | |
| 300 | 3333 | 134 | 482 | 607 | 18.2 % | 20.0 % |
| 400 | 2500 | 133 | 140 | 520 | 20.8 % | 26.7 % |
| 500 | 2000 | 70 | 76 | 99 | 5.0 % | 33.3 % |
| 600 | 1666 | 69 | 74 | 92 | 5.5 % | 40.0 % |
| 1000 | 1000 | 66 | 70 | 75 | 7.5 % | 66.6 % |
| 2000 | 500 | 66 | 71 | 76 | 15.2 % | 133.2 % |
  • The 100pps case exhibits the legacy behaviour (clock_nanosleep is never used). As can be seen, we are looking at around 500us of latency.
  • The 300 and 400pps cases start to use it a bit, which can be seen in the average delay dropping to roughly a third, but 99% of the data is still up to 600us late.
  • Finally, starting from 500pps the new code kicks in and the median/max latency drops drastically.

Note: these numbers are from my system; the most important thing to take into account is the scale by which the median/max latency drops. As can be noticed, a minimum latency is still reached, which is due to the default scheduler being used. See the bottom of this document for an example of the impact of using a realtime scheduler.
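
For context, here is a rough reconstruction of the kind of simulation pipeline mentioned above (fakesrc with a datarate, fakesink sync=true). The exact property values (data rate, buffer size) are assumptions for illustration, not taken from the MR.

```c
/* Hypothetical reconstruction of the simulation setup: fakesrc pushes
 * fixed-size buffers timestamped according to its datarate, and fakesink
 * synchronizes each buffer against the clock. Values are illustrative
 * (1 MB/s in 2000-byte buffers, i.e. roughly 500 buffers per second). */
#include <gst/gst.h>

int
main (int argc, char **argv)
{
  GstElement *pipeline;
  GError *error = NULL;

  gst_init (&argc, &argv);

  pipeline = gst_parse_launch (
      "fakesrc datarate=1000000 sizetype=fixed sizemax=2000 "
      "! fakesink sync=true", &error);
  if (pipeline == NULL) {
    g_printerr ("Failed to create pipeline: %s\n", error->message);
    return 1;
  }

  gst_element_set_state (pipeline, GST_STATE_PLAYING);
  g_main_loop_run (g_main_loop_new (NULL, FALSE));

  return 0;
}
```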

[Figures: Clock_offset_post-render plots for 100, 300, 400, 500, 600, 1000 and 2000 pps]

Real use-case and realtime scheduler.

This is a test pipeline I am using which encodes live audiotestsrc and videotestsrc, muxes them to mpegts and sends the result over SRT. The queue thread just before srtsink is set to use SCHED_RR, while the other threads (audiotestsrc and videotestsrc is-live=true) use the regular scheduler:

| Thread | Packets per second | average time interval (us) | delay us (50%) | delay us (90%) | delay us (99%) | 99% jitter | old 99% jitter |
|---|---|---|---|---|---|---|---|
| 0x8d4760 (audiotestsrc) | 47 | 21300 | 73 | 85 | 93 | 0.4 % | 3.1 % |
| 0x8d4860 (videotestsrc) | 30 | 33333 | 72 | 84 | 95 | 0.3 % | 2.0 % |
| 0x8b58a0 (SCHED_RR srtsink) | 950 | 1052.8 | 4.8 | 6.7 | 10.6 | 1 % | 63.3 % |
  • Both audiotestsrc and videotestsrc produce data just ahead of the scheduled time, so they benefit from the change (an order-of-magnitude improvement).
  • Using the realtime scheduler increases the accuracy even further, allowing sub-1% synchronization jitter at high rates (a two-order-of-magnitude improvement). A generic sketch of how a streaming thread can be moved to SCHED_RR follows below.
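
For reference, one common way to move a specific streaming thread to SCHED_RR is a synchronous bus handler for STREAM_STATUS messages, since a sync handler runs in the thread that posts the message. This is a generic sketch and an assumption about how such a setup can be done, not the code used for the test above; it also requires CAP_SYS_NICE or root.

```c
/* Generic sketch (not the MR's test code): raise a streaming thread to
 * SCHED_RR when it starts. In practice you would also check the message's
 * owner element so that only the wanted thread (e.g. the queue before
 * srtsink) is boosted. */
#include <gst/gst.h>
#include <pthread.h>
#include <sched.h>

static GstBusSyncReply
on_stream_status (GstBus *bus, GstMessage *msg, gpointer user_data)
{
  if (GST_MESSAGE_TYPE (msg) == GST_MESSAGE_STREAM_STATUS) {
    GstStreamStatusType type;
    GstElement *owner;

    gst_message_parse_stream_status (msg, &type, &owner);
    if (type == GST_STREAM_STATUS_TYPE_ENTER) {
      /* Sync handlers run in the posting (streaming) thread, so we can
       * change the scheduling policy of pthread_self() here. */
      struct sched_param param = { .sched_priority = 1 };
      /* Needs CAP_SYS_NICE (or root); errors are ignored in this sketch */
      pthread_setschedparam (pthread_self (), SCHED_RR, &param);
    }
  }
  return GST_BUS_PASS;
}

/* Installed before setting the pipeline to PLAYING:
 *   gst_bus_set_sync_handler (gst_element_get_bus (pipeline),
 *       on_stream_status, NULL, NULL);
 */
```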

[Figure: Clock_offset_post-render-realtime]

