High accuracy GPU/CPU timestamp correlation
Background :
GPU/CPU timestamp correlation is used in two places in Mesa :
- VK_EXT_calibrated_timestamps :
An extension that helps applications calibrate their GPU command submissions to match vblank deadlines.
- Perfetto :
A framework to trace CPU and GPU activity (https://perfetto.dev/). Several Mesa drivers support it. On Intel hardware we can show various GPU commands as well as OA (Observation Architecture) metrics.
Current limitations :
To correlate GPU and CPU timestamps, we periodically read the timestamp register of the RCS (render command streamer). We have interfaces for this in i915 (REG_READ ioctl) and Xe (XE_MMIO ioctl). The sequence goes like this :
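clock_gettime(CLOCK_MONOTONIC, &cpu_t1);  /* CPU time before the read */
gpu_ts = driver_ioctl();                   /* reads the RCS timestamp register */
clock_gettime(CLOCK_MONOTONIC, &cpu_t2);  /* CPU time after the read */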
We then have to work out which CPU time corresponds to gpu_ts. We usually assume the GPU read happened exactly halfway between the two CPU timestamps, so the estimate can be off by up to half of the cpu_t1..cpu_t2 interval.
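The sampling sequence and the midpoint assumption can be sketched as self-contained C. This is a minimal sketch under stated assumptions: CLOCK_MONOTONIC is used as the CPU clock, and read_gpu_timestamp() is a hypothetical stand-in for the REG_READ / XE_MMIO ioctl that returns a fake counter so the code runs without a GPU. It computes the midpoint estimate and the worst-case error of that estimate (half the CPU window, rounded up).

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* One CPU/GPU correlation sample. */
struct ts_sample {
   uint64_t cpu_t1_ns;  /* CPU time just before the GPU timestamp read */
   uint64_t gpu_ts;     /* raw RCS timestamp value, in GPU ticks */
   uint64_t cpu_t2_ns;  /* CPU time just after the GPU timestamp read */
};

static uint64_t cpu_now_ns(void)
{
   struct timespec ts;
   clock_gettime(CLOCK_MONOTONIC, &ts);
   return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical stand-in for the REG_READ (i915) / XE_MMIO (Xe) ioctl.
 * Returns a fake, monotonically increasing counter so the sketch runs
 * without a GPU. */
static uint64_t read_gpu_timestamp(void)
{
   static uint64_t fake_ticks = 1000;
   return fake_ticks += 12;
}

/* CPU clock, GPU register read, CPU clock: the sequence from the text. */
static struct ts_sample take_sample(void)
{
   struct ts_sample s;
   s.cpu_t1_ns = cpu_now_ns();
   s.gpu_ts = read_gpu_timestamp();
   s.cpu_t2_ns = cpu_now_ns();
   return s;
}

/* Assume the GPU read happened halfway between cpu_t1 and cpu_t2. */
static uint64_t sample_cpu_estimate_ns(const struct ts_sample *s)
{
   assert(s->cpu_t2_ns >= s->cpu_t1_ns);
   return s->cpu_t1_ns + (s->cpu_t2_ns - s->cpu_t1_ns) / 2;
}

/* Worst-case error of that estimate: half the CPU window, rounded up. */
static uint64_t sample_error_ns(const struct ts_sample *s)
{
   return (s->cpu_t2_ns - s->cpu_t1_ns + 1) / 2;
}
```

Whatever the kernel does between the two clock_gettime() calls widens the window, and the error bound grows with it.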
The problem with this approach is that the read goes through an ioctl, and the kernel can insert unrelated work around that boundary, which widens the interval and makes the correlation less accurate.
The last measurements we did, a few years ago, showed a delta of around 20us between cpu_t1 and cpu_t2 on Gen9.
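With a window of around 20us, the midpoint estimate can be off by up to about 10us. For VK_EXT_calibrated_timestamps this kind of uncertainty is reported to applications through the pMaxDeviation output of vkGetCalibratedTimestampsEXT. One possible userspace mitigation, not something proposed in this issue, is to take several samples and keep the one with the narrowest window. The sketch below assumes a hypothetical ts_sample struct holding cpu_t1, the GPU timestamp and cpu_t2. It only reduces the error: every sample still crosses the ioctl boundary, so it cannot give the guarantees of a kernel-side correlation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sample layout: CPU time before/after one GPU timestamp read. */
struct ts_sample {
   uint64_t cpu_t1_ns;
   uint64_t gpu_ts;
   uint64_t cpu_t2_ns;
};

/* Keep the sample whose cpu_t1..cpu_t2 window is narrowest, i.e. the one
 * least disturbed by work the kernel inserted around the ioctl. */
static struct ts_sample pick_narrowest(const struct ts_sample *samples, size_t n)
{
   assert(n > 0);
   size_t best = 0;
   for (size_t i = 1; i < n; i++) {
      uint64_t w = samples[i].cpu_t2_ns - samples[i].cpu_t1_ns;
      uint64_t best_w = samples[best].cpu_t2_ns - samples[best].cpu_t1_ns;
      if (w < best_w)
         best = i;
   }
   return samples[best];
}
```

The width of the chosen window is then the remaining uncertainty of the correlation.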
Potential solution :
A while ago we tried to add a new uAPI to i915 that does the correlation within the kernel driver, making it immune to the kernel inserting work at the ioctl boundary : https://patchwork.kernel.org/project/intel-gfx/patch/20210429003410.69754-2-umesh.nerlige.ramappa@intel.com/#24151071
If I remember correctly, that brought the delta between the two CPU timestamps down to 1us or less.
Unfortunately, that patch never landed upstream.
It would be nice to have this in Xe.
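To illustrate what userspace could do with such a uAPI, here is a sketch that maps a later GPU timestamp onto the CPU timeline (what a Perfetto data source needs). It assumes a hypothetical query result with a CPU timestamp, the CPU time spent around the GPU read, the RCS timestamp and the GPU timestamp frequency; the struct and field names are illustrative only and do not describe the actual i915 patch or any Xe interface. Counter wrap-around is deliberately ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result of a kernel-side correlation query: the kernel reads
 * the CPU clock, the RCS timestamp and the CPU clock again with no ioctl
 * boundary in between. Field names are illustrative only. */
struct hypothetical_ts_correlation {
   uint64_t cpu_ts_ns;     /* CPU time sampled just before the GPU read */
   uint64_t cpu_delta_ns;  /* CPU time elapsed around the GPU read */
   uint64_t gpu_ts;        /* RCS timestamp, in GPU ticks */
   uint64_t gpu_freq_hz;   /* GPU timestamp frequency */
};

/* Map a later GPU timestamp onto the CPU timeline, anchoring at the
 * midpoint of the CPU window (same assumption as the ioctl approach). */
static uint64_t gpu_to_cpu_ns(const struct hypothetical_ts_correlation *c,
                              uint64_t gpu_ts)
{
   assert(c->gpu_freq_hz > 0);
   assert(gpu_ts >= c->gpu_ts); /* sketch ignores counter wrap-around */
   uint64_t ticks = gpu_ts - c->gpu_ts;
   /* Split whole seconds from the remainder so ticks * 1e9 cannot
    * overflow 64 bits. */
   uint64_t ns = (ticks / c->gpu_freq_hz) * 1000000000ull +
                 (ticks % c->gpu_freq_hz) * 1000000000ull / c->gpu_freq_hz;
   return c->cpu_ts_ns + c->cpu_delta_ns / 2 + ns;
}
```

The conversion is identical to the ioctl-based path; the gain comes entirely from cpu_delta_ns being much smaller, which shrinks the error on every converted timestamp.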