Today, whenever the guest kernel processes a virtio-gpu job that has an in-fence, the fence must be awaited on the guest side by the guest kernel driver. This adds latency to the job-submission pipeline that we want to avoid.
There are two possible fence types for a virtio-gpu job:
- Guest waitable
- Host waitable
A guest-waitable fence can be awaited only on the guest side. It could be a guest software fence or a dma-fence produced by a device driver other than virtio-gpu.
A host-waitable fence is created by the virtio-gpu driver and represents a host GPU fence. Such a fence can be awaited on both the guest and host sides. Every virtio-gpu 3D job produces a host-waitable out-fence, and later jobs may use it as an in-fence for job-ordering synchronization.
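The distinction above can be sketched in C. Note that the enum, the field names, and is_host_waitable() are illustrative names invented for this sketch, not virglrenderer's actual types:

```c
#include <stdbool.h>

/* Illustrative sketch only: these names are not part of virglrenderer. */
enum fence_origin {
   FENCE_GUEST_SW,       /* guest software fence */
   FENCE_GUEST_DMA,      /* dma-fence from a non-virtio-gpu driver */
   FENCE_VIRTIO_GPU_3D,  /* out-fence of a virtio-gpu 3D job */
};

/* A guest-waitable fence can be awaited only in the guest; a host-waitable
 * fence (produced by a virtio-gpu 3D job) can be awaited on both sides. */
static bool is_host_waitable(enum fence_origin origin)
{
   return origin == FENCE_VIRTIO_GPU_3D;
}
```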
OpenGL and Vulkan APIs provide interoperability support, allowing applications to use both GL and VK contexts together. An application draws something using VK and then processes the result using GL. This is where 3D job-ordering synchronization comes into play: the virtio-gpu guest kernel driver has to wait for the VK job to complete before submitting the GL job.
We want the guest to submit both VK and GL jobs without stalling, letting the host driver handle the waits. GPUs have their own schedulers and synchronization primitives, allowing software to offload waits to hardware.
Add a new virtio-gpu command that passes guest in-fence IDs (those waitable on the host) to the host side. Virglrenderer then takes each in-fence ID and resolves it into a context-specific fence descriptor, allowing contexts to wait for the fence efficiently.
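The resolve step can be sketched as a lookup from a guest-visible fence ID to a context-specific descriptor. Everything here (the table layout, fence_table_resolve(), the -1 miss value) is a hypothetical illustration of the idea, not this MR's actual implementation:

```c
#include <stdint.h>

#define FENCE_TABLE_SIZE 64

/* Hypothetical table of host-waitable fences, keyed by the fence ID
 * that the guest passes in the new virtio-gpu command. */
struct fence_entry {
   uint64_t fence_id;  /* ID known to the guest */
   int fd;             /* context-specific descriptor, e.g. a sync-file FD */
};

struct fence_table {
   struct fence_entry entries[FENCE_TABLE_SIZE];
   unsigned count;
};

/* Resolve a guest in-fence ID to a descriptor the host context can wait
 * on; returns -1 when the fence is unknown (e.g. already retired). */
static int fence_table_resolve(const struct fence_table *t, uint64_t fence_id)
{
   for (unsigned i = 0; i < t->count; i++) {
      if (t->entries[i].fence_id == fence_id)
         return t->entries[i].fd;
   }
   return -1;
}
```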
This MR adds a new public virglrenderer API function,
virgl_renderer_context_push_fence_wait(), allowing VMMs to implement the new
virtio-gpu fence-passing command. Fence passing is enabled for the DRM/MSM native context in this MR as well.
Enabling fence-passing support improves performance for VK/GL interoperability: guest applications see up to 2-3x higher FPS.
Questions and TODOs
The current implementation enables fence passing for native contexts using a sync file, assuming sync file is a universal fence type. For vrend and venus contexts, a sync file isn't suitable because GL and VK support only opaque fence FD sharing. In order to support vrend/venus contexts, the guest will need to know whether a fence produced by one context type is supported by another context type, but this should be easy to support. Guest Mesa will also need to support memobj for virgl and venus; it's mandated by the VK/GL interop API.
Only sync file will be supported.
Can we have a common fence-signalling thread in virglrenderer? Do we need it? Rob Clark suggested that we could possibly have a common, context-agnostic fence signalling/tracking thread in virglrenderer. For VK we get an opaque GPU syncobj and CPU out-fences, not a sync file as in the case of native contexts. Virglrenderer can track only the "CPU out-fence", while the syncobj is what needs to be shared with other contexts. For GL we get a sync file using an EGL extension, but EGL is optional for vrend. Vrend and venus will have to continue doing their own fence polling that is specific to GL/VK.

The idea is to have context-specific callbacks that unify the thread code by abstracting the context-specific differences. This might involve a lot of code movement unrelated to fence passing, and the benefits are unclear right now. We will need a custom callback to enable fence passing for vrend and venus; maybe the benefits will become clearer once we get closer to supporting them. For now I decided it is better to keep the code simple for review purposes, and it may take some time until vrend/venus are supported.
Both virgl and venus will continue to use their own threads for non-exportable fences. A common fence-signalling thread isn't needed.
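For reference, the callback abstraction discussed above could look roughly like the following. The struct and callback names are invented for illustration; nothing like this exists in the MR:

```c
/* Hypothetical per-context callbacks a common fence thread would use
 * to hide the GL/VK-specific differences described above. */
struct fence_poll_ops {
   /* Returns nonzero once the context-specific fence has signalled. */
   int (*poll)(void *ctx_fence);
   /* Exports the fence into a shareable form (e.g. a sync-file FD),
    * or returns -1 if the context cannot export it. */
   int (*export_fd)(void *ctx_fence);
};

/* Dummy implementation, standing in for a context-specific backend. */
static int dummy_poll(void *ctx_fence)
{
   return *(int *)ctx_fence;  /* "signalled" flag */
}

static int dummy_export_fd(void *ctx_fence)
{
   (void)ctx_fence;
   return -1;  /* this backend cannot export its fences */
}
```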
Guard code for WIN32 builds. It's a small thing, but it needs to be done.
Guards were added.
CrOS will go further and pass the GPU out-fence to a display context. This may require additional changes, like adding an API for exporting virgl_fence.
There is now a dedicated function for exporting the last signalled fence.
Development and testing are done using a WIP virtio-intel native context. Here are the links to the nctx branches that I update periodically; they always contain all the new WIP features. You'll find the bleeding-edge version of the fence-passing patches there, including integration for qemu/crosvm/kernel.
Comments and questions are welcome!