venus: parallel command recording
Currently, venus is single-threaded (ring thread) on the renderer side (except for the virtio-gpu related bits sitting on the main thread). This has been the bottleneck for AAA titles that already do parallel cmd recording either natively or via layering (e.g. proton, zink).
All venus side blockers were already resolved before the protocol 1.0 release (virgl/virglrenderer!1045 (merged) and its deps, !21716 (merged), etc). Now it's time to bridge the gap here. Below is a brief outline:
- add a thread pool on the renderer side used to dispatch vkExecuteCommandStreamsMESA
- key: allow the ring head to advance as soon as a cs stream gets dispatched to the thread pool
- keep queue commands in the ring thread while gating the dependency between the ring thread and the cmd stream decoding/re-recording thread with inter-ring ring-wait
- driver side to express additional inter-cs-stream deps via VkCommandStreamDependencyMESA for better parallelism (we could further break up the cmd stream on our own with nice driver side dep analysis)
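
The dispatch model outlined above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the virglrenderer implementation: `cs_pool`, `cs_worker` and `ring_run` are hypothetical names, "decoding" is a stand-in store, and a plain completion counter plays the role that inter-ring ring-wait would play in the real design.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_WORKERS 4
#define NUM_STREAMS 8

/* Hypothetical state shared by the ring thread and the decode workers. */
struct cs_pool {
   pthread_mutex_t mutex;
   pthread_cond_t cond;
   uint32_t queued; /* streams dispatched by the ring thread */
   uint32_t taken;  /* streams picked up by a worker */
   uint32_t done;   /* streams fully decoded (stand-in for a ring-wait seqno) */
   int shutdown;
   uint32_t decoded[NUM_STREAMS];
};

/* Worker: decodes dispatched command streams off the ring thread. */
static void *
cs_worker(void *arg)
{
   struct cs_pool *pool = arg;

   pthread_mutex_lock(&pool->mutex);
   for (;;) {
      while (pool->taken == pool->queued && !pool->shutdown)
         pthread_cond_wait(&pool->cond, &pool->mutex);
      if (pool->taken == pool->queued)
         break; /* shutdown requested and nothing left to decode */

      const uint32_t id = pool->taken++;
      pthread_mutex_unlock(&pool->mutex);

      /* Stand-in for decoding/re-recording one command stream. */
      pool->decoded[id] = id + 1;

      pthread_mutex_lock(&pool->mutex);
      pool->done++;
      pthread_cond_broadcast(&pool->cond);
   }
   pthread_mutex_unlock(&pool->mutex);
   return NULL;
}

/* Ring thread: returns the final ring head and stores in *submitted the
 * number of streams that were decoded when the queue command ran. */
static uint32_t
ring_run(uint32_t *submitted)
{
   struct cs_pool pool = { .queued = 0 };
   pthread_t workers[NUM_WORKERS];
   uint32_t ring_head = 0;

   pthread_mutex_init(&pool.mutex, NULL);
   pthread_cond_init(&pool.cond, NULL);
   for (int i = 0; i < NUM_WORKERS; i++)
      pthread_create(&workers[i], NULL, cs_worker, &pool);

   /* Command stream execution: hand off, then advance the head right away
    * without waiting for the decode to finish. */
   for (uint32_t i = 0; i < NUM_STREAMS; i++) {
      pthread_mutex_lock(&pool.mutex);
      pool.queued++;
      pthread_cond_broadcast(&pool.cond);
      pthread_mutex_unlock(&pool.mutex);
      ring_head++;
   }

   /* Queue command: stays on the ring thread, gated on decode completion. */
   pthread_mutex_lock(&pool.mutex);
   while (pool.done < pool.queued)
      pthread_cond_wait(&pool.cond, &pool.mutex);
   *submitted = 0;
   for (uint32_t i = 0; i < NUM_STREAMS; i++)
      *submitted += (pool.decoded[i] == i + 1);
   ring_head++; /* the queue command itself */
   pool.shutdown = 1;
   pthread_cond_broadcast(&pool.cond);
   pthread_mutex_unlock(&pool.mutex);

   for (int i = 0; i < NUM_WORKERS; i++)
      pthread_join(workers[i], NULL);
   pthread_cond_destroy(&pool.cond);
   pthread_mutex_destroy(&pool.mutex);
   return ring_head;
}
```

Calling `ring_run(&submitted)` should return `NUM_STREAMS + 1` with `submitted == NUM_STREAMS`: the head moves past every stream at dispatch time, and only the queue command blocks until the workers have caught up. The sketch gates on all outstanding streams; finer-grained per-stream dependencies, as VkCommandStreamDependencyMESA would express, are not modeled here.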