Make implicit sync work with hk and virtgpu
As a pure explicit sync kernel driver, drm/asahi needs userspace support for implicit sync. The existing code for bare metal does not work for VMs, because in virtgpu the concept of a sync object within the guest is completely detached from a sync object in the host, so sync object APIs in the guest do nothing to manipulate the native fences and dma-bufs.
!288 (closed) and virglrenderer!1 (merged) fix this for the GL driver by essentially making the virtgpu API support implicit sync with an explicit BO list. However, this is less than ideal for Vulkan. We have several options in order of complexity:
- Do nothing. Apparently things sort of work for fullscreen games even with totally broken sync, somehow?? The above fix resolves the brokenness in UI apps that use GL, which is where this mostly shows up.
- Just keep track of all shared BOs in hk and pass them to the driver as WRITE BOs with every submission. This would fix hk->compositor sync (across the VM boundary or not) for GL compositors, but wouldn't work for a Vulkan compositor within the VM, or for a Vulkan app consuming frames from outside the VM. It would probably also cause over-syncing if a compositor consumes a frame after the Vulkan client has already queued further submissions beyond the one that actually writes to it.
- Do the same but also mark them READ. This would give correct results but is likely to cause a significant perf penalty. Apparently Turnip does this?!
- Minimal implementation with correct tracking: For READ BOs, the Vulkan driver should be capable of tracking them as a "special" BO-based sync type (since the fence gets extracted prior to the submission), so we could forward them into the BO list without going through syncobjs. For WRITE BOs, we need the virglrenderer side to track out syncs in a per-queue timeline sync object or similar, and then we need to add an op that inserts the fence for a given timeline point into a BO (see the sketch after this list). This can then be hooked up to the WSI to do the right thing for WRITE BOs.
- Expose the "full" host syncobj API in our virt protocol, and replace all usage of syncobjs in hk/Mesa with this (therefore no longer using guest syncobjs at all). Pros: this will work better for cross-queue sync within an app since now all sync happens host-side and we entirely bypass the virtgpu sync machinery (which does silly things like... block in the ioctl path for all the in syncs??). Cons: probably quite a bit more invasive on the Mesa side / needs reimplementing a lot more code. Should this be made a generic virtgpu thing that other drivers can use? Also this completely breaks cross-process explicit sync within the guest (with standard mechanisms), which could otherwise sort of work with the virtgpu syncobjs.
- Actually figure out how to wrap host syncobjs in guest syncobjs on the kernel side. This would solve all the problems and potentially even allow host<->guest explicit sync (though that would still need work in sommelier/whatever to pass through the objects correctly).
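To make the WRITE-BO half of the "minimal implementation" option more concrete, here is a minimal host-side sketch of the "insert the fence for a given timeline point into a BO" op, using only stock libdrm and dma-buf uapi. The function name and the way the per-queue timeline syncobj and point reach this code are assumptions for illustration, not an actual virglrenderer interface.

```c
/* Sketch: resolve (per-queue timeline syncobj, point) to a sync_file and
 * attach it to a shared BO's dma-buf as an implicit-sync write fence.
 * Assumes the point already has a fence attached (i.e. the submit that
 * signals it has been queued) and a kernel with DMA_BUF_IOCTL_IMPORT_SYNC_FILE. */
#include <stdint.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>
#include <xf86drm.h>

static int
attach_timeline_point_to_dmabuf(int drm_fd, uint32_t timeline_handle,
                                uint64_t point, int dmabuf_fd)
{
   uint32_t binary;
   int sync_file_fd = -1;
   int ret;

   /* Materialize the timeline point as a binary syncobj, then as a sync_file. */
   ret = drmSyncobjCreate(drm_fd, 0, &binary);
   if (ret)
      return ret;

   ret = drmSyncobjTransfer(drm_fd, binary, 0, timeline_handle, point, 0);
   if (!ret)
      ret = drmSyncobjExportSyncFile(drm_fd, binary, &sync_file_fd);
   drmSyncobjDestroy(drm_fd, binary);
   if (ret)
      return ret;

   /* Import the fence into the dma-buf's reservation as a write fence, so
    * implicit-sync consumers (e.g. a GL compositor) will wait on it. */
   struct dma_buf_import_sync_file import = {
      .flags = DMA_BUF_SYNC_WRITE,
      .fd = sync_file_fd,
   };
   ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_IMPORT_SYNC_FILE, &import);
   close(sync_file_fd);
   return ret;
}
```

The READ side is the reverse direction; per the option above it could just ride on the BO list once the guest tracks those BOs as the "special" sync type, rather than needing its own op.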
For #5 above, we could even still support explicit sync between Vulkan apps within the guest using VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_FD_BIT by doing something silly like defining the FDs to be memfds holding a guest fence ID in memory (as long as it's not writable the way a syncobj is, and only ever represents a single unique fence, this shouldn't have any significant security implications even if it's not otherwise authenticated?). This won't work for WSI though, since those protocols are defined in terms of actual sync object FDs, not opaque ones. Not sure if anything uses OPAQUE_FD_BIT?
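To illustrate the memfd idea (purely hypothetical; the fence-ID format and how the importer resolves it back to a fence are made up here), the exporter would write the guest fence ID into a memfd and seal it, which is what gives it the "not writable, represents a single unique fence" property:

```c
/* Hypothetical sketch: wrap a guest fence ID in a sealed, immutable memfd so it
 * can be handed out as an OPAQUE_FD external semaphore payload. */
#define _GNU_SOURCE
#include <stdint.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>

static int
export_fence_id_as_opaque_fd(uint64_t guest_fence_id)
{
   int fd = memfd_create("guest-fence", MFD_CLOEXEC | MFD_ALLOW_SEALING);
   if (fd < 0)
      return -1;

   if (write(fd, &guest_fence_id, sizeof(guest_fence_id)) != sizeof(guest_fence_id) ||
       /* Seal it: importers can read the ID but never modify or resize it. */
       fcntl(fd, F_ADD_SEALS,
             F_SEAL_WRITE | F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_SEAL) < 0) {
      close(fd);
      return -1;
   }
   return fd;
}

static int
import_opaque_fd_fence_id(int fd, uint64_t *guest_fence_id)
{
   /* The importer reads the ID back and would then look up the corresponding
    * fence through the virt protocol (lookup mechanism not defined here). */
   return pread(fd, guest_fence_id, sizeof(*guest_fence_id), 0) ==
             (ssize_t)sizeof(*guest_fence_id) ? 0 : -1;
}
```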
TODOs:
- Actively disable sync file explicit sync support in hk when running on virtgpu, since that is never going to work across the VM boundary without significant development (true cross-VM syncobj sharing, which needs kernel support plus the sommelier bits), and it will probably perform worse than implicit sync within the VM as long as implicit sync works, given that virtgpu currently synchronously blocks on in fences. (A minimal sketch of what that gating could look like is below.)
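A minimal sketch of that gating, with placeholder names (hk_physical_device / is_virtgpu are not actual hk identifiers): simply stop advertising SYNC_FD external fence handles when the device sits behind virtgpu.

```c
/* Hypothetical sketch: don't advertise sync_file external fences when running
 * on virtgpu, so WSI and apps fall back to implicit sync instead.
 * hk_physical_device / is_virtgpu are placeholder names, not real hk code. */
#include <stdbool.h>
#include <vulkan/vulkan.h>

struct hk_physical_device {
   bool is_virtgpu; /* set at probe time when the DRM device is virtio-gpu */
};

static void
get_external_fence_properties(const struct hk_physical_device *pdev,
                              VkExternalFenceHandleTypeFlagBits handle_type,
                              VkExternalFenceProperties *props)
{
   if (handle_type == VK_EXTERNAL_FENCE_HANDLE_TYPE_SYNC_FD_BIT &&
       !pdev->is_virtgpu) {
      props->exportFromImportedHandleTypes =
         VK_EXTERNAL_FENCE_HANDLE_TYPE_SYNC_FD_BIT;
      props->compatibleHandleTypes = VK_EXTERNAL_FENCE_HANDLE_TYPE_SYNC_FD_BIT;
      props->externalFenceFeatures = VK_EXTERNAL_FENCE_FEATURE_EXPORTABLE_BIT |
                                     VK_EXTERNAL_FENCE_FEATURE_IMPORTABLE_BIT;
   } else {
      /* On virtgpu, a guest sync_file can't reach the host fences anyway. */
      props->exportFromImportedHandleTypes = 0;
      props->compatibleHandleTypes = 0;
      props->externalFenceFeatures = 0;
   }
}
```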