vulkan/wsi: Signal fences/semaphores using sync_files exported from dma-bufs

Faith Ekstrand requested to merge gfxstrand/mesa:wip/anv-dma-buf-sync-file into main

Translating Vulkan's explicit synchronization model for WSI to and from the implicit synchronization model used by Linux window systems is painful and I'm not convinced any of the drivers currently get it 100% right. There are two halves to this problem:

  1. Client -> compositor synchronization: For this, vkQueuePresentKHR takes a semaphore which is supposed to block the presentation engine (window system).

  2. Compositor -> client synchronization: For this, vkAcquireNextImageKHR takes an optional fence and/or semaphore which it will signal when "the presentation engine reads have completed".

The first half isn't too hard. RADV and ANV take slightly different approaches but I believe both are correct. In ANV, we pass the WSI image BO specially to the dummy vkQueueSubmit() call and the driver knows to set EXEC_OBJECT_WRITE on it at that time. In RADV, they use image ownership to determine when to tell the AMD kernel driver to set implicit fences. In either case, the implicit fencing works more-or-less the same as it does in GL and the compositor waits on the Vulkan client just fine.

The second half is tricky. Again, RADV and ANV take slightly different approaches and I'm pretty sure we both get it subtly wrong. The RADV approach is to use ownership again and just let implicit sync do its job more-or-less. The problem with this approach is that the semaphores and fences that come out of vkAcquireNextImageKHR aren't real. They're just dummy objects that return immediately when you wait on them. This means that the fence, in particular, isn't accurate. It may return VK_SUCCESS before the window system is actually done. This is good enough for 99.9% of cases, however, because you can't touch WSI memory from the CPU and implicit synchronization papers over all the GPU issues. This approach also leads to potential over-synchronization issues because, if the client acquires multiple WSI images, any rendering will wait on the compositor to be done with every WSI image that's client-owned and not just the one specified by the VkSemaphore.

The ANV approach is to treat the fences and semaphores as "real" but with the WSI BO stuffed inside them. This means everything is real and fence waits won't complete early. The downside is that fence waits may now take into account work submitted by the client on the WSI BO because GEM_WAIT just waits for the BO to be totally idle.

This MR adds a new approach which, IMO, finally solves this problem for real. It depends on a new kernel dma-buf IOCTL which returns a snapshot of the pile of fences on the dma-buf as a sync_file. This lets us wait on the implicit synchronization from the compositor properly without any over-synchronization and without depending on tracking which BOs are currently client-owned. It should 100% remove ANV's over-synchronization problems with returned fences without resorting to fake fences and relying on implicit synchronization internally.
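For illustration, here is a minimal sketch of what exporting and waiting on such a snapshot could look like from userspace. It assumes the ioctl has the shape of DMA_BUF_IOCTL_EXPORT_SYNC_FILE from the upstream <linux/dma-buf.h> header, which may differ from the kernel series this MR was written against. The struct and macros are local copies with a SKETCH_ prefix, and export_sync_file() and wait_fd_readable() are hypothetical helper names, not the functions added by this MR.

```c
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Local copy of the uAPI (assumption: matches upstream <linux/dma-buf.h>).
 * Prefixed so it cannot clash with real kernel headers. */
struct sketch_dma_buf_export_sync_file {
    uint32_t flags; /* in: which kind of access the caller is about to do */
    int32_t fd;     /* out: sync_file fd holding the fence snapshot */
};
#define SKETCH_DMA_BUF_SYNC_READ  (1u << 0)
#define SKETCH_DMA_BUF_SYNC_WRITE (2u << 0)
#define SKETCH_DMA_BUF_SYNC_RW \
    (SKETCH_DMA_BUF_SYNC_READ | SKETCH_DMA_BUF_SYNC_WRITE)
#define SKETCH_DMA_BUF_IOCTL_EXPORT_SYNC_FILE \
    _IOWR('b', 2, struct sketch_dma_buf_export_sync_file)

/* Snapshot the fences currently attached to dmabuf_fd as a sync_file.
 * Returns a new fd owned by the caller, or -1 with errno set. Work
 * submitted on the buffer after this call is not part of the snapshot. */
int export_sync_file(int dmabuf_fd, uint32_t flags)
{
    struct sketch_dma_buf_export_sync_file arg = { .flags = flags, .fd = -1 };
    int ret;

    do {
        ret = ioctl(dmabuf_fd, SKETCH_DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &arg);
    } while (ret == -1 && errno == EINTR);

    return ret == 0 ? arg.fd : -1;
}

/* Wait until fd becomes readable; a sync_file fd polls readable once all
 * of its fences have signaled. Returns 1 if readable, 0 on timeout,
 * -1 on error. */
int wait_fd_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ret;

    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret == -1 && errno == EINTR);

    if (ret < 0)
        return -1;
    return ret > 0 ? 1 : 0;
}
```

A CPU-side wait would call export_sync_file(image_dmabuf_fd, SKETCH_DMA_BUF_SYNC_RW) at acquire time, pass the result to wait_fd_readable(), and then close() the fd. Since a sync_file is an ordinary fd, the same snapshot could instead be handed to the driver as the payload of the fence or semaphore, so no CPU wait is needed at all.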

Latest Kernel series:

cc: @krh @bnieuwenhuizen @airlied @danvet

Edited by Faith Ekstrand
