
lavapipe: Lift fence check into dedicated function

Omar Akkila requested to merge rakko/mesa:lvp-venus-timeline into main

UPDATE: This MR used to be titled "Draft: lavapipe: Garbage collect signalled semaphores" and was accompanied by the description below. That solution is now superseded by !15453 (merged), and this MR retains only the initial commit, in which fence checks are lifted into a separate function. The description is left in place in case anyone comes looking for context.


This is based on my experiments testing Venus with lavapipe used as the host driver.

When running tests from dEQP-VK.synchronization.*, many tests seemed to result in infinite waits in both Venus (guest) and lavapipe (host), most notably the timeline semaphore tests.

You can use dEQP-VK.synchronization.timeline_semaphore.device_host.write_image_tess_control_read_image_tess_eval.image_128x128_r16g16b16a16_uint as an example. This test performs a few iterations of GPU write op -> GPU read op -> CPU wait and signal next iteration, with each step in the chain waiting on the previous one and signalling the next.

Investigating the issue, I was made aware of Venus' vkWaitSemaphores implementation (vn_WaitSemaphores), which, instead of calling into the host driver, simply polls vkGetSemaphoreCounterValue until the semaphore has been signalled. My guess is that this is to avoid blocking waits in the host and to keep the wait within the guest. In this test, the CPU waits until the GPU read operation is finished and its semaphore is signaled.
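To make the polling pattern concrete, here is a minimal, self-contained sketch. It is not Venus' actual code: `counter_query_fn`, `poll_until_signaled`, `toy_sem` and `toy_query` are hypothetical names, and the query callback only stands in for a vkGetSemaphoreCounterValue-style call. The poll bound is also an assumption added so that the sketch terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a vkGetSemaphoreCounterValue-style query. */
typedef uint64_t (*counter_query_fn)(void *sem);

/* Poll the counter until it reaches `target` or `max_polls` queries have
 * been made. Returns true if the target value was observed. */
static bool
poll_until_signaled(counter_query_fn query, void *sem,
                    uint64_t target, unsigned max_polls)
{
   assert(query != NULL);
   for (unsigned i = 0; i < max_polls; i++) {
      if (query(sem) >= target)
         return true;
   }
   return false;
}

/* Toy semaphore for demonstration only: each query returns the current
 * value and then advances it by one. */
struct toy_sem {
   uint64_t value;
};

static uint64_t
toy_query(void *sem)
{
   struct toy_sem *s = sem;
   return s->value++;
}
```

The point of the sketch is that the waiter only ever observes what the query returns. If the host's counter query never reports progress, a loop like this spins forever, even when the underlying work has completed.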

However, when investigating lavapipe, I found that the semaphore was indeed signaled, but since the vkWaitSemaphores invocation is skipped, execution never reaches wait_semaphores, which is what checks whether the semaphore was signalled.

These changes are an attempt to "fix" this by having lavapipe check whether the semaphore has been signaled and perform the wait when pruning semaphore links. Although this seems more like an issue with Venus, I thought this would be a good compromise.
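The idea can be illustrated with a small toy model. This is only a sketch under stated assumptions, not lavapipe's implementation: `toy_link`, `toy_timeline`, `toy_prune` and `toy_get_counter` are hypothetical names, and a plain boolean stands in for the host fence check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one pending signal operation on a timeline. */
struct toy_link {
   uint64_t value;        /* value the timeline takes once the work is done */
   bool fence_done;       /* stands in for the host fence having signaled */
   struct toy_link *next;
};

struct toy_timeline {
   uint64_t current;          /* last value made visible to queries */
   struct toy_link *pending;  /* pending signals, ordered by increasing value */
};

/* Retire every leading link whose fence has completed, advancing `current`. */
static void
toy_prune(struct toy_timeline *tl)
{
   assert(tl != NULL);
   while (tl->pending && tl->pending->fence_done) {
      tl->current = tl->pending->value;
      tl->pending = tl->pending->next;
   }
}

/* Counter query; `prune_first` selects between the two behaviours. */
static uint64_t
toy_get_counter(struct toy_timeline *tl, bool prune_first)
{
   if (prune_first)
      toy_prune(tl);
   return tl->current;
}
```

With `prune_first` set to false, a query keeps returning the stale value even after the fence has completed, which mirrors the hang seen by a caller that only polls. With it set to true, completed links are retired as part of the query, so a polling caller observes progress without ever calling a wait function.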

One thing to note is that I took inspiration from the vk_sync_timeline implementation in the common vk_sync framework, because this issue does not manifest when using a driver, like ANV, that uses that framework. When I looked into it, I noticed that it does something similar: signaled semaphores are garbage collected.

I've been testing this against dEQP-VK.synchronization.*, and while I do get tests to pass with Venus, I still occasionally run into hangs when using lavapipe alone, so I figured I could use some guidance here.

I'd also like to open the discussion on whether this really needs to be dealt with in lavapipe or in Venus. Heck, it might be worth considering a revisit of #5640 (closed).

Also note that Venus does not currently support lavapipe as a host driver. This is part of an effort to introduce such support.

cc @olv @zmike

Edited by Omar Akkila
