WIP: vkr: handle render worker crash gracefully
Let me reuse the added comment as the MR description:
/* The current implementation for renderer ping is Option A:
* - When this call goes through, returning VK_SUCCESS means the renderer is alive.
* - If the renderer is gone, the driver side will abort after a trivial timeout.
* This is simple and works robustly now that we have relaxed the lock between renderer
* and ring cmd submissions.
*
* The other options considered are as below:
*
* - Base work for both options: Add a new client op to the render protocol for a
* render worker status check. The server side has the worker records and knows
* exactly which render workers are alive.
*
* - Option B: Add a new synchronous virtgpu ioctl for renderer ping. The proxy context
* will ask the render server for its corresponding render worker's status and return it.
*
* - Option C: Reuse the vkGetRendererStatus100000MESA query. The proxy context has to:
*   - hijack the reply shmem creation for this query or reserve a shmem in advance
*   - hijack the renderer status query cmd, ping the server and write the reply
*   - hijack the subsequent CPU fence submission for this query and retire it properly
*/
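To make the Option A behavior concrete, here is a minimal, self-contained sketch of the idea: the driver sends a ping and polls for the reply a bounded number of times, so a dead renderer shows up as a timeout rather than a hang. This is not the code in this MR. `fake_renderer`, `fake_renderer_process`, `driver_ping` and the `PING_*` codes are hypothetical names for illustration only, and the real path goes through the ring and returns a `VkResult`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes; the real driver would report a VkResult. */
enum ping_result { PING_SUCCESS = 0, PING_TIMEOUT = -1 };

/* Hypothetical stand-in for the render worker behind the ring. */
struct fake_renderer {
   bool alive;                  /* false once the worker has crashed */
   uint32_t last_seqno_replied; /* seqno of the last ping that was answered */
};

/* Renderer side: a live worker acknowledges a ping by echoing its seqno.
 * A dead worker never writes a reply. */
static void
fake_renderer_process(struct fake_renderer *r, uint32_t seqno)
{
   if (r->alive)
      r->last_seqno_replied = seqno;
}

/* Driver side: poll for the reply at most max_polls times.
 * seqno must differ from the previously replied seqno (start from 1). */
static enum ping_result
driver_ping(struct fake_renderer *r, uint32_t seqno, unsigned max_polls)
{
   assert(r != NULL);

   for (unsigned i = 0; i < max_polls; i++) {
      fake_renderer_process(r, seqno);
      if (r->last_seqno_replied == seqno)
         return PING_SUCCESS; /* the call went through: renderer is alive */
   }

   /* No reply within the poll budget: treat the renderer as gone. */
   return PING_TIMEOUT;
}
```

In this model the only state the driver needs is the reply seqno, which is why Option A avoids the extra protocol op and ioctl plumbing that Options B and C require.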
Related MRs:
- protocol: https://gitlab.freedesktop.org/olv/venus-protocol/-/merge_requests/63
- driver: TBD
Edited by Yiwei Zhang