Add ringMonitoring feature to report renderer crashes (!1066) · Merge requests · virgl / virglrenderer

Ryan Neph requested to merge ryanneph/virglrenderer:handle-crash into master Mar 20, 2023

When a ring is created, the driver may require that the renderer report its status at least once every maxReportingPeriodMicroseconds, a uint32_t configured by the driver.

The renderer is permitted to report more often, but must never report less often, since that would miss the reporting window. This keeps renderer implementations for multiple rings lightweight: the renderer can have a single reporting thread that wakes at the fastest rate required by the driver and sets the status bits of all monitored rings at once. If it becomes more efficient to monitor each ring with a separate thread (e.g. multiple rings with very different reporting rates), the protocol still supports that as an implementation detail, without affecting the driver.
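The single-reporting-thread idea can be sketched roughly as follows. This is an illustration only, not the code in this MR: struct monitored_ring, fastest_period_us(), report_all_alive() and the RING_STATUS_ALIVE_BIT value are hypothetical names and values made up for the example (the real bit is VK_RING_STATUS_ALIVE_BIT_MESA from venus-protocol), and the thread loop itself is left as a comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for VK_RING_STATUS_ALIVE_BIT_MESA; the real value
 * is defined by venus-protocol and is not reproduced here. */
#define RING_STATUS_ALIVE_BIT 0x1u

/* Hypothetical per-ring bookkeeping kept by the renderer. */
struct monitored_ring {
    _Atomic uint32_t *status;          /* status word shared with the driver */
    uint32_t max_reporting_period_us;  /* maxReportingPeriodMicroseconds */
};

/* The shortest period among all rings: waking at this rate satisfies every
 * ring, because reporting more often than required is permitted. */
static uint32_t fastest_period_us(const struct monitored_ring *rings,
                                  size_t count)
{
    assert(count > 0);
    uint32_t min_us = rings[0].max_reporting_period_us;
    for (size_t i = 1; i < count; i++) {
        if (rings[i].max_reporting_period_us < min_us)
            min_us = rings[i].max_reporting_period_us;
    }
    return min_us;
}

/* Set the alive bit on every monitored ring, leaving other bits untouched. */
static void report_all_alive(struct monitored_ring *rings, size_t count)
{
    for (size_t i = 0; i < count; i++)
        atomic_fetch_or(rings[i].status, RING_STATUS_ALIVE_BIT);
}

/* A reporting thread would then loop roughly like this:
 *
 *     for (;;) {
 *         report_all_alive(rings, count);
 *         sleep for less than fastest_period_us(rings, count);
 *     }
 */
```

With two rings configured for 3 s and 0.5 s, such a thread would wake at least every 0.5 s and mark both rings alive on each wakeup.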

The driver then checks the shared status field for the new VK_RING_STATUS_ALIVE_BIT_MESA. If set, the renderer is still alive and reported its status within the previous reporting window; all is well, and the driver unsets the flag in preparation for the next reporting window. If unset, the renderer must have crashed, and the driver abort()s the guest app. Currently, the driver checks the status during the vn_relax() iterations that trigger "warn" messages and the existing FATAL status checks. The reporting period is hardcoded to be shorter than the first such "warn" iteration in any vn_relax(), with an added 1/4 second margin to avoid false positives. This can be made more flexible in the future, as needed.
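The driver-side check described above amounts to a test-and-clear of the alive bit. A minimal sketch, assuming the status field is a 32-bit word accessed atomically; ring_check_and_clear_alive() and the RING_STATUS_ALIVE_BIT value are hypothetical and are not the names used in the mesa change:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for VK_RING_STATUS_ALIVE_BIT_MESA. */
#define RING_STATUS_ALIVE_BIT 0x1u

/* Clear the alive bit and report whether it was set beforehand.
 * A single atomic fetch-and avoids losing a report that the renderer
 * writes between a separate load and store. Other status bits are kept. */
static bool ring_check_and_clear_alive(_Atomic uint32_t *status)
{
    assert(status != NULL);
    uint32_t prev = atomic_fetch_and(status, ~RING_STATUS_ALIVE_BIT);
    return (prev & RING_STATUS_ALIVE_BIT) != 0;
}
```

A caller following the description would treat a false return as a renderer crash and abort() the guest app; the sketch returns a bool so that the decision stays with the caller.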

Related Changes:

venus-protocol: https://gitlab.freedesktop.org/olv/venus-protocol/-/merge_requests/65
mesa: mesa/mesa!22036 (merged)

Edited Mar 20, 2023 by Ryan Neph
