Use new ringMonitoring feature to detect renderer crashes
When a Venus renderer advertises the new `ringMonitoring` feature, the driver may configure periodic ring health monitoring that works robustly regardless of whether ring(s)/renderer command streams are currently blocked by an async-wait barrier on the renderer side (see !21716 (merged)).
It works as follows:
- During ring creation, the driver checks the `ringMonitoring` feature. If supported, it chains `VkRingMonitorInfoMESA` to `VkRingCreateInfoMESA::pNext` with a `uint32_t maxReportingPeriodMicroseconds`.
  - the `maxReportingPeriodMicroseconds` is the longest the renderer is permitted to wait between successive ring "alive" reports.
  - the driver must wait at least as long as `maxReportingPeriodMicroseconds` before checking the most recent report. In practice we ensure this is met with an extra margin of 0.25s.
  - actual driver report check timing is dictated by the timing of `vn_relax()`'s "warn_order" (ensuring this is >= `maxReportingPeriodMicroseconds` with hardcoded params and a runtime assert).
- Every driver-side ring wait reaching a "warn_order" iteration will check the ring's "alive" status, in addition to the existing `FATAL` status check.
  - only one guest thread currently in a ring-wait tests the shared `ALIVE` status bit directly, setting an internal `atomic_bool alive` to match the last confirmed status. It also unsets the `ALIVE` status bit to be re-set by the renderer before the next test by this thread.
  - all other waiting guest threads indirectly check the ring health by testing the `atomic_bool alive` instead.
- If the renderer fails to report by setting the `ALIVE` status bit, the driver will call `abort()` during the next "warn_order" iteration performed by the single monitoring guest thread.
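
The check described above can be sketched roughly as follows. This is a minimal illustration, not the code in this MR: `struct ring_sketch`, the `ring_sketch_*` helpers, the `RING_STATUS_*` bit values, and the `monitor_claimed` flag used to pick the single monitoring thread are all hypothetical names chosen for the example, and the sketch returns `RING_DEAD` where the driver would call `abort()` so that the logic stays testable.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; the real ones are defined by venus-protocol. */
#define RING_STATUS_FATAL (1u << 0)
#define RING_STATUS_ALIVE (1u << 1)

/* Extra margin on top of the reporting period (0.25s, per the description). */
#define RING_MONITOR_MARGIN_US 250000u

enum ring_health { RING_OK, RING_DEAD };

struct ring_sketch {
   _Atomic uint32_t status;     /* shared with the renderer */
   atomic_bool alive;           /* last confirmed health, driver-internal */
   atomic_bool monitor_claimed; /* true while one waiter owns monitoring */
   uint32_t max_reporting_period_us;
};

static void
ring_sketch_init(struct ring_sketch *ring,
                 uint32_t max_reporting_period_us,
                 uint32_t warn_interval_us)
{
   /* The driver must not check sooner than the renderer has to report. */
   assert(warn_interval_us >=
          max_reporting_period_us + RING_MONITOR_MARGIN_US);

   atomic_init(&ring->status, RING_STATUS_ALIVE);
   atomic_init(&ring->alive, true);
   atomic_init(&ring->monitor_claimed, false);
   ring->max_reporting_period_us = max_reporting_period_us;
}

/* Only the first waiter to call this becomes the monitoring thread. */
static bool
ring_sketch_try_become_monitor(struct ring_sketch *ring)
{
   return !atomic_exchange(&ring->monitor_claimed, true);
}

/* Stand-in for the renderer's periodic "alive" report. */
static void
ring_sketch_renderer_report(struct ring_sketch *ring)
{
   atomic_fetch_or(&ring->status, RING_STATUS_ALIVE);
}

/* Called by a waiter each time it reaches a "warn_order" iteration. */
static enum ring_health
ring_sketch_check(struct ring_sketch *ring, bool is_monitor)
{
   if (atomic_load(&ring->status) & RING_STATUS_FATAL)
      return RING_DEAD;

   if (is_monitor) {
      /* Read and clear ALIVE in one atomic step; the renderer has to set
       * it again before this thread's next check. */
      uint32_t prev = atomic_fetch_and(&ring->status, ~RING_STATUS_ALIVE);
      atomic_store(&ring->alive, (prev & RING_STATUS_ALIVE) != 0);
   }

   /* Non-monitoring waiters only ever look at the cached result. */
   return atomic_load(&ring->alive) ? RING_OK : RING_DEAD;
}
```

In this sketch, a waiter calls `ring_sketch_try_become_monitor()` once when it starts waiting and then passes the result to `ring_sketch_check()` on each "warn_order" iteration. Using a single fetch-and-clear for the `ALIVE` bit avoids losing a report that lands between a separate read and clear.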
See !21542 (932d80f3, comment 1807535) for earlier design discussion.
Related Changes:
- virglrenderer: virgl/virglrenderer!1066 (merged)
- venus-protocol: https://gitlab.freedesktop.org/olv/venus-protocol/-/merge_requests/65
Edited by Ryan Neph