
Basic GuC submission, job cancellation, engine reset, GT reset, and a bunch of debug features

Matthew Brost requested to merge (removed):xe into xe

Version 2 details.

The design of the reset flows is to kick banning of the context to the TDR regardless of source (the TDR itself, a G2H indicating an engine reset, or a GT reset). IMO this is a really nice design, as we basically just have to maintain one path, and routing through the DRM scheduler is nice. This path can also be leveraged when a context is closed and we don't want persistence (it should be one function call, and the DRM scheduler / GuC backend cleans up the engine with the same flow).
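To illustrate the "one path" idea, here is a minimal userspace sketch, not code from this series. All names (`sketch_engine`, `sketch_kick_tdr`, `sketch_tdr_handler`, the `SKETCH_SRC_*` values) are hypothetical, and the real flow goes through the DRM scheduler's timeout handling rather than a flag. The point is only that every source calls the same kick function and banning / cleanup happens in exactly one place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event sources; illustrative only, not taken from the driver. */
enum sketch_reset_source {
    SKETCH_SRC_TDR_TIMEOUT,      /* the job timeout fired on its own */
    SKETCH_SRC_G2H_ENGINE_RESET, /* GuC reported an engine reset via G2H */
    SKETCH_SRC_GT_RESET,         /* a full GT reset is in progress */
    SKETCH_SRC_CONTEXT_CLOSE,    /* a non-persistent context was closed */
};

/* Hypothetical per-engine state for this sketch. */
struct sketch_engine {
    bool tdr_queued;                       /* TDR has been kicked, not yet run */
    bool banned;                           /* context is banned */
    int cleanup_count;                     /* how many times cleanup ran */
    enum sketch_reset_source first_source; /* who kicked the TDR first */
};

/* Every source calls this one function; none of them bans directly. */
static void sketch_kick_tdr(struct sketch_engine *e,
                            enum sketch_reset_source src)
{
    assert(e != NULL);
    if (!e->tdr_queued && !e->banned) {
        e->first_source = src;
        e->tdr_queued = true;
    }
}

/* The single place where banning and cleanup happen. */
static void sketch_tdr_handler(struct sketch_engine *e)
{
    assert(e != NULL);
    if (!e->tdr_queued)
        return;
    e->tdr_queued = false;
    if (!e->banned) {
        e->banned = true;
        e->cleanup_count++;
    }
}
```

In this sketch a second kick before the handler runs, or any kick after the ban, is a no-op, so cleanup runs once no matter how many sources fire.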

GT reset flow is roughly:

  • Stop all entry points to the GuC backend, clean up each engine for any lost state (G2H), ban if needed
  • Reset GPU + bring HW back up
  • Respin engines as needed, unblock entry points to GuC backend
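The three steps above can be sketched as a toy userspace model. This is an assumption-laden illustration, not the code in this MR: `sketch_gt`, `sketch_gt_engine`, and their fields are hypothetical, the hardware reset is reduced to a counter, and "ban if needed" is modelled as a precomputed `needs_ban` flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-engine state for this sketch. */
struct sketch_gt_engine {
    bool lost_g2h;  /* a G2H reply was outstanding when the GT went down */
    bool needs_ban; /* assumed input: this engine should be banned */
    bool banned;
    bool running;
};

/* Hypothetical GT state for this sketch. */
struct sketch_gt {
    bool submission_stopped; /* entry points to the backend are blocked */
    int hw_reset_count;      /* stands in for the real HW reset */
    struct sketch_gt_engine *engines;
    size_t num_engines;
};

static void sketch_gt_reset(struct sketch_gt *gt)
{
    size_t i;

    assert(gt != NULL);

    /* Step 1: stop entry points, fix up lost state, ban if needed. */
    gt->submission_stopped = true;
    for (i = 0; i < gt->num_engines; i++) {
        struct sketch_gt_engine *e = &gt->engines[i];

        e->running = false;
        e->lost_g2h = false; /* the reply will never arrive; resolve it here */
        if (e->needs_ban) {
            e->banned = true;
            e->needs_ban = false;
        }
    }

    /* Step 2: reset the GPU and bring the HW back up (modelled as a count). */
    gt->hw_reset_count++;

    /* Step 3: respin engines that survived, then unblock entry points. */
    for (i = 0; i < gt->num_engines; i++) {
        if (!gt->engines[i].banned)
            gt->engines[i].running = true;
    }
    gt->submission_stopped = false;
}
```

The ordering is the part that matters here: nothing new can enter the backend while per-engine state is being repaired, and entry points are only unblocked after surviving engines have been restarted.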

GT reset tested with 10k+ engine submissions in flight (see IGT MR below).

State transitions of engines are lockless due to the mutual exclusion properties of the GuC backend (annotations for this are missing; will do in a follow-up).

Also included are debug features such as debugfs entries and tracing.

Tested with the MR in the IGTs (more details in that MR): https://gitlab.freedesktop.org/drm/xe/igt-gpu-tools/-/merge_requests/1

Series is kind of a mess: later patches may delete / change code from earlier patches, but since we don't care about history... The plan is to squash it down into one large patch before merge, aside from the DRM scheduler changes, which will be broken out into individual patches. With that, maybe read the commit messages, but mainly look at the full diff.

In no particular order what's next:

  • Remove HWE usage from submission paths
  • Virtual engines (goes hand in hand w/ removing HWE)
  • Fix up layering (basically no "if GuC, do this, else ..." in any paths except from init)
  • Add vfuncs to write rings (e.g. include flushes before BBs, etc...)
  • Add fast path to CTs which drops the lock during CT processing + slow path which holds the lock
  • Parallel submission
  • Start expanding IGTs
Edited by Matthew Brost
