Async VM bind PoC

Matthew Brost requested to merge (removed):async_vm_bind_ops into xe

Put together a PoC based on @danvet's 'Option 3: Recoverable vm' proposal on the list, with a few twists:

  • No need for a VM_UNBIND_FOR_MEMORY_RECLAIM flag, just don't set the async flag when issuing a VM unbind during reclaim
  • Uses user memory + a ufence to report a VM bind error, which kicks off reclaim (not implemented yet, but super simple to do, and the uAPI header is already there)
  • Async VM binds never actually fail (i.e. the user never needs to resubmit them); rather, once reclaim completion is signaled via XE_VM_BIND_OP_RESTART, the errored async VM bind is retried
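
For anyone who wants the flow above spelled out, here is a rough userspace toy model of it in C. It is only a sketch of the idea, not the xe implementation: every name in it (toy_vm, toy_bind_async, toy_bind_sync, toy_bind_restart, the queue layout, the byte-counting memory model) is made up for illustration, and toy_bind_restart merely stands in for XE_VM_BIND_OP_RESTART. The assumption is that a failed async bind stalls the queue, the error lands in user memory through the ufence, sync unbinds still work while stalled, and the restart retries the op that failed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: every name here is hypothetical, not the xe uAPI. */
enum toy_op_type { TOY_MAP, TOY_UNMAP };

struct toy_op {
    enum toy_op_type type;
    uint64_t size;
};

#define TOY_QUEUE_MAX 8

struct toy_vm {
    uint64_t free_bytes;                /* memory available for new binds */
    struct toy_op queue[TOY_QUEUE_MAX]; /* pending async ops: [head, tail) */
    size_t head, tail;
    bool error_state;                   /* async queue stalled on a failed op */
    uint64_t *error_ufence;             /* user memory written on bind error */
};

/* Apply one op right now; a map fails with -ENOMEM when memory is short. */
static int toy_apply(struct toy_vm *vm, const struct toy_op *op)
{
    if (op->type == TOY_UNMAP) {
        vm->free_bytes += op->size;     /* unbind gives memory back */
        return 0;
    }
    if (op->size > vm->free_bytes)
        return -ENOMEM;
    vm->free_bytes -= op->size;
    return 0;
}

/* Drain the async queue; on failure, stall and report via the ufence. */
void toy_async_worker(struct toy_vm *vm)
{
    while (!vm->error_state && vm->head < vm->tail) {
        int err = toy_apply(vm, &vm->queue[vm->head]);

        if (err) {
            vm->error_state = true;             /* failed op stays at head */
            *vm->error_ufence = (uint64_t)-err; /* user sees the errno */
            return;
        }
        vm->head++;
    }
}

/* Async bind: only queues the op, so a bind failure is never returned here. */
int toy_bind_async(struct toy_vm *vm, struct toy_op op)
{
    if (vm->tail == TOY_QUEUE_MAX)
        return -ENOSPC;
    vm->queue[vm->tail++] = op;
    return 0;
}

/* Sync bind (async flag clear): runs immediately, even in the error state. */
int toy_bind_sync(struct toy_vm *vm, struct toy_op op)
{
    return toy_apply(vm, &op);
}

/* Stand-in for XE_VM_BIND_OP_RESTART: clear the error and retry the head. */
int toy_bind_restart(struct toy_vm *vm)
{
    if (!vm->error_state)
        return -EINVAL;
    vm->error_state = false;
    *vm->error_ufence = 0;
    toy_async_worker(vm);
    return 0;
}

/* Walk through the flow; returns 0 if the errored bind completes on retry. */
int toy_demo(void)
{
    uint64_t ufence = 0;
    struct toy_vm vm = { .free_bytes = 4096, .error_ufence = &ufence };

    toy_bind_async(&vm, (struct toy_op){ TOY_MAP, 8192 });
    toy_async_worker(&vm);                        /* fails, queue stalls */
    assert(vm.error_state && ufence == ENOMEM);   /* error seen by user */
    toy_bind_sync(&vm, (struct toy_op){ TOY_UNMAP, 4096 }); /* reclaim */
    toy_bind_restart(&vm);                        /* retry errored bind */
    return (vm.head == vm.tail && !vm.error_state) ? 0 : -1;
}
```

In this toy the user never resubmits the failed map: it stays at the head of the queue until the restart succeeds. If the restart is issued before enough memory has been reclaimed, the op simply fails again and the ufence is rewritten.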

Implemented in the last 11 patches, which should tell a fairly clean story.

Mostly working: https://gitlab.freedesktop.org/drm/xe/igt-gpu-tools/-/merge_requests/4 passes, or at least doesn't hang, 85% of the time with async binds (100% non-hangs with sync binds). Not worried about the hang, just need to root-cause it.

Opens:

  • @danvet getting a DRM_XE_SYNC_SYNCOBJ out of a VM bind didn't work without adding an async_op_fence to occupy the syncobj fence slot. I know you have mentioned that syncobjs support future fences; would that be the DRM_XE_SYNC_TIMELINE_SYNCOBJ semantic? If that works without async_op_fence (haven't tried this), perhaps we drop DRM_XE_SYNC_SYNCOBJ or disallow it with an async bind?
Edited by Matthew Brost
