Async VM bind PoC
Put together a PoC based on @danvet's 'Option 3: Recoverable vm' proposal on the list, with a few twists:
- No need for a VM_UNBIND_FOR_MEMORY_RECLAIM flag; just don't set the async flag on VM unbinds issued during reclaim
- Uses user memory + a ufence to report the VM bind error, which kicks off reclaim (NIY, but super simple to do; the uAPI header is already there)
- Async VM binds never actually fail (i.e. the user never needs to resubmit them); rather, once reclaim completion is signaled via XE_VM_BIND_OP_RESTART, the errored async VM bind is retried
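
To make the flow in the bullets above concrete, here is a rough user-space model of it, for illustration only. It is not the xe driver code or the uAPI from the patches: every name in it (`model_vm`, `vm_bind_async`, `vm_unbind_sync`, `vm_bind_restart`, the `UFENCE_*` values) is hypothetical, memory is modeled as a plain page counter, and the "ufence" is just a `uint64_t` the caller owns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model only: none of these names come from the xe uAPI. */
#define MAX_OPS 8

enum bind_state { BIND_PENDING, BIND_DONE };
enum ufence_val { UFENCE_OK = 0, UFENCE_BIND_ERROR = 1 }; /* assumed values */

struct model_vm {
	uint64_t mem_free;                 /* pages still available for binds */
	uint64_t op_pages[MAX_OPS];        /* size of each queued async bind */
	enum bind_state op_state[MAX_OPS];
	size_t num_ops;
	size_t next;                       /* first op that has not completed */
	bool stalled;                      /* queue stopped on an errored bind */
	uint64_t *ufence;                  /* user memory the error is written to */
};

static void vm_init(struct model_vm *vm, uint64_t mem_free, uint64_t *ufence)
{
	vm->mem_free = mem_free;
	vm->num_ops = 0;
	vm->next = 0;
	vm->stalled = false;
	vm->ufence = ufence;
	*ufence = UFENCE_OK;
}

/* Queue an async bind; returns its index, or -1 if the queue is full. */
static int vm_bind_async(struct model_vm *vm, uint64_t pages)
{
	if (vm->num_ops == MAX_OPS)
		return -1;
	vm->op_pages[vm->num_ops] = pages;
	vm->op_state[vm->num_ops] = BIND_PENDING;
	return (int)vm->num_ops++;
}

/* Run queued binds in order. On the first error, stall and report it. */
static void vm_run_queue(struct model_vm *vm)
{
	while (!vm->stalled && vm->next < vm->num_ops) {
		size_t i = vm->next;

		if (vm->op_pages[i] > vm->mem_free) {
			/* The op is not failed; it stays pending for a retry. */
			vm->stalled = true;
			*vm->ufence = UFENCE_BIND_ERROR;
			return;
		}
		vm->mem_free -= vm->op_pages[i];
		vm->op_state[i] = BIND_DONE;
		vm->next++;
	}
}

/* Reclaim path: a sync unbind (async flag not set) bypasses the stalled queue. */
static void vm_unbind_sync(struct model_vm *vm, uint64_t pages)
{
	vm->mem_free += pages;
}

/* Models the restart op: clear the error and retry from the errored bind. */
static void vm_bind_restart(struct model_vm *vm)
{
	assert(vm->stalled);
	vm->stalled = false;
	*vm->ufence = UFENCE_OK;
	vm_run_queue(vm);
}
```

In this model the user queues binds, sees `UFENCE_BIND_ERROR` in its own memory when one cannot complete, frees memory with sync unbinds, and then restarts the queue. The errored bind is retried from where it stopped, so nothing is resubmitted.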
Implemented in the last 11 patches, which should tell a fairly clean story.
Mostly working: https://gitlab.freedesktop.org/drm/xe/igt-gpu-tools/-/merge_requests/4 passes, or at least doesn't hang, 85% of the time with async binds (100% non-hangs with sync binds). Not worried about the hang, just need to root cause it.
Opens:
- @danvet getting a DRM_XE_SYNC_SYNCOBJ out of a VM bind didn't work without adding an async_op_fence to occupy the syncobj fence slot. I know you have mentioned that syncobjs support future fences; would that be the DRM_XE_SYNC_TIMELINE_SYNCOBJ semantic? If that works without async_op_fence (haven't tried this), perhaps we drop DRM_XE_SYNC_SYNCOBJ or disallow it with an async bind?
Edited by Matthew Brost