VM_BIND async ops alignment
The struct async_op_fence is a proxy dma_fence and, following the extensive discussion on these on dri-devel, proxy fences are not allowed to exist. From the moment the fence is published until it signals, everything that gates signalling (in effect the whole vm bind async worker) becomes the dma_fence signalling critical path and should be annotated as such. It is extremely hard, if not impossible, for that worker to run inside a dma_fence signalling critical path, given its extensive memory allocation and locking.
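For reference, a minimal sketch (not the actual Xe code; helper names are made up) of what the annotation would look like once the fence is published. Everything between the begin/end markers is the signalling critical path, so lockdep will flag memory allocations that can recurse into reclaim as well as locks that are also held around dma_fence_wait() elsewhere:

#include <linux/dma-fence.h>
#include <linux/workqueue.h>

static void bind_worker_func(struct work_struct *work)
{
	struct async_bind_job *job = to_async_bind_job(work);	/* hypothetical */
	bool cookie;

	/* The fence handed out to user-space is already published here. */
	cookie = dma_fence_begin_signalling();

	/*
	 * All of the actual bind work now sits inside the signalling
	 * critical section: GFP_KERNEL allocations and fence-crossing
	 * locks in here become lockdep splats.
	 */
	do_async_vm_bind_ops(job);				/* hypothetical */

	dma_fence_signal(job->fence);
	dma_fence_end_signalling(cookie);
}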
There is xe_vm_async_fence_wait_start() to work around the problem locally, but since the fence is published and available to user-space, we still need to obey the dma-fence rules.
IMHO, queuing up operations like this in a thread could be done in user-space. There have been some architectural discussions suggesting that compute doesn't allow threads in the user-space drivers, probably to avoid over-committing CPUs, but the effect is just the same if the work item is created in kernel-space.
For the async vm operations, can't we just do all prepare / stage operations synchronously (nothing should block this, right?) and then hand out the fence we get from the async GPU page-table update? It will wait on in-syncs and do the right thing.
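Roughly, with made-up helper names and types, something like the sketch below: the only fence that reaches user-space is the one backing the GPU page-table update job, which already depends on the in-syncs, so no proxy fence is needed:

#include <linux/dma-fence.h>
#include <linux/err.h>

static int vm_bind_ioctl_sketch(struct xe_vm *vm, struct bind_op *op,
				struct dma_fence **out_fence)
{
	struct dma_fence *fence;
	int err;

	/* Synchronous prepare / stage: allocations, BO validation, locking. */
	err = stage_bind_op(vm, op);		/* hypothetical */
	if (err)
		return err;

	/*
	 * Kick off the async GPU page-table update. The returned fence
	 * waits on the in-syncs, so handing it out directly does the
	 * right thing without a proxy fence in between.
	 */
	fence = submit_pt_update(vm, op);	/* hypothetical */
	if (IS_ERR(fence))
		return PTR_ERR(fence);

	*out_fence = fence;	/* installed into the out-syncs by the caller */
	return 0;
}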