drm/amdgpu: Fix the race condition for draining retry fault
Issue: In the scenario where svm_range_restore_pages is called, but svm->checkpoint_ts has not been set and the retry fault has not been drained, svm_range_unmap_from_cpu is triggered and calls svm_range_free. Meanwhile, svm_range_restore_pages continues execution and reaches svm_range_from_addr. This results in a "failed to find prange..." error, causing the page recovery to fail. How to fix: Move the timestamp check code under the protection of svm->lock. v2: Make sure all right locks are released before go out. v3: Directly goto out_unlock_svms, and return -EAGAIN. v4: Refine code. Signed-off-by:Emily Deng <Emily.Deng@amd.com> Reviewed-by:
Felix Kuehling <felix.kuehling@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
Loading
Please register or sign in to comment