From 90185f3a561b59a3f15d62d7c662db5e3a0e64db Mon Sep 17 00:00:00 2001
From: Lijo Lazar <lijo.lazar@amd.com>
Date: Mon, 9 Dec 2024 09:14:53 +0530
Subject: [PATCH] drm/amdgpu: Avoid VF for RAS recovery source check

A VF device sets the RAS flag when mailbox data can't be read properly,
and there is no conclusive way to tell whether a RAS error is the real
source. The VF therefore schedules a KFD-based reset, which doesn't set
the RAS source.
Skip checking the RAS source for any VF-scheduled recovery.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reported-by: Vojislav Tomasevic <vojislav.tomasevic@amd.com>
Reviewed-by: Yiqing Yao <yiqing.yao@amd.com>
Tested-by: Yiqing Yao <yiqing.yao@amd.com>

Fixes: 2211660c20a0 ("drm/amdgpu: Prefer RAS recovery for scheduler hang")
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
 1 file changed, 1 insertion(+)
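
Note for reviewers: the condition after this change can be modelled in
isolation as below. This is only a sketch under stated assumptions, not
driver code. The struct, the enum and yields_to_ras() are hypothetical
stand-ins for amdgpu_ras_is_err_state(), amdgpu_sriov_vf() and
reset_context->src, which the real check consults.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the reset source values; not the driver enum. */
enum reset_src { RESET_SRC_UNKNOWN, RESET_SRC_RAS, RESET_SRC_KFD };

/* Hypothetical snapshot of the state the check looks at. */
struct recover_request {
	bool ras_err_state;  /* models amdgpu_ras_is_err_state(adev, AMDGPU_RAS_BLOCK__ANY) */
	bool is_sriov_vf;    /* models amdgpu_sriov_vf(adev) */
	enum reset_src src;  /* models reset_context->src */
};

/*
 * Returns true when a recovery request should step aside for RAS
 * recovery. A VF never yields, because its RAS flag may only reflect
 * a mailbox read failure rather than a real RAS error.
 */
static bool yields_to_ras(const struct recover_request *req)
{
	return req->ras_err_state &&
	       !req->is_sriov_vf &&
	       req->src != RESET_SRC_RAS;
}
```

In this model a bare-metal KFD reset with the RAS flag set still yields,
while the same request coming from a VF proceeds with its own recovery.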

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index d1bb9e85b6d73..b5e5c790180ce 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5801,6 +5801,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 	 * detected at the same time, let RAS recovery take care of it.
 	 */
 	if (amdgpu_ras_is_err_state(adev, AMDGPU_RAS_BLOCK__ANY) &&
+	    !amdgpu_sriov_vf(adev) &&
 	    reset_context->src != AMDGPU_RESET_SRC_RAS) {
 		dev_dbg(adev->dev,
 			"Gpu recovery from source: %d yielding to RAS error recovery handling",
-- 
GitLab