When does GPU recovery kick in?
For a while I was wondering why I never got any GPU recovery attempts (even with gpu_recovery set to 1). I assumed the feature was borked, since the driver always did a full reset on my 6600 XT. Today, though, my friend (on a 7800 XT) sent me this part of their dmesg log, which pleasantly surprised me:
[33887.750255] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=23887795, emitted seq=23887797
[33887.750533] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process firefox pid 217422 thread firefox:cs0 pid 217496
[33887.750791] amdgpu 0000:28:00.0: amdgpu: GPU reset begin!
[33887.753031] amdgpu 0000:28:00.0: amdgpu: Guilty job already signaled, skipping HW reset
[33887.753057] amdgpu 0000:28:00.0: amdgpu: GPU reset(4) succeeded!
This was caused by Blender (a multi-HIP crash); Firefox is innocent here. Nonetheless, the KMD didn't force a full reset.
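To check other dmesg dumps for the same pattern, here is a minimal Python sketch. It assumes the message wording matches the 6.8-era lines in the excerpt above (other kernel versions may print them differently), and `summarize_resets` is a hypothetical helper, not part of any existing tool. It only distinguishes "HW reset skipped" from "no skip message logged"; it does not try to recognize full-reset messages, since none appear in the excerpt.

```python
import re

# Patterns copied from the dmesg excerpt above (assumption: other kernel
# versions may word these messages differently).
TIMEOUT_RE = re.compile(r"amdgpu_job_timedout .*ring (\S+) timeout")
SKIP_RE = re.compile(r"Guilty job already signaled, skipping HW reset")
SUCCESS_RE = re.compile(r"GPU reset\(\d+\) succeeded")


def summarize_resets(log: str) -> list[dict]:
    """Hypothetical helper: group amdgpu job-timeout events in a dmesg dump.

    Each event records the ring that timed out, whether the driver logged
    that it skipped the HW reset, and whether a "GPU reset(N) succeeded"
    line followed.
    """
    events = []
    current = None
    for line in log.splitlines():
        m = TIMEOUT_RE.search(line)
        if m:
            # A new ring timeout starts a new event.
            current = {"ring": m.group(1), "hw_reset_skipped": False, "succeeded": False}
            events.append(current)
        elif current is not None:
            if SKIP_RE.search(line):
                current["hw_reset_skipped"] = True
            elif SUCCESS_RE.search(line):
                current["succeeded"] = True
                current = None  # event closed; ignore lines until the next timeout
    return events
```

For example, save the output of `dmesg` or `journalctl -k` to a file and pass its contents to `summarize_resets`; an event with `hw_reset_skipped` set to `True` corresponds to the soft-recovery path seen on the 7800 XT.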
I've looked through the amdgpu code and didn't find anything suggesting some kind of blacklist for RDNA2 and earlier.
Is this a feature only for RDNA3 and newer, or is something else going on (e.g. a misconfigured kernel)?
These are my kernel boot parameters, on Linux 6.8.1-arch1: quiet splash amd_pstate=passive amdgpu.gpu_recovery=1 loglevel=3 amdgpu.ppfeaturemask=0xffffffff video=DP-2:1920x1080@165 video=HDMI-A-2:1920x1080@60 rd.udev.log_priority=3 acpi_enforce_resources=lax
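To confirm that the amdgpu options actually made it onto the command line, a small Python sketch can pull them out of a string like the one above. On a running system that string would come from `/proc/cmdline`. The value the driver actually loaded with can usually also be read from `/sys/module/amdgpu/parameters/gpu_recovery`, assuming your kernel exports that parameter there. `amdgpu_params` is a hypothetical helper written for illustration.

```python
import shlex


def amdgpu_params(cmdline: str) -> dict[str, str]:
    """Hypothetical helper: collect amdgpu.<name>=<value> options from a kernel command line.

    Options without an '=' (bare flags) are skipped; if a key repeats,
    the last occurrence is kept.
    """
    params = {}
    # shlex.split honours quoted values such as foo="a b".
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        if sep and key.startswith("amdgpu."):
            params[key[len("amdgpu."):]] = value
    return params
```

For example, `amdgpu_params(open("/proc/cmdline").read())` on the machine above should include `gpu_recovery` with value `"1"`.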
Thanks in advance!