6600XT failure to resume from suspend
Hardware description:
- CPU: i5-6600K
- GPU: 03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] [1002:73ff] (rev c1)
- System Memory: 32GB
- Display(s): 2x QHD, both connected via DisplayPort
System information:
- Arch Linux
- Kernel 6.1.30 LTS
- Mesa 23.0.3
How to reproduce the issue:
The 6600 XT intermittently fails to resume from suspend. The failure is not 100% reproducible, but it usually happens within 2-4 days of uptime. You can usually tell in advance that a resume will fail, because the PSU fan keeps spinning while the system is suspended.
Below are some snippets from the logs after a failure. The system still responds to SysRq, but the display stays blank.
amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
---
May 30 07:58:16 abberation kernel: amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000036 SMN_C2PMSG_82:0x00000000
May 30 07:58:16 abberation kernel: amdgpu 0000:03:00.0: amdgpu: RunDcBtc failed!
May 30 07:58:16 abberation kernel: amdgpu 0000:03:00.0: amdgpu: Failed to setup smc hw!
May 30 07:58:16 abberation kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
May 30 07:58:16 abberation kernel: amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
May 30 07:58:16 abberation kernel: amdgpu 0000:03:00.0: PM: dpm_run_callback(): pci_pm_resume+0x0/0xf0 returns -62
May 30 07:58:16 abberation kernel: amdgpu 0000:03:00.0: PM: failed to resume async: error -62
---
May 30 07:58:16 abberation kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
May 30 07:58:29 abberation kernel: [drm] perform_link_training_with_retries: Link(1) training attempt 1 of 4 failed @ rate(20) x lane(4) : fail reason:(1)
---
May 30 07:58:32 abberation kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=4147575, emitted seq=4147577
May 30 07:58:32 abberation kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
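The -62 in the resume errors above corresponds to ETIME (timer expired) in the Linux errno table, which would be consistent with the SMU never answering the pending command. As a rough triage aid, the sketch below scans kernel log text for the "resume of IP block" line and names the error code. The function name `find_failed_ip_blocks`, the regular expression, and the small errno table are illustrative assumptions and not part of any amdgpu tooling.

```python
import re

# A few Linux errno values (asm-generic numbering); deliberately not exhaustive.
ERRNO_NAMES = {16: "EBUSY", 22: "EINVAL", 62: "ETIME", 110: "ETIMEDOUT"}

# Matches lines such as: "resume of IP block <smu> failed -62"
IP_BLOCK_RE = re.compile(
    r"resume of IP block <(?P<block>\w+)> failed (?P<code>-?\d+)"
)


def find_failed_ip_blocks(log_text):
    """Return a (block, code, errno_name) tuple for each failed IP block resume."""
    results = []
    for line in log_text.splitlines():
        match = IP_BLOCK_RE.search(line)
        if match:
            code = int(match.group("code"))
            # Kernel return codes are negative errno values.
            name = ERRNO_NAMES.get(abs(code), "unknown")
            results.append((match.group("block"), code, name))
    return results


# Sample lines copied from the log excerpt in this report.
SAMPLE = """\
May 30 07:58:16 abberation kernel: amdgpu 0000:03:00.0: amdgpu: Failed to setup smc hw!
May 30 07:58:16 abberation kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
May 30 07:58:16 abberation kernel: amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
"""

if __name__ == "__main__":
    for block, code, name in find_failed_ip_blocks(SAMPLE):
        print(f"IP block {block}: error {code} ({name})")
```

To check a real failure, the same function could be run on saved kernel log text, for example the output of `journalctl -k -b -1` after rebooting from a failed resume.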
Possibly related to #2258