6900xt locking up in games and occasional desktop usage
Brief summary of the problem:
All of my screens suddenly disconnect and the last split second of audio loops. The only way to recover is to hard reset the machine with the power button.
Hardware description:
- CPU: Ryzen 5950x
- GPU: 6900xt reference
- System Memory: 64gb
- Display(s): 1x 1440p@144hz, 1x 1440p@60hz, 1x 1080p@60hz
- Type of Display Connection: DP for the 1440ps, USB-C->DP for the 1080p
System information:
- Distro name and Version: Arch Linux
- Kernel version: 5.17.1-zen
- Custom kernel: N/A
- AMD official driver version: N/A
How to reproduce the issue:
Use the desktop for a while, or play some games. I have been able to reproduce this reliably playing Black Mesa through Proton.
I tried the vanilla (non-zen) kernels 5.17.2 and 5.14.15, but the same issue occurred. I have also not been able to reproduce this in Windows.
I also tried setting amdgpu.runpm=0, which did not help either.
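A minimal sketch for confirming that a setting like amdgpu.runpm=0 actually reached the kernel, assuming it was passed on the kernel command line rather than through a modprobe config file. The helper name kernel_param and the sample command line are made up for illustration and are not taken from this machine.

```python
from typing import Optional


def kernel_param(cmdline: str, name: str) -> Optional[str]:
    """Return the value of `name` from a kernel command line string.

    Returns None if the parameter is absent, and "" if it is present
    without a value. If the parameter is repeated, the last occurrence
    is returned. Quoted values containing spaces are not handled.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value


# Hypothetical command line, for illustration only.
sample_cmdline = "root=/dev/nvme0n1p2 rw amdgpu.runpm=0 quiet"
print(kernel_param(sample_cmdline, "amdgpu.runpm"))
```

On a running system the input would come from /proc/cmdline, for example kernel_param(open("/proc/cmdline").read(), "amdgpu.runpm"). A result of None would mean the parameter never made it onto the command line.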
I would try a kernel bisection if I could find a version that works, but so far I can't.
I also purchased a new 6900xt directly from AMD instead of eBay this morning, because I value my time more than my money. I will update with results when that card shows up.
Log files (for system lockups / game freezes / crashes)
EDIT: These logs may not have actually captured the intended error. See the comment below for one that has it for sure.
Here is the full output of journalctl -b -1: dmesg-dump.txt
Relevant output:
Apr 13 21:10:44 ryuko kernel: i2c-designware-pci 0000:08:00.3: can't change power state from D3cold to D0 (config space inaccessible)
Apr 13 21:10:44 ryuko kernel: i2c-designware-pci 0000:08:00.3: timeout in disabling adapter
Apr 13 21:10:45 ryuko kernel: i2c-designware-pci 0000:08:00.3: timeout in disabling adapter
Apr 13 21:10:49 ryuko kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Apr 13 21:10:49 ryuko kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Apr 13 21:10:49 ryuko kernel: nfs: server fileserver.lan not responding, still trying
Apr 13 21:10:50 ryuko kernel: nfs: server fileserver.lan OK
Apr 13 21:10:53 ryuko kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Apr 13 21:10:53 ryuko kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma2 timeout, signaled seq=19724, emitted seq=19726
Apr 13 21:10:53 ryuko kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
Apr 13 21:10:53 ryuko kernel: amdgpu 0000:08:00.0: amdgpu: GPU reset begin!
Apr 13 21:10:53 ryuko kernel: amdgpu 0000:08:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:41 param:0x00000000 message:DisallowGfxOff?
Apr 13 21:10:53 ryuko kernel: amdgpu 0000:08:00.0: amdgpu: Failed to disable gfxoff!
Apr 13 21:10:53 ryuko kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=5014974, emitted seq=5014974
Apr 13 21:10:53 ryuko kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process bms.exe pid 129325 thread dxvk-submit pid 129365
Apr 13 21:10:53 ryuko kernel: amdgpu 0000:08:00.0: amdgpu: GPU reset begin!
Apr 13 21:10:53 ryuko kernel: amdgpu 0000:08:00.0: amdgpu: Bailing on TDR for s_job:4c2b06, as another already in progress
Apr 13 21:10:53 ryuko kernel: [drm] REG_WAIT timeout 1us * 1000 tries - dcn20_dpp_pg_control line:437
Apr 13 21:10:53 ryuko kernel: [drm] REG_WAIT timeout 1us * 1000 tries - dcn20_hubp_pg_control line:511
Apr 13 21:10:54 ryuko kernel: [drm:dcn20_wait_for_blank_complete [amdgpu]] *ERROR* DC: failed to blank crtc!
Apr 13 21:10:54 ryuko kernel: [drm] REG_WAIT timeout 1us * 1000 tries - dcn20_dpp_pg_control line:445
Apr 13 21:10:54 ryuko kernel: [drm] REG_WAIT timeout 1us * 1000 tries - dcn20_hubp_pg_control line:519
Apr 13 21:10:54 ryuko kernel: [drm] REG_WAIT timeout 1us * 1000 tries - dcn20_dpp_pg_control line:469
Apr 13 21:10:54 ryuko kernel: [drm] REG_WAIT timeout 1us * 1000 tries - dcn20_hubp_pg_control line:543
Apr 13 21:10:54 ryuko kernel: [drm:dcn20_wait_for_blank_complete [amdgpu]] *ERROR* DC: failed to blank crtc!
Apr 13 21:10:54 ryuko kernel: [drm] REG_WAIT timeout 1us * 1000 tries - dcn20_dpp_pg_control line:453
Apr 13 21:10:54 ryuko kernel: [drm] REG_WAIT timeout 1us * 1000 tries - dcn20_hubp_pg_control line:527
Apr 13 21:10:54 ryuko kernel: [drm] REG_WAIT timeout 1us * 1000 tries - dcn20_dpp_pg_control line:461
Apr 13 21:10:54 ryuko kernel: [drm] REG_WAIT timeout 1us * 1000 tries - dcn20_hubp_pg_control line:535
Apr 13 21:10:54 ryuko kernel: [drm:dcn20_wait_for_blank_complete [amdgpu]] *ERROR* DC: failed to blank crtc!
Apr 13 21:10:54 ryuko kernel: [drm:psp_ring_cmd_submit [amdgpu]] *ERROR* ring_buffer_start = 0000000069da0b7c; ring_buffer_end = 00000000fb3a5ea9; write_frame = 00000000eeb97527
Apr 13 21:10:54 ryuko kernel: [drm:psp_ring_cmd_submit [amdgpu]] *ERROR* write_frame is pointing to address out of bounds
Apr 13 21:10:54 ryuko kernel: [drm] REG_WAIT timeout 10us * 10020 tries - enc1_stream_encoder_dp_blank line:944
Apr 13 21:10:54 ryuko kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=5
Apr 13 21:10:54 ryuko kernel: [drm] REG_WAIT timeout 1us * 100000 tries - optc1_disable_crtc line:528
Apr 13 21:10:54 ryuko kernel: [drm:psp_ring_cmd_submit [amdgpu]] *ERROR* ring_buffer_start = 0000000069da0b7c; ring_buffer_end = 00000000fb3a5ea9; write_frame = 00000000eeb97527
Apr 13 21:10:54 ryuko kernel: [drm:psp_ring_cmd_submit [amdgpu]] *ERROR* write_frame is pointing to address out of bounds
Apr 13 21:10:54 ryuko kernel: [drm] REG_WAIT timeout 10us * 10020 tries - enc1_stream_encoder_dp_blank line:944
Apr 13 21:10:54 ryuko kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=5
Apr 13 21:10:55 ryuko kernel: [drm] REG_WAIT timeout 1us * 100000 tries - optc1_disable_crtc line:528
Apr 13 21:10:55 ryuko kernel: [drm:psp_ring_cmd_submit [amdgpu]] *ERROR* ring_buffer_start = 0000000069da0b7c; ring_buffer_end = 00000000fb3a5ea9; write_frame = 00000000eeb97527
Apr 13 21:10:55 ryuko kernel: [drm:psp_ring_cmd_submit [amdgpu]] *ERROR* write_frame is pointing to address out of bounds
Apr 13 21:10:55 ryuko kernel: [drm] REG_WAIT timeout 10us * 10020 tries - enc1_stream_encoder_dp_blank line:944
Apr 13 21:10:55 ryuko kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=5
Apr 13 21:10:55 ryuko kernel: amdgpu 0000:08:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Apr 13 21:10:55 ryuko kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
Apr 13 21:10:55 ryuko kernel: amdgpu 0000:08:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Apr 13 21:10:55 ryuko kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
Apr 13 21:10:56 ryuko kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
Apr 13 21:10:56 ryuko kernel: amdgpu 0000:08:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:7 param:0x00000000 message:DisableAllSmuFeatures?
Apr 13 21:10:56 ryuko kernel: amdgpu 0000:08:00.0: amdgpu: Failed to disable smu features.
Apr 13 21:10:56 ryuko kernel: amdgpu 0000:08:00.0: amdgpu: Fail to disable dpm features!
Apr 13 21:10:56 ryuko kernel: [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -121
Apr 13 21:10:56 ryuko kernel: [drm:psp_ring_cmd_submit [amdgpu]] *ERROR* ring_buffer_start = 0000000069da0b7c; ring_buffer_end = 00000000fb3a5ea9; write_frame = 00000000eeb97527
Apr 13 21:10:56 ryuko kernel: [drm:psp_ring_cmd_submit [amdgpu]] *ERROR* write_frame is pointing to address out of bounds
Apr 13 21:10:56 ryuko kernel: [drm:psp_suspend [amdgpu]] *ERROR* Failed to terminate ras ta
Apr 13 21:10:56 ryuko kernel: [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <psp> failed -22
Apr 13 21:10:56 ryuko kernel: amdgpu 0000:08:00.0: amdgpu: MODE1 reset
Apr 13 21:10:56 ryuko kernel: amdgpu 0000:08:00.0: amdgpu: GPU mode1 reset
Apr 13 21:10:56 ryuko kernel: amdgpu 0000:08:00.0: amdgpu: GPU smu mode1 reset
Apr 13 21:10:56 ryuko kernel: amdgpu 0000:08:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:48 param:0x00000000 message:Mode1Reset?
Apr 13 21:10:56 ryuko kernel: amdgpu 0000:08:00.0: amdgpu: GPU mode1 reset failed
Apr 13 21:10:56 ryuko kernel: amdgpu 0000:08:00.0: amdgpu: ASIC reset failed with error, -121 for drm dev, 0000:08:00.0
Apr 13 21:10:57 ryuko kernel: [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:77:crtc-0] hw_done or flip_done timed out
Apr 13 21:11:02 ryuko kernel: amdgpu 0000:08:00.0: [drm] *ERROR* [CRTC:92:crtc-5] flip_done timed out
Apr 13 21:11:03 ryuko kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=5014974, emitted seq=5014974
Apr 13 21:11:03 ryuko kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process bms.exe pid 129325 thread dxvk-submit pid 129365
Apr 13 21:11:03 ryuko kernel: amdgpu 0000:08:00.0: amdgpu: GPU reset begin!
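
For picking the ring timeouts out of a longer journal dump, here is a rough sketch that counts amdgpu_job_timedout messages per ring. The regular expression is written against the message format seen in the output above and is an assumption about other kernel versions; the function name ring_timeouts is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... *ERROR* ring sdma2 timeout, signaled seq=19724, emitted seq=19726
RING_TIMEOUT = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def ring_timeouts(lines):
    """Count ring timeout messages per ring name in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = RING_TIMEOUT.search(line)
        if match:
            counts[match.group("ring")] += 1
    return counts


# A few lines copied from the log above.
sample = [
    "Apr 13 21:10:53 ryuko kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma2 timeout, signaled seq=19724, emitted seq=19726",
    "Apr 13 21:10:53 ryuko kernel: amdgpu 0000:08:00.0: amdgpu: GPU reset begin!",
    "Apr 13 21:10:53 ryuko kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=5014974, emitted seq=5014974",
    "Apr 13 21:11:03 ryuko kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=5014974, emitted seq=5014974",
]

for ring, count in ring_timeouts(sample).items():
    print(ring, count)
```

The same function can be pointed at a saved dump by passing an open file object, since it only needs an iterable of lines.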