[Renoir] [Cezanne] Random Freezes and Black Screen
Brief summary of the problem:
Hi, since at least kernel 6.2.1 I have been experiencing random freezes with amdgpu as soon as I do "heavier" GPU work, e.g. rendering the Google Maps 3D view. Usually the system freezes and then a black screen starts flashing. I have to reboot the system to get it working again. The DRM logs cycle between these entries:
Mär 06 12:53:59 aurora kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_low timeout, signaled seq=287925, emitted seq=287927
Mär 06 12:53:59 aurora kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process chrome pid 6163 thread chrome:cs0 pid 6278
Mär 06 12:53:59 aurora kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset begin!
Mär 06 12:54:00 aurora kernel: [drm] psp gfx command UNLOAD_TA(0x2) failed and response status is (0x117)
Mär 06 12:54:00 aurora kernel: amdgpu 0000:07:00.0: amdgpu: MODE2 reset
Mär 06 12:54:00 aurora kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset succeeded, trying to resume
Mär 06 12:54:00 aurora kernel: [drm] PCIE GART of 1024M enabled.
Mär 06 12:54:00 aurora kernel: [drm] PTB located at 0x000000F41FC00000
Mär 06 12:54:00 aurora kernel: [drm] PSP is resuming...
Mär 06 12:54:00 aurora kernel: [drm] reserve 0x400000 from 0xf41f800000 for PSP TMR
Mär 06 12:54:01 aurora kernel: amdgpu 0000:07:00.0: amdgpu: RAS: optional ras ta ucode is not available
Mär 06 12:54:01 aurora kernel: amdgpu 0000:07:00.0: amdgpu: RAP: optional rap ta ucode is not available
Mär 06 12:54:01 aurora kernel: [drm] psp gfx command LOAD_TA(0x1) failed and response status is (0x7)
Mär 06 12:54:01 aurora kernel: [drm] psp gfx command INVOKE_CMD(0x3) failed and response status is (0x4)
Mär 06 12:54:01 aurora kernel: amdgpu 0000:07:00.0: amdgpu: Secure display: Generic Failure.
Mär 06 12:54:01 aurora kernel: amdgpu 0000:07:00.0: amdgpu: SECUREDISPLAY: query securedisplay TA failed. ret 0x0
Mär 06 12:54:01 aurora kernel: amdgpu 0000:07:00.0: amdgpu: SMU is resuming...
Mär 06 12:54:01 aurora kernel: amdgpu 0000:07:00.0: amdgpu: SMU is resumed successfully!
Mär 06 12:54:01 aurora kernel: [drm] DMUB hardware initialized: version=0x01010026
Mär 06 12:54:01 aurora kernel: [drm] kiq ring mec 2 pipe 1 q 0
Mär 06 12:54:01 aurora kernel: amdgpu 0000:07:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Mär 06 12:54:01 aurora kernel: [drm:amdgpu_gfx_enable_kcq.cold [amdgpu]] *ERROR* KCQ enable failed
Mär 06 12:54:01 aurora kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v9_0> failed -110
Mär 06 12:54:01 aurora kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset(2) failed
Mär 06 12:54:01 aurora kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset end with ret = -110
Mär 06 12:54:01 aurora kernel: [drm] Skip scheduling IBs!
Mär 06 12:54:01 aurora kernel: [drm] Skip scheduling IBs!
Mär 06 12:54:01 aurora kernel: [drm] Skip scheduling IBs!
Mär 06 12:54:01 aurora kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -110
Mär 06 12:54:12 aurora kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=44790, emitted seq=44792
Mär 06 12:54:12 aurora kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
Mär 06 12:54:12 aurora kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset begin!
Mär 06 12:54:12 aurora kernel: [drm] psp gfx command UNLOAD_TA(0x2) failed and response status is (0x117)
Mär 06 12:54:12 aurora kernel: amdgpu 0000:07:00.0: amdgpu: MODE2 reset
Mär 06 12:54:12 aurora kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset succeeded, trying to resume
Mär 06 12:54:12 aurora kernel: [drm] PCIE GART of 1024M enabled.
Mär 06 12:54:12 aurora kernel: [drm] PTB located at 0x000000F41FC00000
Mär 06 12:54:12 aurora kernel: [drm] PSP is resuming...
Mär 06 12:54:13 aurora kernel: [drm] reserve 0x400000 from 0xf41f800000 for PSP TMR
Mär 06 12:54:13 aurora kernel: amdgpu 0000:07:00.0: amdgpu: RAS: optional ras ta ucode is not available
Mär 06 12:54:13 aurora kernel: amdgpu 0000:07:00.0: amdgpu: RAP: optional rap ta ucode is not available
Mär 06 12:54:13 aurora kernel: [drm] psp gfx command LOAD_TA(0x1) failed and response status is (0x7)
Mär 06 12:54:13 aurora kernel: [drm] psp gfx command INVOKE_CMD(0x3) failed and response status is (0x4)
Mär 06 12:54:13 aurora kernel: amdgpu 0000:07:00.0: amdgpu: Secure display: Generic Failure.
Mär 06 12:54:13 aurora kernel: amdgpu 0000:07:00.0: amdgpu: SECUREDISPLAY: query securedisplay TA failed. ret 0x0
Mär 06 12:54:13 aurora kernel: amdgpu 0000:07:00.0: amdgpu: SMU is resuming...
Mär 06 12:54:13 aurora kernel: amdgpu 0000:07:00.0: amdgpu: SMU is resumed successfully!
Mär 06 12:54:13 aurora kernel: [drm] DMUB hardware initialized: version=0x01010026
Mär 06 12:54:13 aurora kernel: [drm] kiq ring mec 2 pipe 1 q 0
Mär 06 12:54:13 aurora kernel: amdgpu 0000:07:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Mär 06 12:54:13 aurora kernel: [drm:amdgpu_gfx_enable_kcq.cold [amdgpu]] *ERROR* KCQ enable failed
Mär 06 12:54:13 aurora kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v9_0> failed -110
Mär 06 12:54:13 aurora kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset(3) failed
Mär 06 12:54:13 aurora kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset end with ret = -110
Mär 06 12:54:13 aurora kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -110
Mär 06 12:54:24 aurora kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=44792, emitted seq=44794
Mär 06 12:54:24 aurora kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
Mär 06 12:54:24 aurora kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset begin!
Mär 06 12:54:24 aurora kernel: [drm] psp gfx command UNLOAD_TA(0x2) failed and response status is (0x117)
Mär 06 12:54:24 aurora kernel: amdgpu 0000:07:00.0: amdgpu: MODE2 reset
Mär 06 12:54:24 aurora kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset succeeded, trying to resume
Mär 06 12:54:24 aurora kernel: [drm] PCIE GART of 1024M enabled.
Mär 06 12:54:24 aurora kernel: [drm] PTB located at 0x000000F41FC00000
Mär 06 12:54:24 aurora kernel: [drm] PSP is resuming...
Mär 06 12:54:25 aurora kernel: [drm] reserve 0x400000 from 0xf41f800000 for PSP TMR
Mär 06 12:54:25 aurora kernel: amdgpu 0000:07:00.0: amdgpu: RAS: optional ras ta ucode is not available
Mär 06 12:54:25 aurora kernel: amdgpu 0000:07:00.0: amdgpu: RAP: optional rap ta ucode is not available
I am currently back on 6.1.11, which runs fine.
Hardware description:
- CPU: AMD Ryzen 7 PRO 4750U with Radeon Graphics
- GPU: 07:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Renoir [1002:1636] (rev d1)
- System Memory: 32 GB
- Display(s): 1x 5K LG HDR, 1x 1440p iiyama
- Type of Display Connection: 1x USB-C, 1x USB-C to HDMI.
System information:
- Distro name and Version: Arch
- Kernel version: 6.2.2-zen1-1-zen
- Custom kernel: 6.2.2-zen1-1-zen
- AMD official driver version: N/A
How to reproduce the issue:
It seems to happen randomly, but it occurs more often during GPU-intensive tasks such as video decoding or rendering the Google Maps 3D view.
Attached files:
Log files (for system lockups / game freezes / crashes)
JournalCTL: https://pastebin.com/GypLMP02
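Because the full journal repeats the same timeout/reset cycle many times, a small helper can make it easier to count how often each ring timed out and how many reset attempts failed. This is a minimal sketch with a hypothetical name (summarize_amdgpu.py), not part of any kernel or AMD tooling. It assumes the log lines look like the excerpt above; the regular expressions are taken from those messages and may not match other kernel versions.

```python
import re
import sys
from collections import Counter

# Patterns copied from the amdgpu messages in the log excerpt above (assumption:
# other kernel versions may word these messages differently).
TIMEOUT_RE = re.compile(r"ring (\S+) timeout")
RESET_FAILED_RE = re.compile(r"GPU reset\((\d+)\) failed")
RECOVERY_FAILED_RE = re.compile(r"GPU Recovery Failed: (-?\d+)")


def summarize(lines):
    """Count ring timeouts per ring, failed reset attempts, and failed recoveries."""
    timeouts = Counter()
    failed_resets = []
    recovery_failures = 0
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            timeouts[m.group(1)] += 1  # e.g. "gfx_low" or "sdma0"
            continue
        m = RESET_FAILED_RE.search(line)
        if m:
            failed_resets.append(int(m.group(1)))  # reset attempt counter
            continue
        if RECOVERY_FAILED_RE.search(line):
            recovery_failures += 1
    return {
        "timeouts": dict(timeouts),
        "failed_resets": failed_resets,
        "recovery_failures": recovery_failures,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Read a saved kernel log given as the first argument.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        print(summarize(f))
```

For example, after saving the kernel messages of the previous (frozen) boot with `journalctl -k -b -1 --no-pager > kernel.log`, running `python3 summarize_amdgpu.py kernel.log` prints the counts for that boot.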