Random GPU hangs after latest amdgpu firmware update [7900 XTX]
After updating the amdgpu firmware to the recent commits from 2024-01-23 (see here), I started getting random GPU hangs while using the KDE Plasma System Monitor.
- GPU: Sapphire Nitro+ 7900 XTX
- Kernel: 6.7.2
- OpenGL: Mesa 23.3.3-3, AMD Radeon RX 7900 XTX (radeonsi, navi31, LLVM 17.0.6, DRM 3.56, 6.7.2)
- Vulkan: Mesa 23.3.3-3, radv
- KDE Plasma 5.27.10, Wayland session
Snippet from dmesg (full log: gpu_hang.log):
[ 1309.683293] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=308568, emitted seq=308570
[ 1309.683410] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process plasma-systemmo pid 4350 thread plasma-sys:cs0 pid 4353
[ 1309.683492] amdgpu 0000:48:00.0: amdgpu: GPU reset begin!
[ 1310.697277] amdgpu 0000:48:00.0: amdgpu: IP block:gfx_v11_0 is hung!
[ 1310.697410] amdgpu 0000:48:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:169 vmid:0 pasid:0, for process pid 0 thread pid 0)
[ 1310.697417] amdgpu 0000:48:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
[ 1310.697421] amdgpu 0000:48:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00040B53
[ 1310.697423] amdgpu 0000:48:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5)
[ 1310.697425] amdgpu 0000:48:00.0: amdgpu: MORE_FAULTS: 0x1
[ 1310.697427] amdgpu 0000:48:00.0: amdgpu: WALKER_ERROR: 0x1
[ 1310.697429] amdgpu 0000:48:00.0: amdgpu: PERMISSION_FAULTS: 0x5
[ 1310.697430] amdgpu 0000:48:00.0: amdgpu: MAPPING_ERROR: 0x1
[ 1310.697432] amdgpu 0000:48:00.0: amdgpu: RW: 0x1
The hangs do not occur with the firmware at the commit preceding all these updates.
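For anyone comparing logs from similar hangs, here is a minimal sketch of pulling the timed-out ring and the submitting process out of dmesg output. It is not part of any amdgpu tooling: the function name `summarize_hangs` and both regular expressions are my own assumptions, modeled only on the two `amdgpu_job_timedout` lines quoted above, so other kernel versions may word these messages differently.

```python
import re

# Patterns are assumptions modeled on the two amdgpu_job_timedout lines
# in the dmesg snippet above; they are not an official log format.
RING_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)
PROC_RE = re.compile(
    r"Process information: process (?P<process>.+?) pid (?P<pid>\d+) "
    r"thread (?P<thread>.+?) pid (?P<tid>\d+)"
)

# Two lines copied from the report, used as sample input.
SAMPLE = """\
[ 1309.683293] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=308568, emitted seq=308570
[ 1309.683410] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process plasma-systemmo pid 4350 thread plasma-sys:cs0 pid 4353
"""


def summarize_hangs(dmesg_text):
    """Return one dict per ring timeout found in dmesg_text (hypothetical helper)."""
    hangs = []
    for line in dmesg_text.splitlines():
        ring = RING_RE.search(line)
        if ring:
            signaled = int(ring["signaled"])
            emitted = int(ring["emitted"])
            hangs.append({
                "ring": ring["ring"],
                "signaled": signaled,
                "emitted": emitted,
                # Difference between the last emitted and last signaled fence.
                "pending": emitted - signaled,
                "process": None,
                "pid": None,
            })
            continue
        proc = PROC_RE.search(line)
        # Attach process details to the most recent timeout that lacks them.
        if proc and hangs and hangs[-1]["process"] is None:
            hangs[-1]["process"] = proc["process"]
            hangs[-1]["pid"] = int(proc["pid"])
    return hangs


for hang in summarize_hangs(SAMPLE):
    print(hang)
```

To run it against a saved log such as the attached one, read the file's contents and pass them to `summarize_hangs` in place of `SAMPLE`. Note that the kernel truncates the process name (`plasma-systemmo` here), so the pid is the more reliable field for matching.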
Edited by Shmerl