AMDGPU page fault crashes desktop
Brief summary of the problem:
I’m experiencing this issue only when playing AC Odyssey through Steam Proton. It could also affect other games running through Proton, but I have no other examples. It starts with the whole system freezing: I can still move the cursor, but nothing is clickable and keyboard shortcuts don’t work. After a second or two, both screens turn black for a second, then come back, but the desktop is still frozen. Audio keeps playing as if nothing happened while the screens are not black. Once the screens are back I can switch to a TTY, but I can’t do anything there to fix the issue. The only option is a hard reboot.
It may happen after 3 minutes of playtime, or it may not happen at all. While trying to fix it I tried this, but it didn’t help, and it doesn’t seem that any specific voltage, power consumption, or temperature triggers it.
I have also set amdgpu.runpm=0 in kernel parameters.
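To double-check that the parameter really reached the running kernel, something like the following could be used. This is only a minimal sketch: the helper name `runpm_from_cmdline` is made up for illustration, and it only looks at the kernel command line in `/proc/cmdline`, not at what the driver does with the value.

```python
from pathlib import Path


def runpm_from_cmdline(cmdline: str):
    """Return the amdgpu.runpm value found in a kernel command line, or None if absent."""
    value = None
    for token in cmdline.split():
        if token.startswith("amdgpu.runpm="):
            # Keep the last occurrence if the parameter is repeated.
            value = token.split("=", 1)[1]
    return value


if __name__ == "__main__":
    cmdline_path = Path("/proc/cmdline")
    if cmdline_path.exists():
        print("amdgpu.runpm =", runpm_from_cmdline(cmdline_path.read_text()))
```

If it prints `None`, the parameter was not passed to the booted kernel, for example because the bootloader configuration was not regenerated.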
Hardware description:
- CPU: AMD Ryzen 5 3600
- GPU: Radeon RX 6600 XT
- System Memory: 4x8 GB DDR4 3600MHz
- Displays: 1 DELL 1080p 60Hz (DisplayPort) + 1 LG 1080p 60Hz (HDMI->DVI)
System information:
- Distro name and Version: Manjaro KDE 22.0.0
- Kernel version: 5.19.16-2-MANJARO
How to reproduce the issue:
- Run somewhat heavy games through Proton (possibly just Vulkan)
- Wait
Attached files:
journald output
21.10.2022 17:17:54:251 kernel amdgpu 0000:09:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:7 pasid:32803, for process ACOdyssey.exe pid 26251 thread dxvk-submit pid 26294)
21.10.2022 17:17:54:254 kernel amdgpu 0000:09:00.0: amdgpu: in page starting at address 0x0000000193f24000 from client 0x1b (UTCL2)
21.10.2022 17:17:54:254 kernel amdgpu 0000:09:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00701431
21.10.2022 17:17:54:255 kernel amdgpu 0000:09:00.0: amdgpu: Faulty UTCL2 client ID: SQC (data) (0xa)
21.10.2022 17:17:54:255 kernel amdgpu 0000:09:00.0: amdgpu: MORE_FAULTS: 0x1
21.10.2022 17:17:54:255 kernel amdgpu 0000:09:00.0: amdgpu: WALKER_ERROR: 0x0
21.10.2022 17:17:54:255 kernel amdgpu 0000:09:00.0: amdgpu: PERMISSION_FAULTS: 0x3
21.10.2022 17:17:54:255 kernel amdgpu 0000:09:00.0: amdgpu: MAPPING_ERROR: 0x0
21.10.2022 17:17:54:256 kernel amdgpu 0000:09:00.0: amdgpu: RW: 0x0
21.10.2022 17:17:54:256 kernel amdgpu 0000:09:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:7 pasid:32803, for process ACOdyssey.exe pid 26251 thread dxvk-submit pid 26294)
21.10.2022 17:17:54:256 kernel amdgpu 0000:09:00.0: amdgpu: in page starting at address 0x0000000193f24000 from client 0x1b (UTCL2)
21.10.2022 17:17:54:256 kernel amdgpu 0000:09:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
21.10.2022 17:17:54:256 kernel amdgpu 0000:09:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
21.10.2022 17:17:54:256 kernel amdgpu 0000:09:00.0: amdgpu: MORE_FAULTS: 0x0
21.10.2022 17:17:54:256 kernel amdgpu 0000:09:00.0: amdgpu: WALKER_ERROR: 0x0
21.10.2022 17:17:54:257 kernel amdgpu 0000:09:00.0: amdgpu: PERMISSION_FAULTS: 0x0
21.10.2022 17:17:54:257 kernel amdgpu 0000:09:00.0: amdgpu: MAPPING_ERROR: 0x0
21.10.2022 17:17:54:258 kernel amdgpu 0000:09:00.0: amdgpu: RW: 0x0
21.10.2022 17:17:54:258 kernel amdgpu 0000:09:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:7 pasid:32803, for process ACOdyssey.exe pid 26251 thread dxvk-submit pid 26294)
21.10.2022 17:17:54:258 kernel amdgpu 0000:09:00.0: amdgpu: in page starting at address 0x0000000002d4d000 from client 0x1b (UTCL2)
21.10.2022 17:17:54:259 kernel amdgpu 0000:09:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
21.10.2022 17:17:54:259 kernel amdgpu 0000:09:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
21.10.2022 17:17:54:259 kernel amdgpu 0000:09:00.0: amdgpu: MORE_FAULTS: 0x0
21.10.2022 17:17:54:259 kernel amdgpu 0000:09:00.0: amdgpu: WALKER_ERROR: 0x0
21.10.2022 17:17:54:260 kernel amdgpu 0000:09:00.0: amdgpu: PERMISSION_FAULTS: 0x0
21.10.2022 17:17:54:260 kernel amdgpu 0000:09:00.0: amdgpu: MAPPING_ERROR: 0x0
21.10.2022 17:17:54:260 kernel amdgpu 0000:09:00.0: amdgpu: RW: 0x0
21.10.2022 17:17:57:458 kernel [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=13314677, emitted seq=13314679
21.10.2022 17:17:57:458 kernel [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process ACOdyssey.exe pid 26251 thread dxvk-submit pid 26294
21.10.2022 17:18:01:458 kernel [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
21.10.2022 17:18:01:918 kernel amdgpu 0000:09:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
21.10.2022 17:18:01:918 kernel [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
21.10.2022 17:18:02:184 kernel [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
21.10.2022 17:18:02:315 kernel snd_hda_intel 0000:09:00.1: spurious response 0x0:0x0, last cmd=0x1f0500
21.10.2022 17:18:02:316 kernel snd_hda_intel 0000:09:00.1: spurious response 0x0:0x0, last cmd=0x1f0500
21.10.2022 17:18:02:317 kernel snd_hda_intel 0000:09:00.1: spurious response 0x0:0x0, last cmd=0x1f0500
21.10.2022 17:18:02:317 kernel snd_hda_intel 0000:09:00.1: spurious response 0x0:0x0, last cmd=0x1f0500
21.10.2022 17:18:02:317 kernel snd_hda_intel 0000:09:00.1: spurious response 0x0:0x0, last cmd=0x1f0500
21.10.2022 17:18:02:317 kernel snd_hda_intel 0000:09:00.1: spurious response 0x0:0x0, last cmd=0x1f0500
21.10.2022 17:18:02:317 kernel snd_hda_intel 0000:09:00.1: spurious response 0x0:0x0, last cmd=0x1f0500
21.10.2022 17:18:02:317 kernel snd_hda_intel 0000:09:00.1: spurious response 0x0:0x0, last cmd=0x1f0500
21.10.2022 17:18:03:229 kernel [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
21.10.2022 17:18:03:238 kernel [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
21.10.2022 17:18:03:247 kernel [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
21.10.2022 17:18:03:247 kernel [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
21.10.2022 17:18:03:247 kernel [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
21.10.2022 17:18:03:254 kernel [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
.
.
.
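For anyone reading the log, the individual fields printed after each GCVM_L2_PROTECTION_FAULT_STATUS line appear to be bit fields of that status word. The sketch below decodes the first status value (0x00701431). The bit positions in `FIELDS` are my assumption, checked only against the values the kernel printed above (MORE_FAULTS, WALKER_ERROR, PERMISSION_FAULTS, MAPPING_ERROR, RW, the client ID 0xa, and vmid 7); they are not taken from the driver headers, and the function name is hypothetical.

```python
# Assumed layout: name -> (bit offset, width in bits).
# Only verified against the fields printed in the log above.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),    # "Faulty UTCL2 client ID" in the log
    "RW": (18, 1),
    "VMID": (20, 4),
}


def decode_fault_status(status: int) -> dict:
    """Split a fault status word into the named fields under the assumed layout."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


if __name__ == "__main__":
    for name, value in decode_fault_status(0x00701431).items():
        print(f"{name}: {value:#x}")
```

Under this layout the first fault is a read (RW 0x0) from client 0xa with PERMISSION_FAULTS 0x3, which agrees with what the kernel printed. The later entries with status 0x00000000 decode to all zeros.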