AMDGPU driver crash when using embedded gamescope + Steam Remote Play
Brief summary of the problem:
I usually run gamescope-session as my main UI for gaming. For a while, streaming from Steam to other devices has been broken because of this bug. Patching gamescope with this patch fixed the segfault in gamescope.
Now I'm facing an issue that seems to be driver-related. Trying to stream from my machine to my phone or to my laptop, using either the Steam Link app or Steam itself (via the STREAM button on a game), results in a driver crash. The system can still be reached via SSH, but it cannot be rebooted: it hangs completely and needs a hard reset (or a forced shutdown by holding the power button).
This is the dmesg just after the crash:
[ 112.079995] amdgpu 0000:08:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:24 vmid:1 pasid:32775, for process steam pid 962 thread steam:cs0 pid 1087)
[ 112.080000] amdgpu 0000:08:00.0: amdgpu: in page starting at address 0x000080010672a000 from client 0x12 (VMC)
[ 112.080003] amdgpu 0000:08:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00105631
[ 112.080004] amdgpu 0000:08:00.0: amdgpu: Faulty UTCL2 client ID: VCN0 (0x2b)
[ 112.080005] amdgpu 0000:08:00.0: amdgpu: MORE_FAULTS: 0x1
[ 112.080006] amdgpu 0000:08:00.0: amdgpu: WALKER_ERROR: 0x0
[ 112.080007] amdgpu 0000:08:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 112.080008] amdgpu 0000:08:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 112.080008] amdgpu 0000:08:00.0: amdgpu: RW: 0x0
[ 112.080011] amdgpu 0000:08:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:24 vmid:1 pasid:32775, for process steam pid 962 thread steam:cs0 pid 1087)
[ 112.080013] amdgpu 0000:08:00.0: amdgpu: in page starting at address 0x000080010672b000 from client 0x12 (VMC)
[ 112.080015] amdgpu 0000:08:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 112.080016] amdgpu 0000:08:00.0: amdgpu: Faulty UTCL2 client ID: unknown (0x0)
[ 112.080016] amdgpu 0000:08:00.0: amdgpu: MORE_FAULTS: 0x0
[ 112.080017] amdgpu 0000:08:00.0: amdgpu: WALKER_ERROR: 0x0
[ 112.080018] amdgpu 0000:08:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ 112.080018] amdgpu 0000:08:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 112.080019] amdgpu 0000:08:00.0: amdgpu: RW: 0x0
[ 122.101998] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_enc_0.0 timeout, signaled seq=260, emitted seq=261
[ 122.102250] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process steam pid 962 thread steam:cs0 pid 1087
[ 122.102473] amdgpu 0000:08:00.0: amdgpu: GPU reset begin!
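To cross-check the first fault, the raw MMVM_L2_PROTECTION_FAULT_STATUS word (0x00105631) can be split into the fields the kernel prints. The faulting client decodes to CID 0x2b (VCN0), the video block whose encode ring (vcn_enc_0.0) then times out. The sketch below is a minimal, hypothetical decoder: the bit positions are an assumption (believed to match amdgpu's MMHUB v2.x register definitions) and are checked only against the values printed in the log above.

```python
# Hypothetical decoder for MMVM_L2_PROTECTION_FAULT_STATUS.
# ASSUMPTION: field shifts/masks below follow the MMHUB v2.x layout; they are
# verified here only against the decoded values in the dmesg output above.
FIELDS = {
    "MORE_FAULTS": (0, 0x1),
    "WALKER_ERROR": (1, 0x7),
    "PERMISSION_FAULTS": (4, 0xF),
    "MAPPING_ERROR": (8, 0x1),
    "CID": (9, 0xFF),       # UTCL2 client ID (0x2b = VCN0 per the kernel log)
    "RW": (18, 0x1),
    "VMID": (20, 0xF),
}


def decode_fault_status(status: int) -> dict:
    """Split the 32-bit status word into named fields using (shift, mask)."""
    return {name: (status >> shift) & mask for name, (shift, mask) in FIELDS.items()}


if __name__ == "__main__":
    # The two status values reported in the dmesg excerpt.
    for value in (0x00105631, 0x00000000):
        fields = decode_fault_status(value)
        print(f"0x{value:08x}: " + ", ".join(f"{k}=0x{v:x}" for k, v in fields.items()))
```

The decoded VMID (1) also matches the `vmid:1` in the page-fault header line, which is a useful sanity check that the assumed layout is right for this GPU.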
As the report on gamescope's GitHub suggests, more than one generation of GPUs seems to be affected. However, I could not reproduce the error on my laptop, which has an AMD APU (Ryzen PRO 5850U): there the issue is not present and streaming works.
Hardware description:
- CPU: AMD 5600X
- GPU: 08:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] [1002:73bf] (rev c0)
- System Memory: 32GB
- Display(s): LG 4K TV
- Type of Display Connection: HDMI
System information:
- Distro name and version: Arch Linux
- Kernel version: 6.4.1-arch2 and 6.4.1-zen2
- Custom kernel: Also tested with linux-chimeraos 6.3.9-chos2 from here
- AMD official driver version: Mesa 23.1.3
How to reproduce the issue:
- Patch the latest gamescope master with this patch.
- Open a TTY and run Steam in embedded gamescope:
$ gamescope -e -- steam -gamepadui
- From another machine running Steam, or from a phone with the Steam Link app, try to stream from the first machine.
Log files (for system lockups / game freezes / crashes)
- Dmesg log: dmesg.log
- Gamescope stdout: gamescope-stdout.log