radeonsi: OpenGL app always produces page fault in gfxhub on Navi 10
The Windows OpenGL VR application Welcome to Light Fields produces a page fault in gfxhub and locks up the GPU on Radeon RX 5700 XT, every time it is launched.
VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5700 / 5700 XT] (rev c1)
The problem is reproducible with the playback of an apitrace.
Playing back the trace produces the lockup every time. I was able to bisect the trace to the faulty call and then trim the trace down to 1655 calls / 63 kB while still reproducing the GPU hang. The calls relevant to the hang come after the last context switch.
The trace does not produce the page fault on my RX580.
Tried stable kernel 5.3.8.1 from Arch Linux, tag v5.4-rc6, drm-next at 8a86b00a437e and amd-staging-drm-next at 8799b4cfde62. All kernel versions behave the same with respect to this issue.
Tried Mesa 19.2.2 and 20.0 from git tag dd77bdb34b6, both of which behave the same.
The Windows application was run with Proton 4.11-7 for compatibility with the OpenVR API.
Mostly unrelated but interesting: A trace captured on the RX580, where the application menu opens, does not produce the page fault on the RX 5700 XT. A trace captured on Windows with the RX 5700 XT does not produce the page fault on Linux.
dmesg:
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: [gfxhub] page fault (src_id:0 ring:24 vmid:6 pasid:32771, for process glretrace pid 2001 thread glretrace:cs0 pid 2002)
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: in page starting at address 0x000080010e908000 from client 27
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: GCVM_L2_PROTECTION_FAULT_STATUS:0x00601031
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: MORE_FAULTS: 0x1
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: WALKER_ERROR: 0x0
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: PERMISSION_FAULTS: 0x3
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: MAPPING_ERROR: 0x0
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: RW: 0x0
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: [gfxhub] page fault (src_id:0 ring:24 vmid:6 pasid:32771, for process glretrace pid 2001 thread glretrace:cs0 pid 2002)
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: in page starting at address 0x000080010e900000 from client 27
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: GCVM_L2_PROTECTION_FAULT_STATUS:0x00601031
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: MORE_FAULTS: 0x1
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: WALKER_ERROR: 0x0
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: PERMISSION_FAULTS: 0x3
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: MAPPING_ERROR: 0x0
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: RW: 0x0
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: [gfxhub] page fault (src_id:0 ring:24 vmid:6 pasid:32771, for process glretrace pid 2001 thread glretrace:cs0 pid 2002)
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: in page starting at address 0x000080010e900000 from client 27
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: GCVM_L2_PROTECTION_FAULT_STATUS:0x00601031
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: MORE_FAULTS: 0x1
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: WALKER_ERROR: 0x0
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: PERMISSION_FAULTS: 0x3
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: MAPPING_ERROR: 0x0
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: RW: 0x0
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: [gfxhub] page fault (src_id:0 ring:24 vmid:6 pasid:32771, for process glretrace pid 2001 thread glretrace:cs0 pid 2002)
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: in page starting at address 0x000080010e904000 from client 27
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: GCVM_L2_PROTECTION_FAULT_STATUS:0x00601031
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: MORE_FAULTS: 0x1
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: WALKER_ERROR: 0x0
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: PERMISSION_FAULTS: 0x3
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: MAPPING_ERROR: 0x0
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: RW: 0x0
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: [gfxhub] page fault (src_id:0 ring:24 vmid:6 pasid:32771, for process glretrace pid 2001 thread glretrace:cs0 pid 2002)
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: in page starting at address 0x000080010e910000 from client 27
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: GCVM_L2_PROTECTION_FAULT_STATUS:0x00601031
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: MORE_FAULTS: 0x1
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: WALKER_ERROR: 0x0
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: PERMISSION_FAULTS: 0x3
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: MAPPING_ERROR: 0x0
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: RW: 0x0
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: [gfxhub] page fault (src_id:0 ring:24 vmid:6 pasid:32771, for process glretrace pid 2001 thread glretrace:cs0 pid 2002)
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: in page starting at address 0x000080010e904000 from client 27
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: GCVM_L2_PROTECTION_FAULT_STATUS:0x00601031
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: MORE_FAULTS: 0x1
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: WALKER_ERROR: 0x0
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: PERMISSION_FAULTS: 0x3
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: MAPPING_ERROR: 0x0
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: RW: 0x0
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: [gfxhub] page fault (src_id:0 ring:24 vmid:6 pasid:32771, for process glretrace pid 2001 thread glretrace:cs0 pid 2002)
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: in page starting at address 0x000080010e901000 from client 27
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: GCVM_L2_PROTECTION_FAULT_STATUS:0x00601031
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: MORE_FAULTS: 0x1
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: WALKER_ERROR: 0x0
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: PERMISSION_FAULTS: 0x3
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: MAPPING_ERROR: 0x0
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: RW: 0x0
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: [gfxhub] page fault (src_id:0 ring:24 vmid:6 pasid:32771, for process glretrace pid 2001 thread glretrace:cs0 pid 2002)
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: in page starting at address 0x000080010e914000 from client 27
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: GCVM_L2_PROTECTION_FAULT_STATUS:0x00601031
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: MORE_FAULTS: 0x1
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: WALKER_ERROR: 0x0
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: PERMISSION_FAULTS: 0x3
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: MAPPING_ERROR: 0x0
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: RW: 0x0
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: [gfxhub] page fault (src_id:0 ring:24 vmid:6 pasid:32771, for process glretrace pid 2001 thread glretrace:cs0 pid 2002)
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: in page starting at address 0x000080010e934000 from client 27
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: GCVM_L2_PROTECTION_FAULT_STATUS:0x00601031
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: MORE_FAULTS: 0x1
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: WALKER_ERROR: 0x0
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: PERMISSION_FAULTS: 0x3
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: MAPPING_ERROR: 0x0
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: RW: 0x0
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: [gfxhub] page fault (src_id:0 ring:24 vmid:6 pasid:32771, for process glretrace pid 2001 thread glretrace:cs0 pid 2002)
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: in page starting at address 0x000080010e901000 from client 27
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: MORE_FAULTS: 0x0
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: WALKER_ERROR: 0x0
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: PERMISSION_FAULTS: 0x0
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: MAPPING_ERROR: 0x0
Nov 04 17:58:35 bstation kernel: amdgpu 0000:0b:00.0: RW: 0x0
Nov 04 17:58:40 bstation kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Nov 04 17:58:45 bstation kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Nov 04 17:58:45 bstation kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=27521, emitted seq=27523
Nov 04 17:58:45 bstation kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
Nov 04 17:58:45 bstation kernel: [drm] GPU recovery disabled.
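
For reference, the GCVM_L2_PROTECTION_FAULT_STATUS value in the log above can be decoded by hand. The following is a minimal sketch; the bit layout in `FIELDS` is an assumption on my part, cross-checked only against the decoded fields the kernel printed (and the vmid:6 in the page fault line), not taken from any register documentation. The names `FIELDS` and `decode_fault_status` are made up for this example.

```python
# Assumed bit layout of GCVM_L2_PROTECTION_FAULT_STATUS on Navi 10.
# Each entry is (shift, width). This is an assumption that happens to
# reproduce the fields printed by the kernel in the dmesg output above.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "RW": (18, 1),
    "VMID": (20, 4),
}


def decode_fault_status(status):
    """Split a raw status value into the named bit fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


if __name__ == "__main__":
    # Value reported for every fault except the last one in the log.
    for name, value in decode_fault_status(0x00601031).items():
        print(f"{name}: {value:#x}")
```

Under that assumed layout, 0x00601031 decodes to the same MORE_FAULTS, WALKER_ERROR, PERMISSION_FAULTS, MAPPING_ERROR and RW values that the kernel logged, with RW 0x0 and PERMISSION_FAULTS 0x3 on each fault before the status is cleared.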