radeonsi: Frequent hangs with Firefox on RX 7900 XTX
This happens with both git from Jan 9th and git from today (Jan 17th). It occurs a couple of times per day and seems loosely correlated with GitLab loading circles being displayed.
[ 1668.656823] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=941383, emitted seq=941385
[ 1668.656981] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process firefox pid 20256 thread firefox:cs0 pid 20260
[ 1668.657095] amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
[ 1669.660136] amdgpu 0000:03:00.0: amdgpu: IP block:gfx_v11_0 is hung!
[ 1669.660345] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:173 vmid:0 pasid:0, for process pid 0 thread pid 0)
[ 1669.660351] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
[ 1669.660353] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00040B5B
[ 1669.660354] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5)
[ 1669.660356] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1
[ 1669.660357] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x5
[ 1669.660358] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x5
[ 1669.660357] amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0001 address=0x7d01a200 flags=0x0020]
[ 1669.660358] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x1
[ 1669.660360] amdgpu 0000:03:00.0: amdgpu: RW: 0x1
[ 1669.660364] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:173 vmid:0 pasid:0, for process pid 0 thread pid 0)
[ 1669.660365] amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0001 address=0x7d01a290 flags=0x0020]
[ 1669.660368] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
[ 1669.660370] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 1669.660370] amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0001 address=0x7d01a2c4 flags=0x0020]
[ 1669.660372] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
[ 1669.660374] amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0001 address=0x7d01a2f8 flags=0x0020]
[ 1669.660374] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x0
[ 1669.660375] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
[ 1669.660377] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ 1669.660377] amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0001 address=0x7d01a208 flags=0x0020]
[ 1669.660379] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 1669.660381] amdgpu 0000:03:00.0: amdgpu: RW: 0x0
[ 1669.660381] amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0001 address=0x7d019200 flags=0x0020]
[ 1669.660384] amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0001 address=0x7d019248 flags=0x0020]
[ 1669.660385] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:173 vmid:0 pasid:0, for process pid 0 thread pid 0)
[ 1669.660386] amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0001 address=0x7d019280 flags=0x0020]
[ 1669.660387] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
[ 1669.660389] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 1669.660389] amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0001 address=0x7d0192b0 flags=0x0020]
[ 1669.660391] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
[ 1669.660392] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x0
[ 1669.660392] amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0001 address=0x7d0192e4 flags=0x0020]
[ 1669.660393] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
[ 1669.660395] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ 1669.660396] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 1669.660397] amdgpu 0000:03:00.0: amdgpu: RW: 0x0
[ 1669.660401] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:173 vmid:0 pasid:0, for process pid 0 thread pid 0)
[ 1669.660402] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
[ 1669.660403] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 1669.660404] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
[ 1669.660405] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x0
[ 1669.660406] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
[ 1669.660407] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ 1669.660407] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 1669.660408] amdgpu 0000:03:00.0: amdgpu: RW: 0x0
The above is from dmesg. I have the new firmware from the Jan 10 linux-firmware.
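For anyone cross-checking the log, the first GCVM_L2_PROTECTION_FAULT_STATUS value (0x00040B5B) can be split into the fields the driver prints beneath it. This is only a rough sketch in Python: the bit positions are an assumption inferred from the decoded values in this log, not taken from the register headers, and `decode_fault_status` is a hypothetical helper name.

```python
def decode_fault_status(status: int) -> dict:
    """Split a GCVM_L2_PROTECTION_FAULT_STATUS value into named fields.

    Bit layout is an assumption inferred from the dmesg output above.
    """
    return {
        "MORE_FAULTS": status & 0x1,               # bit 0
        "WALKER_ERROR": (status >> 1) & 0x7,       # bits 1-3
        "PERMISSION_FAULTS": (status >> 4) & 0xF,  # bits 4-7
        "MAPPING_ERROR": (status >> 8) & 0x1,      # bit 8
        "CID": (status >> 9) & 0x1FF,              # bits 9-17, UTCL2 client ID
        "RW": (status >> 18) & 0x1,                # bit 18
    }


if __name__ == "__main__":
    for name, value in decode_fault_status(0x00040B5B).items():
        print(f"{name}: {value:#x}")
```

Under that assumed layout, the decoded fields agree with the lines logged for the first fault (client ID 0x5, WALKER_ERROR 0x5, PERMISSION_FAULTS 0x5, RW 0x1). The later faults report a status of 0x00000000, so every field decodes to zero.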