Screen freezes during boot on RX 580
GPU: 01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] [1002:67df] (rev e7)
System: Ubuntu 20.04
Kernel: 5.10.0-051000rc5
Mesa: 20.2.3
LLVM: 11
Driver: open source, with firmware copied from the latest amdgpu-pro
My screen freezes (with checkerboard artifacts) during boot: either right after switching to graphics mode, or later, on the login screen or after login.
I tried various kernels from 5.0.0 to 5.10.0-051000rc5; nothing changed. I also installed Windows 10 in dual boot on the same machine, and it works fine.
Here is dmesg after the freeze:
[ 390.193547] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
[ 390.193552] amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 147 0x09788801 for process code pid 5942 thread code:cs0 pid 5944
[ 390.193557] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0C69772F
[ 390.193559] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08088001
[ 390.193563] amdgpu 0000:01:00.0: amdgpu: VM fault (0x01, vmid 4, pasid 32771) at page 208238383, read from 'TC6' (0x54433600) (136)
[ 400.433561] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
[ 410.673620] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
[ 420.913594] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
[ 605.234036] INFO: task code:5942 blocked for more than 120 seconds.
[ 605.234043] Not tainted 5.10.0-051000rc5-generic #202011221956
[ 605.234045] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 605.234048] task:code state:D stack: 0 pid: 5942 ppid: 5911 flags:0x00004002
[ 605.234053] Call Trace:
[ 605.234060] ? usleep_range+0x90/0x90
[ 605.234065] __schedule+0x201/0x5d0
[ 605.234068] ? usleep_range+0x90/0x90
[ 605.234070] schedule+0x4f/0xc0
[ 605.234073] schedule_timeout+0xfe/0x140
[ 605.234078] ? dma_resv_test_signaled_rcu+0x5d/0x2f0
[ 605.234080] __wait_for_common+0xa8/0x150
[ 605.234083] wait_for_completion+0x24/0x30
[ 605.234091] drm_sched_entity_fini+0x40/0x100 [gpu_sched]
[ 605.234227] amdgpu_ctx_mgr_entity_fini+0x97/0xe0 [amdgpu]
[ 605.234340] amdgpu_ctx_mgr_fini+0x32/0xc0 [amdgpu]
[ 605.234442] amdgpu_driver_postclose_kms+0x159/0x230 [amdgpu]
[ 605.234468] drm_file_free.part.0+0xfc/0x180 [drm]
[ 605.234486] drm_close_helper.isra.0+0x65/0x70 [drm]
[ 605.234504] drm_release+0x6a/0x110 [drm]
[ 605.234507] __fput+0xa9/0x260
[ 605.234510] ____fput+0xe/0x10
[ 605.234513] task_work_run+0x6d/0xa0
[ 605.234516] do_exit+0x206/0x3c0
[ 605.234520] ? zap_other_threads+0x98/0xe0
[ 605.234522] do_group_exit+0x3b/0xb0
[ 605.234526] __x64_sys_exit_group+0x18/0x20
[ 605.234530] do_syscall_64+0x38/0x90
[ 605.234533] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 605.234536] RIP: 0033:0x7f78842cd2c6
[ 605.234538] RSP: 002b:00007fff6a897a08 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[ 605.234541] RAX: ffffffffffffffda RBX: 00002411b9e66da0 RCX: 00007f78842cd2c6
[ 605.234543] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
[ 605.234545] RBP: 00007fff6a897a10 R08: 00000000000000e7 R09: fffffffffffffeb8
[ 605.234547] R10: 0000000000000000 R11: 0000000000000246 R12: 00002411bae86380
[ 605.234548] R13: 00007fff6a897bf0 R14: 0000000000000001 R15: 00007fff6a897bf4
I've tried various kernel parameters as well.
With amdgpu.dc=0 I have less chance of getting a freeze, but it still happens, either 5-10 min after boot or when I launch some graphics-intensive app.
With amdgpu.dc=0 I get the following dmesg:
[ 20.363823] amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 147 0x0fb04802 for process gnome-shell pid 2586 thread gnome-shel:cs0 pid 2605
[ 20.363829] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x001FCFF6
[ 20.363830] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0E048002
[ 20.363833] amdgpu 0000:01:00.0: amdgpu: VM fault (0x02, vmid 7, pasid 32773) at page 2084854, read from 'TC4' (0x54433400) (72)
[ 20.575218] amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 147 0x0fb80802 for process gnome-shell pid 2586 thread gnome-shel:cs0 pid 2605
[ 20.575224] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x001FCFF7
[ 20.575225] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x02008002
[ 20.575228] amdgpu 0000:01:00.0: amdgpu: VM fault (0x02, vmid 1, pasid 32773) at page 2084855, read from 'TC0' (0x54433000) (8)
[ 20.575258] amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 147 0x0fd88802 for process gnome-shell pid 2586 thread gnome-shel:cs0 pid 2605
[ 20.575260] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x001FCFFB
[ 20.575261] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x02088002
[ 20.575263] amdgpu 0000:01:00.0: amdgpu: VM fault (0x02, vmid 1, pasid 32773) at page 2084859, read from 'TC6' (0x54433600) (136)
[ 20.602836] amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 146 0x0cf84404 for process gnome-shell pid 2586 thread gnome-shel:cs0 pid 2605
[ 20.602842] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00107B9F
[ 20.602844] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08044004
[ 20.602847] amdgpu 0000:01:00.0: amdgpu: VM fault (0x04, vmid 4, pasid 32773) at page 1080223, read from 'TC5' (0x54433500) (68)
[ 20.602852] amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 147 0x0a100402 for process gnome-shell pid 2586 thread gnome-shel:cs0 pid 2605
[ 20.602853] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00179F38
[ 20.602854] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08088002
[ 20.602857] amdgpu 0000:01:00.0: amdgpu: VM fault (0x02, vmid 4, pasid 32773) at page 1548088, read from 'TC6' (0x54433600) (136)
[ 20.603247] amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 147 0x00f0c401 for process gnome-shell pid 2586 thread gnome-shel:cs0 pid 2605
[ 20.603250] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0F8F8A1E
[ 20.603251] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x080C4001
[ 20.603254] amdgpu 0000:01:00.0: amdgpu: VM fault (0x01, vmid 4, pasid 32773) at page 261065246, read from 'TC3' (0x54433300) (196)
[ 20.640690] amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 146 0x0b00480c for process gnome-shell pid 2586 thread gnome-shel:cs0 pid 2605
[ 20.640695] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0010D760
[ 20.640696] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A04800C
[ 20.640699] amdgpu 0000:01:00.0: amdgpu: VM fault (0x0c, vmid 5, pasid 32773) at page 1103712, read from 'TC4' (0x54433400) (72)
[ 20.800774] amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 146 0x0b00a50c for process gnome-shell pid 2586 thread gnome-shel:cs0 pid 2605
[ 20.800779] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00104B60
[ 20.800781] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C0A500C
[ 20.800784] amdgpu 0000:01:00.0: amdgpu: VM fault (0x0c, vmid 6, pasid 32773) at page 1067872, read from 'DBH4' (0x44424834) (165)
[ 23.142037] gmc_v8_0_process_interrupt: 2 callbacks suppressed
[ 23.142043] amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 147 0x0ff08401 for process gnome-shell pid 2586 thread gnome-shel:cs0 pid 2605
[ 23.142048] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x08CFCFFE
[ 23.142050] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C084001
[ 23.142053] amdgpu 0000:01:00.0: amdgpu: VM fault (0x01, vmid 6, pasid 32773) at page 147836926, read from 'TC7' (0x54433700) (132)
[ 23.142058] amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 146 0x0c40040c for process gnome-shell pid 2586 thread gnome-shel:cs0 pid 2605
[ 23.142060] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0010D788
[ 23.142061] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C0C800C
[ 23.142063] amdgpu 0000:01:00.0: amdgpu: VM fault (0x0c, vmid 6, pasid 32773) at page 1103752, read from 'TC2' (0x54433200) (200)
[ 23.142068] amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 146 0x0c40480c for process gnome-shell pid 2586 thread gnome-shel:cs0 pid 2605
[ 23.142069] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0FD028E4
[ 23.142071] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C008001
[ 23.142073] amdgpu 0000:01:00.0: amdgpu: VM fault (0x01, vmid 6, pasid 32773) at page 265300196, read from 'TC0' (0x54433000) (8)
[ 23.142077] amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 146 0x0c40440c for process gnome-shell pid 2586 thread gnome-shel:cs0 pid 2605
[ 23.142079] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0FD028E6
[ 23.142080] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C0C4001
[ 23.142082] amdgpu 0000:01:00.0: amdgpu: VM fault (0x01, vmid 6, pasid 32773) at page 265300198, read from 'TC3' (0x54433300) (196)
[ 23.142087] amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 146 0x0c40080c for process gnome-shell pid 2586 thread gnome-shel:cs0 pid 2605
[ 23.142088] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0FD028DE
[ 23.142089] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C008001
[ 23.142091] amdgpu 0000:01:00.0: amdgpu: VM fault (0x01, vmid 6, pasid 32773) at page 265300190, read from 'TC0' (0x54433000) (8)
[ 23.142096] amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 146 0x0c40880c for process gnome-shell pid 2586 thread gnome-shel:cs0 pid 2605
[ 23.142097] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0FD028E0
[ 23.142098] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C0C4001
[ 23.142101] amdgpu 0000:01:00.0: amdgpu: VM fault (0x01, vmid 6, pasid 32773) at page 265300192, read from 'TC3' (0x54433300) (196)
[ 23.142105] amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 146 0x0c40840c for process gnome-shell pid 2586 thread gnome-shel:cs0 pid 2605
[ 23.142106] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0FD028E2
[ 23.142108] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C004001
[ 23.142110] amdgpu 0000:01:00.0: amdgpu: VM fault (0x01, vmid 6, pasid 32773) at page 265300194, read from 'TC1' (0x54433100) (4)
[ 23.142114] amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 147 0x06c04801 for process gnome-shell pid 2586 thread gnome-shel:cs0 pid 2605
[ 23.142116] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0FD028E6
[ 23.142117] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C048001
[ 23.142119] amdgpu 0000:01:00.0: amdgpu: VM fault (0x01, vmid 6, pasid 32773) at page 265300198, read from 'TC4' (0x54433400) (72)
[ 23.142123] amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 147 0x06d00401 for process gnome-shell pid 2586 thread gnome-shel:cs0 pid 2605
[ 23.142125] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0FD028E8
[ 23.142126] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C008001
[ 23.142128] amdgpu 0000:01:00.0: amdgpu: VM fault (0x01, vmid 6, pasid 32773) at page 265300200, read from 'TC0' (0x54433000) (8)
[ 23.142132] amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 147 0x06c04401 for process gnome-shell pid 2586 thread gnome-shel:cs0 pid 2605
[ 23.142134] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0FD028E5
[ 23.142135] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C088001
[ 23.142137] amdgpu 0000:01:00.0: amdgpu: VM fault (0x01, vmid 6, pasid 32773) at page 265300197, read from 'TC6' (0x54433600) (136)
[ 31.791825] [drm:amdgpu_dm_commit_planes [amdgpu]] *ERROR* Waiting for fences timed out!
[ 40.249779] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=2230, emitted seq=2233
[ 40.249927] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 2586 thread gnome-shel:cs0 pid 2605
[ 40.249933] amdgpu 0000:01:00.0: amdgpu: GPU reset begin!
[ 40.688256] amdgpu: cp is busy, skip halt cp
[ 40.878942] amdgpu: rlc is busy, skip halt rlc
[ 40.879959] amdgpu 0000:01:00.0: amdgpu: BACO reset
[ 41.166537] amdgpu 0000:01:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 41.167264] [drm] PCIE GART of 256M enabled (table at 0x000000F400300000).
[ 41.167275] [drm] VRAM is lost due to GPU reset!
[ 42.310856] [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
[ 43.323091] [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
[ 44.335327] [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
[ 45.347596] [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
[ 46.359851] [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
[ 47.372090] [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
[ 48.384329] [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
[ 49.396557] [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
[ 50.408787] [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
[ 51.421011] [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
[ 51.440973] [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, giving up!!!
[ 51.441064] [drm:amdgpu_device_ip_set_powergating_state [amdgpu]] *ERROR* set_powergating_state of IP block <uvd_v6_0> failed -1
[ 51.648936] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring uvd test failed (-110)
[ 51.649026] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <uvd_v6_0> failed -110
[ 51.649050] amdgpu 0000:01:00.0: amdgpu: GPU reset(2) failed
[ 51.669977] amdgpu 0000:01:00.0: amdgpu: GPU reset end with ret = -110
[ 61.743779] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
[ 71.983613] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
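For anyone comparing this with their own logs, the "VM fault" lines above can be tallied by memory client and VMID with a short script. This is only a minimal sketch: the function name `summarize_vm_faults` is made up for illustration, and the regular expression is modeled purely on the lines shown in this post, not on the driver source, so it may miss differently formatted messages.

```python
import re
from collections import Counter

# Pattern modeled on the summary lines in the log above, e.g.
# "VM fault (0x01, vmid 4, pasid 32771) at page 208238383, read from 'TC6' (0x54433600) (136)"
VM_FAULT = re.compile(
    r"VM fault \(0x[0-9a-fA-F]+, vmid (?P<vmid>\d+), pasid (?P<pasid>\d+)\) "
    r"at page (?P<page>\d+), (?P<access>\w+) from '(?P<client>[^']+)'"
)


def summarize_vm_faults(log_text):
    """Count VM faults per memory client and per VMID; collect faulting pages.

    Hypothetical helper for illustration, not part of any amdgpu tooling.
    """
    by_client = Counter()
    by_vmid = Counter()
    pages = []
    for match in VM_FAULT.finditer(log_text):
        by_client[match.group("client")] += 1
        by_vmid[int(match.group("vmid"))] += 1
        # In these logs the decimal page number equals the hex value printed
        # on the preceding VM_CONTEXT1_PROTECTION_FAULT_ADDR line.
        pages.append(int(match.group("page")))
    return by_client, by_vmid, pages
```

With the dmesg output saved to a file (say `dmesg.txt`, a name chosen here as an example), `summarize_vm_faults(open("dmesg.txt").read())` returns the two counters and the page list, which makes it easier to see whether the faults cluster on one client or one address range.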
I googled really hard to find a solution but found only open issues with similar error logs. I can provide any additional info if needed.