Steps to reproduce:
1) # dnf install gstreamer1-vaapi
2) Play video encoded with H.264 in Totem player
Symptoms:
1. The system stops responding.
Kernel output after the GPU hang:
[ 89.056879] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=2638, last emitted seq=2640
[ 89.056926] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring uvd timeout, last signaled seq=80, last emitted seq=82
[ 89.056932] [drm] No hardware hang detected. Did some blocks stall?
[ 89.056948] [drm] No hardware hang detected. Did some blocks stall?
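For anyone comparing several of these hangs, the ring timeout lines above can be pulled out of a saved dmesg dump with a short script. This is only a minimal sketch: the function name `parse_ring_timeouts` and the `pending` field are made up for illustration, and reading `emitted - signaled` as the number of fences still outstanding on the ring is my assumption about what the driver is reporting.

```python
import re

# Matches the amdgpu job-timeout lines quoted above, with or without the
# asterisks around ERROR. The pattern is derived only from the lines in this
# report; other kernel versions may format the message differently.
RING_TIMEOUT = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\].*ring (?P<ring>\w+) timeout, "
    r"last signaled seq=(?P<signaled>\d+), last emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeouts(log_text):
    """Return one dict per ring timeout line found in dmesg-style text."""
    results = []
    for line in log_text.splitlines():
        m = RING_TIMEOUT.search(line)
        if not m:
            continue
        signaled = int(m.group("signaled"))
        emitted = int(m.group("emitted"))
        results.append({
            "timestamp": float(m.group("ts")),
            "ring": m.group("ring"),
            "signaled": signaled,
            "emitted": emitted,
            # Assumption: the difference is the number of fences that were
            # emitted on the ring but never signaled.
            "pending": emitted - signaled,
        })
    return results


SAMPLE = """\
[ 89.056879] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=2638, last emitted seq=2640
[ 89.056926] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring uvd timeout, last signaled seq=80, last emitted seq=82
"""

if __name__ == "__main__":
    for entry in parse_ring_timeouts(SAMPLE):
        print(entry["ring"], "pending fences:", entry["pending"])
```

Feeding it the output of `dmesg` saved to a file shows which rings (gfx, uvd, sdma0) timed out and in what order.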
If the computer is not restarted and is left in the hung state, then after a while the card's fan spins up to full speed and all the LEDs on the video card go out.
This even frightened me. Rebooting with the reset button did not help: the fan kept making noise and the LEDs on the video card did not light up.
The video card started working again only after the computer was powered off completely.
Here is what was logged in dmesg at that time:
[247125.285043] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=20977028, last emitted seq=20977030
[247125.285083] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring uvd timeout, last signaled seq=30, last emitted seq=31
[247125.285085] [drm] No hardware hang detected. Did some blocks stall?
[247125.285087] [drm] No hardware hang detected. Did some blocks stall?
[247359.270184] INFO: task amdgpu_cs:0:21382 blocked for more than 120 seconds.
[247359.270188] Not tainted 4.17.0-0.rc3.git4.1.fc29.x86_64 #1
[247359.270190] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[247359.270191] amdgpu_cs:0 D12728 21382 21309 0x00000000
[247359.270196] Call Trace:
[247359.270203] ? __schedule+0x2ba/0xaf0
[247359.270220] ? dma_fence_default_wait+0x231/0x370
[247359.270222] schedule+0x2f/0x90
[247359.270235] schedule_timeout+0x35c/0x520
[247359.270238] ? dma_fence_default_wait+0x72/0x370
[247359.270242] ? dma_fence_default_wait+0x231/0x370
[247359.270245] dma_fence_default_wait+0x25d/0x370
[247359.270247] ? dma_fence_release+0x160/0x160
[247359.270251] dma_fence_wait_timeout+0x4f/0x270
[247359.270300] amdgpu_ctx_wait_prev_fence+0x4c/0x80 [amdgpu]
[247359.270325] amdgpu_cs_ioctl+0x9d/0x1d10 [amdgpu]
[247359.270356] ? amdgpu_cs_find_mapping+0x120/0x120 [amdgpu]
[247359.270368] drm_ioctl_kernel+0x5b/0xb0 [drm]
[247359.270375] drm_ioctl+0x1b3/0x370 [drm]
[247359.270397] ? amdgpu_cs_find_mapping+0x120/0x120 [amdgpu]
[247359.270420] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[247359.270424] do_vfs_ioctl+0xa5/0x6d0
[247359.270428] ksys_ioctl+0x60/0x90
[247359.270431] __x64_sys_ioctl+0x16/0x20
[247359.270434] do_syscall_64+0x60/0x1f0
[247359.270438] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[247359.270545] INFO: task amdgpu_cs:0:12186 blocked for more than 120 seconds.
[247359.270546] Not tainted 4.17.0-0.rc3.git4.1.fc29.x86_64 #1
[247359.270548] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[247359.270550] amdgpu_cs:0 D13400 12186 12133 0x00000000
[247359.270554] Call Trace:
[247359.270557] ? __schedule+0x2ba/0xaf0
[247359.270561] ? dma_fence_default_wait+0x231/0x370
[247359.270564] schedule+0x2f/0x90
[247359.270566] schedule_timeout+0x35c/0x520
[247359.270569] ? dma_fence_default_wait+0x72/0x370
[247359.270573] ? dma_fence_default_wait+0x231/0x370
[247359.270575] dma_fence_default_wait+0x25d/0x370
[247359.270577] ? dma_fence_release+0x160/0x160
[247359.270580] dma_fence_wait_timeout+0x4f/0x270
[247359.270604] amdgpu_ctx_wait_prev_fence+0x4c/0x80 [amdgpu]
[247359.270626] amdgpu_cs_ioctl+0x9d/0x1d10 [amdgpu]
[247359.270656] ? amdgpu_cs_find_mapping+0x120/0x120 [amdgpu]
[247359.270665] drm_ioctl_kernel+0x5b/0xb0 [drm]
[247359.270672] drm_ioctl+0x1b3/0x370 [drm]
[247359.270692] ? amdgpu_cs_find_mapping+0x120/0x120 [amdgpu]
[247359.270713] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[247359.270717] do_vfs_ioctl+0xa5/0x6d0
[247359.270721] ksys_ioctl+0x60/0x90
[247359.270724] __x64_sys_ioctl+0x16/0x20
[247359.270727] do_syscall_64+0x60/0x1f0
[247359.270730] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[247359.270886] INFO: task kworker/u16:1:16581 blocked for more than 120 seconds.
[247359.270887] Not tainted 4.17.0-0.rc3.git4.1.fc29.x86_64 #1
[247359.270889] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[247359.270890] kworker/u16:1 D10936 16581 2 0x80000000
[247359.270905] Workqueue: events_unbound commit_work [drm_kms_helper]
[247359.270907] Call Trace:
[247359.270910] ? __schedule+0x2ba/0xaf0
[247359.270914] ? dma_fence_default_wait+0x231/0x370
[247359.270916] schedule+0x2f/0x90
[247359.270919] schedule_timeout+0x35c/0x520
[247359.270922] ? dma_fence_default_wait+0x72/0x370
[247359.270925] ? dma_fence_default_wait+0x231/0x370
[247359.270927] dma_fence_default_wait+0x25d/0x370
[247359.270929] ? dma_fence_release+0x160/0x160
[247359.270932] dma_fence_wait_timeout+0x4f/0x270
[247359.270935] reservation_object_wait_timeout_rcu+0x236/0x4e0
[247359.270967] amdgpu_dm_do_flip+0x112/0x350 [amdgpu]
[247359.271003] amdgpu_dm_atomic_commit_tail+0xa76/0xd00 [amdgpu]
[247359.271008] ? wait_for_completion_timeout+0x73/0x1a0
[247359.271019] commit_tail+0x3d/0x70 [drm_kms_helper]
[247359.271025] process_one_work+0x261/0x630
[247359.271030] worker_thread+0x3a/0x390
[247359.271033] ? process_one_work+0x630/0x630
[247359.271036] kthread+0x120/0x140
[247359.271039] ? kthread_create_worker_on_cpu+0x70/0x70
[247359.271041] ret_from_fork+0x3a/0x50
[247359.271056] INFO: lockdep is turned off.
[247482.151777] INFO: task amdgpu_cs:0:21382 blocked for more than 120 seconds.
[247482.151781] Not tainted 4.17.0-0.rc3.git4.1.fc29.x86_64 #1
[247482.151782] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[247482.151784] amdgpu_cs:0 D12728 21382 21309 0x00000000
[247482.151788] Call Trace:
[247482.151796] ? __schedule+0x2ba/0xaf0
[247482.151799] ? dma_fence_default_wait+0x231/0x370
[247482.151802] schedule+0x2f/0x90
[247482.151804] schedule_timeout+0x35c/0x520
[247482.151807] ? dma_fence_default_wait+0x72/0x370
[247482.151810] ? dma_fence_default_wait+0x231/0x370
[247482.151812] dma_fence_default_wait+0x25d/0x370
[247482.151814] ? dma_fence_release+0x160/0x160
[247482.151817] dma_fence_wait_timeout+0x4f/0x270
[247482.151863] amdgpu_ctx_wait_prev_fence+0x4c/0x80 [amdgpu]
[247482.151884] amdgpu_cs_ioctl+0x9d/0x1d10 [amdgpu]
[247482.151912] ? amdgpu_cs_find_mapping+0x120/0x120 [amdgpu]
[247482.151924] drm_ioctl_kernel+0x5b/0xb0 [drm]
[247482.151932] drm_ioctl+0x1b3/0x370 [drm]
[247482.151952] ? amdgpu_cs_find_mapping+0x120/0x120 [amdgpu]
[247482.151973] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[247482.151977] do_vfs_ioctl+0xa5/0x6d0
[247482.151982] ksys_ioctl+0x60/0x90
[247482.151986] __x64_sys_ioctl+0x16/0x20
[247482.151989] do_syscall_64+0x60/0x1f0
[247482.151993] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[247482.152121] INFO: task amdgpu_cs:0:12186 blocked for more than 120 seconds.
[247482.152123] Not tainted 4.17.0-0.rc3.git4.1.fc29.x86_64 #1
[247482.152124] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[247482.152126] amdgpu_cs:0 D13400 12186 12133 0x00000000
[247482.152130] Call Trace:
[247482.152143] ? __schedule+0x2ba/0xaf0
[247482.152146] ? dma_fence_default_wait+0x231/0x370
[247482.152148] schedule+0x2f/0x90
[247482.152150] schedule_timeout+0x35c/0x520
[247482.152153] ? dma_fence_default_wait+0x72/0x370
[247482.152156] ? dma_fence_default_wait+0x231/0x370
[247482.152169] dma_fence_default_wait+0x25d/0x370
[247482.152171] ? dma_fence_release+0x160/0x160
[247482.152174] dma_fence_wait_timeout+0x4f/0x270
[247482.152203] amdgpu_ctx_wait_prev_fence+0x4c/0x80 [amdgpu]
[247482.152233] amdgpu_cs_ioctl+0x9d/0x1d10 [amdgpu]
[247482.152281] ? amdgpu_cs_find_mapping+0x120/0x120 [amdgpu]
[247482.152299] drm_ioctl_kernel+0x5b/0xb0 [drm]
[247482.152316] drm_ioctl+0x1b3/0x370 [drm]
[247482.152335] ? amdgpu_cs_find_mapping+0x120/0x120 [amdgpu]
[247482.152375] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[247482.152379] do_vfs_ioctl+0xa5/0x6d0
[247482.152382] ksys_ioctl+0x60/0x90
[247482.152385] __x64_sys_ioctl+0x16/0x20
[247482.152387] do_syscall_64+0x60/0x1f0
[247482.152390] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[247482.152554] INFO: task kworker/u16:1:16581 blocked for more than 120 seconds.
[247482.152556] Not tainted 4.17.0-0.rc3.git4.1.fc29.x86_64 #1
[247482.152558] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[247482.152560] kworker/u16:1 D10936 16581 2 0x80000000
[247482.152571] Workqueue: events_unbound commit_work [drm_kms_helper]
[247482.152574] Call Trace:
[247482.152579] ? __schedule+0x2ba/0xaf0
[247482.152584] ? dma_fence_default_wait+0x231/0x370
[247482.152587] schedule+0x2f/0x90
[247482.152590] schedule_timeout+0x35c/0x520
[247482.152594] ? dma_fence_default_wait+0x72/0x370
[247482.152599] ? dma_fence_default_wait+0x231/0x370
[247482.152603] dma_fence_default_wait+0x25d/0x370
[247482.152606] ? dma_fence_release+0x160/0x160
[247482.152610] dma_fence_wait_timeout+0x4f/0x270
[247482.152615] reservation_object_wait_timeout_rcu+0x236/0x4e0
[247482.152651] amdgpu_dm_do_flip+0x112/0x350 [amdgpu]
[247482.152691] amdgpu_dm_atomic_commit_tail+0xa76/0xd00 [amdgpu]
[247482.152713] ? wait_for_completion_timeout+0x73/0x1a0
[247482.152721] commit_tail+0x3d/0x70 [drm_kms_helper]
[247482.152725] process_one_work+0x261/0x630
[247482.152732] worker_thread+0x3a/0x390
[247482.152735] ? process_one_work+0x630/0x630
[247482.152737] kthread+0x120/0x140
[247482.152740] ? kthread_create_worker_on_cpu+0x70/0x70
[247482.152742] ret_from_fork+0x3a/0x50
[247482.152751] INFO: lockdep is turned off.
[247605.031356] INFO: task amdgpu_cs:0:21382 blocked for more than 120 seconds.
[247605.031360] Not tainted 4.17.0-0.rc3.git4.1.fc29.x86_64 #1
[247605.031362] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[247605.031364] amdgpu_cs:0 D12728 21382 21309 0x00000000
[247605.031369] Call Trace:
[247605.031376] ? __schedule+0x2ba/0xaf0
[247605.031381] ? dma_fence_default_wait+0x231/0x370
[247605.031383] schedule+0x2f/0x90
[247605.031386] schedule_timeout+0x35c/0x520
[247605.031389] ? dma_fence_default_wait+0x72/0x370
[247605.031393] ? dma_fence_default_wait+0x231/0x370
[247605.031396] dma_fence_default_wait+0x25d/0x370
[247605.031398] ? dma_fence_release+0x160/0x160
[247605.031401] dma_fence_wait_timeout+0x4f/0x270
[247605.031439] amdgpu_ctx_wait_prev_fence+0x4c/0x80 [amdgpu]
[247605.031467] amdgpu_cs_ioctl+0x9d/0x1d10 [amdgpu]
[247605.031512] ? amdgpu_cs_find_mapping+0x120/0x120 [amdgpu]
[247605.031525] drm_ioctl_kernel+0x5b/0xb0 [drm]
[247605.031543] drm_ioctl+0x1b3/0x370 [drm]
[247605.031566] ? amdgpu_cs_find_mapping+0x120/0x120 [amdgpu]
[247605.031590] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[247605.031596] do_vfs_ioctl+0xa5/0x6d0
[247605.031600] ksys_ioctl+0x60/0x90
[247605.031603] __x64_sys_ioctl+0x16/0x20
[247605.031606] do_syscall_64+0x60/0x1f0
[247605.031611] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[247605.031715] INFO: task amdgpu_cs:0:12186 blocked for more than 120 seconds.
[247605.031717] Not tainted 4.17.0-0.rc3.git4.1.fc29.x86_64 #1
[247605.031718] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[247605.031720] amdgpu_cs:0 D13400 12186 12133 0x00000000
[247605.031725] Call Trace:
[247605.031729] ? __schedule+0x2ba/0xaf0
[247605.031733] ? dma_fence_default_wait+0x231/0x370
[247605.031735] schedule+0x2f/0x90
[247605.031738] schedule_timeout+0x35c/0x520
[247605.031741] ? dma_fence_default_wait+0x72/0x370
[247605.031744] ? dma_fence_default_wait+0x231/0x370
[247605.031746] dma_fence_default_wait+0x25d/0x370
[247605.031749] ? dma_fence_release+0x160/0x160
[247605.031752] dma_fence_wait_timeout+0x4f/0x270
[247605.031775] amdgpu_ctx_wait_prev_fence+0x4c/0x80 [amdgpu]
[247605.031798] amdgpu_cs_ioctl+0x9d/0x1d10 [amdgpu]
[247605.031828] ? amdgpu_cs_find_mapping+0x120/0x120 [amdgpu]
[247605.031838] drm_ioctl_kernel+0x5b/0xb0 [drm]
[247605.031846] drm_ioctl+0x1b3/0x370 [drm]
[247605.031866] ? amdgpu_cs_find_mapping+0x120/0x120 [amdgpu]
[247605.031887] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[247605.031892] do_vfs_ioctl+0xa5/0x6d0
[247605.031896] ksys_ioctl+0x60/0x90
[247605.031899] __x64_sys_ioctl+0x16/0x20
[247605.031902] do_syscall_64+0x60/0x1f0
[247605.031906] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[247605.032047] INFO: task kworker/u16:1:16581 blocked for more than 120 seconds.
[247605.032049] Not tainted 4.17.0-0.rc3.git4.1.fc29.x86_64 #1
[247605.032050] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[247605.032052] kworker/u16:1 D10936 16581 2 0x80000000
[247605.032063] Workqueue: events_unbound commit_work [drm_kms_helper]
[247605.032065] Call Trace:
[247605.032069] ? __schedule+0x2ba/0xaf0
[247605.032073] ? dma_fence_default_wait+0x231/0x370
[247605.032075] schedule+0x2f/0x90
[247605.032078] schedule_timeout+0x35c/0x520
[247605.032081] ? dma_fence_default_wait+0x72/0x370
[247605.032085] ? dma_fence_default_wait+0x231/0x370
[247605.032087] dma_fence_default_wait+0x25d/0x370
[247605.032089] ? dma_fence_release+0x160/0x160
[247605.032092] dma_fence_wait_timeout+0x4f/0x270
[247605.032095] reservation_object_wait_timeout_rcu+0x236/0x4e0
[247605.032127] amdgpu_dm_do_flip+0x112/0x350 [amdgpu]
[247605.032162] amdgpu_dm_atomic_commit_tail+0xa76/0xd00 [amdgpu]
[247605.032166] ? wait_for_completion_timeout+0x73/0x1a0
[247605.032175] commit_tail+0x3d/0x70 [drm_kms_helper]
[247605.032180] process_one_work+0x261/0x630
[247605.032185] worker_thread+0x3a/0x390
[247605.032188] ? process_one_work+0x630/0x630
[247605.032191] kthread+0x120/0x140
[247605.032194] ? kthread_create_worker_on_cpu+0x70/0x70
[247605.032197] ret_from_fork+0x3a/0x50
[247605.032208] INFO: lockdep is turned off.
[247640.263559] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247640.663689] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247641.416206] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247641.512251] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247641.773087] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247642.121791] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247642.220684] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247642.481411] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247642.612305] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247642.900084] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247642.935635] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247642.999194] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247643.552447] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247643.668968] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247643.690139] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247644.099977] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247644.232435] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247644.292521] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247644.358833] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247644.376341] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247644.390073] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247644.514553] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247644.529169] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247644.581504] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247644.688219] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247644.787111] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247644.812531] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247644.873729] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247644.928613] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247644.939548] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247644.961052] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247645.056869] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247645.198003] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247645.280336] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247645.360668] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247645.434358] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247645.441931] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247645.565895] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247645.639253] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247645.711531] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247645.729971] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247645.744137] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247645.952694] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247646.140934] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247646.259925] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247646.319308] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247646.363976] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247646.389526] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247646.457577] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247646.513275] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247646.544150] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247646.637789] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247646.651337] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247646.710404] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247646.785978] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247646.928178] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247646.955859] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247647.016425] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247647.134880] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247647.159276] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247647.249781] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247647.315185] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247647.325523] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247647.361488] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247647.383235] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247647.439095] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247647.460806] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247647.485170] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247647.502436] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247647.548979] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247647.594343] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247647.621786] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247647.649303] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247647.670292] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247647.701090] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247647.735796] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247647.774236] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247647.816521] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247647.840603] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247647.869076] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247647.948394] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247647.977194] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247648.008216] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247648.041878] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247648.102950] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247648.123688] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247648.161477] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247648.210530] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247648.248898] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247648.273809] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247648.308455] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247648.357214] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247648.393870] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247648.418454] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247648.429277] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247648.508805] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247648.529862] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247648.581775] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247648.595466] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247648.679402] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247648.714558] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247648.767368] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247648.784370] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247648.805855] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247648.872980] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247648.933891] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247648.944161] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247648.979727] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247649.036203] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247649.094332] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247649.138191] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247649.175616] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247649.279457] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247649.313344] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247649.483680] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247649.519062] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247649.554865] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247649.601461] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247649.655004] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247649.760903] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247649.784816] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247649.870742] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247649.923269] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247650.003330] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247650.129582] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247650.206246] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247650.330698] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247650.481865] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247650.513212] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247650.564055] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247650.773681] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247650.780123] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247650.821904] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247650.841934] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247650.877117] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247650.901374] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247650.985498] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247651.026897] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247651.068131] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247651.109751] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247651.126539] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247651.355831] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247651.791237] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247651.829065] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247651.928932] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247652.077168] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247652.083449] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247652.211548] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247652.288786] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247652.302159] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247652.496320] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247652.614161] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247652.655070] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247652.745940] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247652.808084] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247653.117247] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247653.141879] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247653.166410] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247653.193642] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247653.338192] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247653.560506] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247653.898569] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247654.135093] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247654.283233] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247654.445210] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247654.465085] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247654.865339] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247654.987101] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247655.933191] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247655.993198] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247656.465146] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247669.543630] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, last signaled seq=6004078, last emitted seq=6004080
[247669.543635] [drm] No hardware hang detected. Did some blocks stall?
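Since the log above is long and repetitive, a small script can condense it to which tasks were reported as blocked and how long the over-temperature warnings lasted. This is a rough sketch, not a diagnostic tool: `summarize_hang_log` and the keys it returns are hypothetical names, and the regular expressions are based only on the message formats quoted in this report.

```python
import re
from collections import Counter

# "INFO: task amdgpu_cs:0:21382 blocked for more than 120 seconds."
# The task name may itself contain colons, so the PID is taken as the last
# colon-separated number before " blocked".
HUNG_TASK = re.compile(
    r"INFO: task (?P<task>\S+):(?P<pid>\d+) blocked for more than \d+ seconds"
)
# "amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!"
OVER_TEMP = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] amdgpu: \[powerplay\] GPU over temperature range detected"
)


def summarize_hang_log(log_text):
    """Count hung-task reports per (task, pid) and over-temperature warnings."""
    hung = Counter()
    temp_timestamps = []
    for line in log_text.splitlines():
        m = HUNG_TASK.search(line)
        if m:
            hung[(m.group("task"), int(m.group("pid")))] += 1
            continue
        m = OVER_TEMP.search(line)
        if m:
            temp_timestamps.append(float(m.group("ts")))
    return {
        "hung_tasks": hung,
        "over_temp_count": len(temp_timestamps),
        "over_temp_first": min(temp_timestamps) if temp_timestamps else None,
        "over_temp_last": max(temp_timestamps) if temp_timestamps else None,
    }


SAMPLE = """\
[247359.270184] INFO: task amdgpu_cs:0:21382 blocked for more than 120 seconds.
[247359.270886] INFO: task kworker/u16:1:16581 blocked for more than 120 seconds.
[247482.151777] INFO: task amdgpu_cs:0:21382 blocked for more than 120 seconds.
[247640.263559] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
[247656.465146] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
"""

if __name__ == "__main__":
    summary = summarize_hang_log(SAMPLE)
    for (task, pid), count in summary["hung_tasks"].items():
        print(f"{task} (pid {pid}): {count} hung-task reports")
    print("over-temperature warnings:", summary["over_temp_count"])
```

Run against the full dump, this makes it easier to see that the same three tasks are reported as blocked over and over before the over-temperature warnings begin.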
A very strange coincidence: every time I reproduce the GPU hang described above by playing a video with VAAPI acceleration, the following messages appear in the kernel log after the reboot:
Just started running into this after upgrading to kernel 5.4.0. On an RX480, launching VLC with VA-API enabled results in an immediate GPU crash (fan spinning up to max, all LEDs out, no display, no audio).
This issue hasn't had any activity since 2019-12-27. The AMD driver stack changes rapidly and contains lots of shared code across products so it's possible that it has already been fixed. Please upgrade to a current stable kernel and userspace stack and try again. If you still experience this issue with the latest driver stack, please capture relevant logging and open a new issue referring back to this one.