[navi] Firefox causes a random GPU hang
During regular browsing, Firefox sometimes causes a GPU hang that requires a reboot.
- GPU: Sapphire Pulse RX 5700 XT
- OS: Debian testing Linux
- Kernel: 5.4-rc2
- Navi firmware: latest from 20190923.
- Mesa: 19.2.0 + llvm9 (though the same thing happens with current Mesa master + llvm10 according to my tests).
- Firefox: 70.0b12 (Mozilla build), WebRender enabled.
When the hang happens, the system can still be accessed over ssh, and this is what I see in dmesg:
[ 4776.128190] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[ 4781.248178] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=261464, emitted seq=261467
[ 4781.248245] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process GPU Process pid 1877 thread firefox-bi:cs0 pid 2010
[ 4781.248247] [drm] GPU recovery disabled.
[ 4955.072716] INFO: task Xorg:1011 blocked for more than 120 seconds.
[ 4955.072720] Tainted: G E 5.4.0-rc2 #21
[ 4955.072722] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4955.072724] Xorg D 0 1011 999 0x00400004
[ 4955.072726] Call Trace:
[ 4955.072734] ? __schedule+0x29f/0x740
[ 4955.072736] schedule+0x39/0xa0
[ 4955.072738] schedule_timeout+0x20f/0x300
[ 4955.072741] dma_fence_default_wait+0x1bc/0x2a0
[ 4955.072743] ? dma_fence_release+0x140/0x140
[ 4955.072745] dma_fence_wait_timeout+0xdd/0x100
[ 4955.072809] gmc_v10_0_flush_gpu_tlb+0x15e/0x1b0 [amdgpu]
[ 4955.072860] amdgpu_gart_bind+0x6e/0xa0 [amdgpu]
[ 4955.072909] amdgpu_ttm_gart_bind+0x73/0xc0 [amdgpu]
[ 4955.072958] amdgpu_ttm_alloc_gart+0x23b/0x330 [amdgpu]
[ 4955.072961] ? refcount_inc_checked+0x5/0x30
[ 4955.073012] amdgpu_vm_clear_bo+0x165/0x3b0 [amdgpu]
[ 4955.073015] ? _cond_resched+0x15/0x30
[ 4955.073017] ? mutex_lock+0xe/0x30
[ 4955.073064] ? amdgpu_bo_create+0x1a5/0x220 [amdgpu]
[ 4955.073114] amdgpu_vm_update_ptes+0x284/0x6b0 [amdgpu]
[ 4955.073164] amdgpu_vm_bo_update_mapping+0xb3/0xe0 [amdgpu]
[ 4955.073213] amdgpu_vm_bo_update+0x326/0x760 [amdgpu]
[ 4955.073263] amdgpu_gem_va_ioctl+0x522/0x550 [amdgpu]
[ 4955.073275] ? drm_gem_object_put_unlocked+0x3b/0x60 [drm]
[ 4955.073324] ? amdgpu_gem_metadata_ioctl+0x190/0x190 [amdgpu]
[ 4955.073334] drm_ioctl_kernel+0xaa/0xf0 [drm]
[ 4955.073344] drm_ioctl+0x208/0x390 [drm]
[ 4955.073393] ? amdgpu_gem_metadata_ioctl+0x190/0x190 [amdgpu]
[ 4955.073440] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[ 4955.073444] do_vfs_ioctl+0x40e/0x670
[ 4955.073446] ksys_ioctl+0x5e/0x90
[ 4955.073448] __x64_sys_ioctl+0x16/0x20
[ 4955.073451] do_syscall_64+0x52/0x160
[ 4955.073453] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 4955.073455] RIP: 0033:0x7fd256ae55d7
[ 4955.073459] Code: Bad RIP value.
[ 4955.073460] RSP: 002b:00007ffcf27b0538 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
[ 4955.073462] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd256ae55d7
[ 4955.073463] RDX: 00007ffcf27b0580 RSI: 00000000c0286448 RDI: 000000000000000e
[ 4955.073463] RBP: 00007ffcf27b0580 R08: ffff800103600000 R09: 000000000000000e
[ 4955.073464] R10: 0000000000000037 R11: 0000000000003246 R12: 00000000c0286448
[ 4955.073465] R13: 000000000000000e R14: 00000000001b5000 R15: 0000564bb1883750
[ 4955.073598] INFO: task kworker/11:2:11488 blocked for more than 120 seconds.
[ 4955.073599] Tainted: G E 5.4.0-rc2 #21
[ 4955.073600] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4955.073601] kworker/11:2 D 0 11488 2 0x80004000
[ 4955.073607] Workqueue: events ttm_bo_delayed_workqueue [ttm]
[ 4955.073608] Call Trace:
[ 4955.073611] ? __schedule+0x29f/0x740
[ 4955.073612] schedule+0x39/0xa0
[ 4955.073614] schedule_timeout+0x20f/0x300
[ 4955.073616] ? __wake_up_common+0x80/0x180
[ 4955.073618] dma_fence_default_wait+0x1bc/0x2a0
[ 4955.073620] ? dma_fence_release+0x140/0x140
[ 4955.073622] dma_fence_wait_timeout+0xdd/0x100
[ 4955.073677] gmc_v10_0_flush_gpu_tlb+0x15e/0x1b0 [amdgpu]
[ 4955.073726] amdgpu_gart_unbind+0x9e/0xd0 [amdgpu]
[ 4955.073774] amdgpu_ttm_backend_unbind+0x3c/0xe0 [amdgpu]
[ 4955.073778] ttm_tt_unbind+0x1d/0x30 [ttm]
[ 4955.073782] ttm_tt_destroy.part.0+0xe/0x50 [ttm]
[ 4955.073785] ttm_bo_cleanup_memtype_use+0x32/0x80 [ttm]
[ 4955.073789] ttm_bo_cleanup_refs+0x129/0x1e0 [ttm]
[ 4955.073792] ttm_bo_delayed_delete+0xab/0x200 [ttm]
[ 4955.073796] ttm_bo_delayed_workqueue+0x18/0x40 [ttm]
[ 4955.073798] process_one_work+0x1b5/0x360
[ 4955.073800] worker_thread+0x50/0x3c0
[ 4955.073802] kthread+0xf9/0x130
[ 4955.073804] ? process_one_work+0x360/0x360
[ 4955.073805] ? kthread_park+0x90/0x90
[ 4955.073807] ret_from_fork+0x22/0x40
[ 5075.904811] INFO: task Xorg:1011 blocked for more than 241 seconds.
[ 5075.904815] Tainted: G E 5.4.0-rc2 #21
[ 5075.904816] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5075.904818] Xorg D 0 1011 999 0x00400004
[ 5075.904821] Call Trace:
[ 5075.904828] ? __schedule+0x29f/0x740
[ 5075.904830] schedule+0x39/0xa0
[ 5075.904832] schedule_timeout+0x20f/0x300
[ 5075.904836] dma_fence_default_wait+0x1bc/0x2a0
[ 5075.904838] ? dma_fence_release+0x140/0x140
[ 5075.904840] dma_fence_wait_timeout+0xdd/0x100
[ 5075.904903] gmc_v10_0_flush_gpu_tlb+0x15e/0x1b0 [amdgpu]
[ 5075.904955] amdgpu_gart_bind+0x6e/0xa0 [amdgpu]
[ 5075.905004] amdgpu_ttm_gart_bind+0x73/0xc0 [amdgpu]
[ 5075.905052] amdgpu_ttm_alloc_gart+0x23b/0x330 [amdgpu]
[ 5075.905056] ? refcount_inc_checked+0x5/0x30
[ 5075.905107] amdgpu_vm_clear_bo+0x165/0x3b0 [amdgpu]
[ 5075.905109] ? _cond_resched+0x15/0x30
[ 5075.905111] ? mutex_lock+0xe/0x30
[ 5075.905158] ? amdgpu_bo_create+0x1a5/0x220 [amdgpu]
[ 5075.905208] amdgpu_vm_update_ptes+0x284/0x6b0 [amdgpu]
[ 5075.905257] amdgpu_vm_bo_update_mapping+0xb3/0xe0 [amdgpu]
[ 5075.905306] amdgpu_vm_bo_update+0x326/0x760 [amdgpu]
[ 5075.905357] amdgpu_gem_va_ioctl+0x522/0x550 [amdgpu]
[ 5075.905368] ? drm_gem_object_put_unlocked+0x3b/0x60 [drm]
[ 5075.905417] ? amdgpu_gem_metadata_ioctl+0x190/0x190 [amdgpu]
[ 5075.905427] drm_ioctl_kernel+0xaa/0xf0 [drm]
[ 5075.905437] drm_ioctl+0x208/0x390 [drm]
[ 5075.905486] ? amdgpu_gem_metadata_ioctl+0x190/0x190 [amdgpu]
[ 5075.905533] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[ 5075.905537] do_vfs_ioctl+0x40e/0x670
[ 5075.905540] ksys_ioctl+0x5e/0x90
[ 5075.905541] __x64_sys_ioctl+0x16/0x20
[ 5075.905544] do_syscall_64+0x52/0x160
[ 5075.905546] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 5075.905548] RIP: 0033:0x7fd256ae55d7
[ 5075.905553] Code: Bad RIP value.
[ 5075.905554] RSP: 002b:00007ffcf27b0538 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
[ 5075.905556] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd256ae55d7
[ 5075.905556] RDX: 00007ffcf27b0580 RSI: 00000000c0286448 RDI: 000000000000000e
[ 5075.905557] RBP: 00007ffcf27b0580 R08: ffff800103600000 R09: 000000000000000e
[ 5075.905558] R10: 0000000000000037 R11: 0000000000003246 R12: 00000000c0286448
[ 5075.905558] R13: 000000000000000e R14: 00000000001b5000 R15: 0000564bb1883750
[ 5075.905693] INFO: task kworker/11:2:11488 blocked for more than 241 seconds.
[ 5075.905694] Tainted: G E 5.4.0-rc2 #21
[ 5075.905695] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5075.905696] kworker/11:2 D 0 11488 2 0x80004000
[ 5075.905702] Workqueue: events ttm_bo_delayed_workqueue [ttm]
[ 5075.905703] Call Trace:
[ 5075.905706] ? __schedule+0x29f/0x740
[ 5075.905708] schedule+0x39/0xa0
[ 5075.905709] schedule_timeout+0x20f/0x300
[ 5075.905712] ? __wake_up_common+0x80/0x180
[ 5075.905714] dma_fence_default_wait+0x1bc/0x2a0
[ 5075.905715] ? dma_fence_release+0x140/0x140
[ 5075.905717] dma_fence_wait_timeout+0xdd/0x100
[ 5075.905773] gmc_v10_0_flush_gpu_tlb+0x15e/0x1b0 [amdgpu]
[ 5075.905822] amdgpu_gart_unbind+0x9e/0xd0 [amdgpu]
[ 5075.905870] amdgpu_ttm_backend_unbind+0x3c/0xe0 [amdgpu]
[ 5075.905874] ttm_tt_unbind+0x1d/0x30 [ttm]
[ 5075.905877] ttm_tt_destroy.part.0+0xe/0x50 [ttm]
[ 5075.905881] ttm_bo_cleanup_memtype_use+0x32/0x80 [ttm]
[ 5075.905884] ttm_bo_cleanup_refs+0x129/0x1e0 [ttm]
[ 5075.905887] ttm_bo_delayed_delete+0xab/0x200 [ttm]
[ 5075.905891] ttm_bo_delayed_workqueue+0x18/0x40 [ttm]
[ 5075.905894] process_one_work+0x1b5/0x360
[ 5075.905896] worker_thread+0x50/0x3c0
[ 5075.905898] kthread+0xf9/0x130
[ 5075.905900] ? process_one_work+0x360/0x360
[ 5075.905901] ? kthread_park+0x90/0x90
[ 5075.905902] ret_from_fork+0x22/0x40
[ 5196.736904] INFO: task Xorg:1011 blocked for more than 362 seconds.
[ 5196.736908] Tainted: G E 5.4.0-rc2 #21
[ 5196.736909] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5196.736911] Xorg D 0 1011 999 0x00400004
[ 5196.736914] Call Trace:
[ 5196.736921] ? __schedule+0x29f/0x740
[ 5196.736923] schedule+0x39/0xa0
[ 5196.736925] schedule_timeout+0x20f/0x300
[ 5196.736929] dma_fence_default_wait+0x1bc/0x2a0
[ 5196.736931] ? dma_fence_release+0x140/0x140
[ 5196.736933] dma_fence_wait_timeout+0xdd/0x100
[ 5196.736996] gmc_v10_0_flush_gpu_tlb+0x15e/0x1b0 [amdgpu]
[ 5196.737047] amdgpu_gart_bind+0x6e/0xa0 [amdgpu]
[ 5196.737097] amdgpu_ttm_gart_bind+0x73/0xc0 [amdgpu]
[ 5196.737145] amdgpu_ttm_alloc_gart+0x23b/0x330 [amdgpu]
[ 5196.737148] ? refcount_inc_checked+0x5/0x30
[ 5196.737199] amdgpu_vm_clear_bo+0x165/0x3b0 [amdgpu]
[ 5196.737202] ? _cond_resched+0x15/0x30
[ 5196.737204] ? mutex_lock+0xe/0x30
[ 5196.737251] ? amdgpu_bo_create+0x1a5/0x220 [amdgpu]
[ 5196.737301] amdgpu_vm_update_ptes+0x284/0x6b0 [amdgpu]
[ 5196.737351] amdgpu_vm_bo_update_mapping+0xb3/0xe0 [amdgpu]
[ 5196.737400] amdgpu_vm_bo_update+0x326/0x760 [amdgpu]
[ 5196.737450] amdgpu_gem_va_ioctl+0x522/0x550 [amdgpu]
[ 5196.737462] ? drm_gem_object_put_unlocked+0x3b/0x60 [drm]
[ 5196.737511] ? amdgpu_gem_metadata_ioctl+0x190/0x190 [amdgpu]
[ 5196.737521] drm_ioctl_kernel+0xaa/0xf0 [drm]
[ 5196.737531] drm_ioctl+0x208/0x390 [drm]
[ 5196.737580] ? amdgpu_gem_metadata_ioctl+0x190/0x190 [amdgpu]
[ 5196.737626] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[ 5196.737630] do_vfs_ioctl+0x40e/0x670
[ 5196.737633] ksys_ioctl+0x5e/0x90
[ 5196.737634] __x64_sys_ioctl+0x16/0x20
[ 5196.737637] do_syscall_64+0x52/0x160
[ 5196.737639] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 5196.737641] RIP: 0033:0x7fd256ae55d7
[ 5196.737646] Code: Bad RIP value.
[ 5196.737647] RSP: 002b:00007ffcf27b0538 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
[ 5196.737649] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd256ae55d7
[ 5196.737649] RDX: 00007ffcf27b0580 RSI: 00000000c0286448 RDI: 000000000000000e
[ 5196.737650] RBP: 00007ffcf27b0580 R08: ffff800103600000 R09: 000000000000000e
[ 5196.737651] R10: 0000000000000037 R11: 0000000000003246 R12: 00000000c0286448
[ 5196.737651] R13: 000000000000000e R14: 00000000001b5000 R15: 0000564bb1883750
[ 5196.737785] INFO: task kworker/11:2:11488 blocked for more than 362 seconds.
[ 5196.737787] Tainted: G E 5.4.0-rc2 #21
[ 5196.737788] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5196.737789] kworker/11:2 D 0 11488 2 0x80004000
[ 5196.737795] Workqueue: events ttm_bo_delayed_workqueue [ttm]
[ 5196.737796] Call Trace:
[ 5196.737798] ? __schedule+0x29f/0x740
[ 5196.737800] schedule+0x39/0xa0
[ 5196.737802] schedule_timeout+0x20f/0x300
[ 5196.737804] ? __wake_up_common+0x80/0x180
[ 5196.737806] dma_fence_default_wait+0x1bc/0x2a0
[ 5196.737808] ? dma_fence_release+0x140/0x140
[ 5196.737809] dma_fence_wait_timeout+0xdd/0x100
[ 5196.737865] gmc_v10_0_flush_gpu_tlb+0x15e/0x1b0 [amdgpu]
[ 5196.737914] amdgpu_gart_unbind+0x9e/0xd0 [amdgpu]
[ 5196.737962] amdgpu_ttm_backend_unbind+0x3c/0xe0 [amdgpu]
[ 5196.737966] ttm_tt_unbind+0x1d/0x30 [ttm]
[ 5196.737970] ttm_tt_destroy.part.0+0xe/0x50 [ttm]
[ 5196.737974] ttm_bo_cleanup_memtype_use+0x32/0x80 [ttm]
[ 5196.737977] ttm_bo_cleanup_refs+0x129/0x1e0 [ttm]
[ 5196.737981] ttm_bo_delayed_delete+0xab/0x200 [ttm]
[ 5196.737984] ttm_bo_delayed_workqueue+0x18/0x40 [ttm]
[ 5196.737987] process_one_work+0x1b5/0x360
[ 5196.737989] worker_thread+0x50/0x3c0
[ 5196.737991] kthread+0xf9/0x130
[ 5196.737993] ? process_one_work+0x360/0x360
[ 5196.737994] ? kthread_park+0x90/0x90
[ 5196.737996] ret_from_fork+0x22/0x40
[ 5317.569003] INFO: task Xorg:1011 blocked for more than 483 seconds.
[ 5317.569007] Tainted: G E 5.4.0-rc2 #21
[ 5317.569008] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5317.569010] Xorg D 0 1011 999 0x00400004
[ 5317.569012] Call Trace:
[ 5317.569020] ? __schedule+0x29f/0x740
[ 5317.569022] schedule+0x39/0xa0
[ 5317.569024] schedule_timeout+0x20f/0x300
[ 5317.569028] dma_fence_default_wait+0x1bc/0x2a0
[ 5317.569030] ? dma_fence_release+0x140/0x140
[ 5317.569032] dma_fence_wait_timeout+0xdd/0x100
[ 5317.569095] gmc_v10_0_flush_gpu_tlb+0x15e/0x1b0 [amdgpu]
[ 5317.569146] amdgpu_gart_bind+0x6e/0xa0 [amdgpu]
[ 5317.569196] amdgpu_ttm_gart_bind+0x73/0xc0 [amdgpu]
[ 5317.569244] amdgpu_ttm_alloc_gart+0x23b/0x330 [amdgpu]
[ 5317.569247] ? refcount_inc_checked+0x5/0x30
[ 5317.569298] amdgpu_vm_clear_bo+0x165/0x3b0 [amdgpu]
[ 5317.569301] ? _cond_resched+0x15/0x30
[ 5317.569302] ? mutex_lock+0xe/0x30
[ 5317.569350] ? amdgpu_bo_create+0x1a5/0x220 [amdgpu]
[ 5317.569399] amdgpu_vm_update_ptes+0x284/0x6b0 [amdgpu]
[ 5317.569449] amdgpu_vm_bo_update_mapping+0xb3/0xe0 [amdgpu]
[ 5317.569498] amdgpu_vm_bo_update+0x326/0x760 [amdgpu]
[ 5317.569548] amdgpu_gem_va_ioctl+0x522/0x550 [amdgpu]
[ 5317.569560] ? drm_gem_object_put_unlocked+0x3b/0x60 [drm]
[ 5317.569609] ? amdgpu_gem_metadata_ioctl+0x190/0x190 [amdgpu]
[ 5317.569619] drm_ioctl_kernel+0xaa/0xf0 [drm]
[ 5317.569630] drm_ioctl+0x208/0x390 [drm]
[ 5317.569678] ? amdgpu_gem_metadata_ioctl+0x190/0x190 [amdgpu]
[ 5317.569725] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[ 5317.569730] do_vfs_ioctl+0x40e/0x670
[ 5317.569732] ksys_ioctl+0x5e/0x90
[ 5317.569734] __x64_sys_ioctl+0x16/0x20
[ 5317.569737] do_syscall_64+0x52/0x160
[ 5317.569739] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 5317.569741] RIP: 0033:0x7fd256ae55d7
[ 5317.569746] Code: Bad RIP value.
[ 5317.569747] RSP: 002b:00007ffcf27b0538 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
[ 5317.569749] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd256ae55d7
[ 5317.569749] RDX: 00007ffcf27b0580 RSI: 00000000c0286448 RDI: 000000000000000e
[ 5317.569750] RBP: 00007ffcf27b0580 R08: ffff800103600000 R09: 000000000000000e
[ 5317.569751] R10: 0000000000000037 R11: 0000000000003246 R12: 00000000c0286448
[ 5317.569751] R13: 000000000000000e R14: 00000000001b5000 R15: 0000564bb1883750
[ 5317.569883] INFO: task kworker/11:2:11488 blocked for more than 483 seconds.
[ 5317.569885] Tainted: G E 5.4.0-rc2 #21
[ 5317.569886] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5317.569887] kworker/11:2 D 0 11488 2 0x80004000
[ 5317.569893] Workqueue: events ttm_bo_delayed_workqueue [ttm]
[ 5317.569894] Call Trace:
[ 5317.569897] ? __schedule+0x29f/0x740
[ 5317.569898] schedule+0x39/0xa0
[ 5317.569900] schedule_timeout+0x20f/0x300
[ 5317.569902] ? __wake_up_common+0x80/0x180
[ 5317.569904] dma_fence_default_wait+0x1bc/0x2a0
[ 5317.569906] ? dma_fence_release+0x140/0x140
[ 5317.569908] dma_fence_wait_timeout+0xdd/0x100
[ 5317.569964] gmc_v10_0_flush_gpu_tlb+0x15e/0x1b0 [amdgpu]
[ 5317.570013] amdgpu_gart_unbind+0x9e/0xd0 [amdgpu]
[ 5317.570061] amdgpu_ttm_backend_unbind+0x3c/0xe0 [amdgpu]
[ 5317.570065] ttm_tt_unbind+0x1d/0x30 [ttm]
[ 5317.570069] ttm_tt_destroy.part.0+0xe/0x50 [ttm]
[ 5317.570072] ttm_bo_cleanup_memtype_use+0x32/0x80 [ttm]
[ 5317.570076] ttm_bo_cleanup_refs+0x129/0x1e0 [ttm]
[ 5317.570079] ttm_bo_delayed_delete+0xab/0x200 [ttm]
[ 5317.570082] ttm_bo_delayed_workqueue+0x18/0x40 [ttm]
[ 5317.570085] process_one_work+0x1b5/0x360
[ 5317.570087] worker_thread+0x50/0x3c0
[ 5317.570089] kthread+0xf9/0x130
[ 5317.570091] ? process_one_work+0x360/0x360
[ 5317.570092] ? kthread_park+0x90/0x90
[ 5317.570094] ret_from_fork+0x22/0x40
[ 5438.401079] INFO: task Xorg:1011 blocked for more than 604 seconds.
[ 5438.401083] Tainted: G E 5.4.0-rc2 #21
[ 5438.401084] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5438.401086] Xorg D 0 1011 999 0x00400004
[ 5438.401088] Call Trace:
[ 5438.401095] ? __schedule+0x29f/0x740
[ 5438.401098] schedule+0x39/0xa0
[ 5438.401100] schedule_timeout+0x20f/0x300
[ 5438.401103] dma_fence_default_wait+0x1bc/0x2a0
[ 5438.401105] ? dma_fence_release+0x140/0x140
[ 5438.401107] dma_fence_wait_timeout+0xdd/0x100
[ 5438.401170] gmc_v10_0_flush_gpu_tlb+0x15e/0x1b0 [amdgpu]
[ 5438.401221] amdgpu_gart_bind+0x6e/0xa0 [amdgpu]
[ 5438.401270] amdgpu_ttm_gart_bind+0x73/0xc0 [amdgpu]
[ 5438.401318] amdgpu_ttm_alloc_gart+0x23b/0x330 [amdgpu]
[ 5438.401322] ? refcount_inc_checked+0x5/0x30
[ 5438.401373] amdgpu_vm_clear_bo+0x165/0x3b0 [amdgpu]
[ 5438.401375] ? _cond_resched+0x15/0x30
[ 5438.401377] ? mutex_lock+0xe/0x30
[ 5438.401424] ? amdgpu_bo_create+0x1a5/0x220 [amdgpu]
[ 5438.401474] amdgpu_vm_update_ptes+0x284/0x6b0 [amdgpu]
[ 5438.401524] amdgpu_vm_bo_update_mapping+0xb3/0xe0 [amdgpu]
[ 5438.401573] amdgpu_vm_bo_update+0x326/0x760 [amdgpu]
[ 5438.401623] amdgpu_gem_va_ioctl+0x522/0x550 [amdgpu]
[ 5438.401635] ? drm_gem_object_put_unlocked+0x3b/0x60 [drm]
[ 5438.401684] ? amdgpu_gem_metadata_ioctl+0x190/0x190 [amdgpu]
[ 5438.401694] drm_ioctl_kernel+0xaa/0xf0 [drm]
[ 5438.401704] drm_ioctl+0x208/0x390 [drm]
[ 5438.401753] ? amdgpu_gem_metadata_ioctl+0x190/0x190 [amdgpu]
[ 5438.401800] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[ 5438.401804] do_vfs_ioctl+0x40e/0x670
[ 5438.401806] ksys_ioctl+0x5e/0x90
[ 5438.401808] __x64_sys_ioctl+0x16/0x20
[ 5438.401810] do_syscall_64+0x52/0x160
[ 5438.401812] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 5438.401814] RIP: 0033:0x7fd256ae55d7
[ 5438.401819] Code: Bad RIP value.
[ 5438.401820] RSP: 002b:00007ffcf27b0538 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
[ 5438.401822] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd256ae55d7
[ 5438.401822] RDX: 00007ffcf27b0580 RSI: 00000000c0286448 RDI: 000000000000000e
[ 5438.401823] RBP: 00007ffcf27b0580 R08: ffff800103600000 R09: 000000000000000e
[ 5438.401824] R10: 0000000000000037 R11: 0000000000003246 R12: 00000000c0286448
[ 5438.401824] R13: 000000000000000e R14: 00000000001b5000 R15: 0000564bb1883750
[ 5438.401957] INFO: task kworker/11:2:11488 blocked for more than 604 seconds.
[ 5438.401958] Tainted: G E 5.4.0-rc2 #21
[ 5438.401959] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5438.401960] kworker/11:2 D 0 11488 2 0x80004000
[ 5438.401966] Workqueue: events ttm_bo_delayed_workqueue [ttm]
[ 5438.401967] Call Trace:
[ 5438.401970] ? __schedule+0x29f/0x740
[ 5438.401972] schedule+0x39/0xa0
[ 5438.401973] schedule_timeout+0x20f/0x300
[ 5438.401976] ? __wake_up_common+0x80/0x180
[ 5438.401978] dma_fence_default_wait+0x1bc/0x2a0
[ 5438.401979] ? dma_fence_release+0x140/0x140
[ 5438.401981] dma_fence_wait_timeout+0xdd/0x100
[ 5438.402037] gmc_v10_0_flush_gpu_tlb+0x15e/0x1b0 [amdgpu]
[ 5438.402086] amdgpu_gart_unbind+0x9e/0xd0 [amdgpu]
[ 5438.402134] amdgpu_ttm_backend_unbind+0x3c/0xe0 [amdgpu]
[ 5438.402138] ttm_tt_unbind+0x1d/0x30 [ttm]
[ 5438.402142] ttm_tt_destroy.part.0+0xe/0x50 [ttm]
[ 5438.402145] ttm_bo_cleanup_memtype_use+0x32/0x80 [ttm]
[ 5438.402148] ttm_bo_cleanup_refs+0x129/0x1e0 [ttm]
[ 5438.402152] ttm_bo_delayed_delete+0xab/0x200 [ttm]
[ 5438.402156] ttm_bo_delayed_workqueue+0x18/0x40 [ttm]
[ 5438.402158] process_one_work+0x1b5/0x360
[ 5438.402160] worker_thread+0x50/0x3c0
[ 5438.402163] kthread+0xf9/0x130
[ 5438.402164] ? process_one_work+0x360/0x360
[ 5438.402166] ? kthread_park+0x90/0x90
[ 5438.402167] ret_from_fork+0x22/0x40
Using AMD_DEBUG=nodma mitigates this hang. I'm not sure whether the bug is in radeonsi or in amdgpu. For the corresponding amdgpu bug, see: https://bugs.freedesktop.org/show_bug.cgi?id=111481
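
For anyone who wants to try the workaround, here is a minimal sketch of applying AMD_DEBUG=nodma to a single program instead of the whole session. The helper name run_with_nodma and the assumption that the browser is on PATH as "firefox" are mine, not part of the setup above; a Mozilla tarball build will likely need the full path to its binary.

```shell
# Hypothetical helper: run one command with AMD_DEBUG=nodma in its environment.
# The variable is set only for the launched process, so other GL clients
# (and the calling shell) are left untouched.
run_with_nodma() {
    env AMD_DEBUG=nodma "$@"
}

# Example (assumes "firefox" is on PATH; adjust the path for a tarball build):
#   run_with_nodma firefox
```

Scoping the variable to one process should make it easier to tell whether the mitigation changes Firefox's behaviour specifically, rather than that of every application using radeonsi.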