Since kernel 4.20, Vega 56 hangs when browsing pages in the Steam client
Submitted by Mikhail Gavrilov
Assigned to Default DRI bug account
Link to original bug (#108710)
Description
Created attachment 142434
dmesg
$ inxi -bM
System: Host: localhost.localdomain Kernel: 4.20.0-0.rc1.git4.1.fc30.x86_64 x86_64 bits: 64 Desktop: Gnome 3.30.1
Distro: Fedora release 30 (Rawhide)
Machine: Type: Desktop Mobo: ASUSTeK model: ROG STRIX X470-I GAMING v: Rev 1.xx serial: <root required>
UEFI: American Megatrends v: 0901 date: 07/23/2018
CPU: 8-Core: AMD Ryzen 7 2700X type: MT MCP speed: 3427 MHz min/max: 2200/4000 MHz
Graphics: Device-1: Advanced Micro Devices [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] driver: amdgpu v: kernel
Display: wayland server: Fedora Project X.org 1.20.3 driver: amdgpu resolution: 3840x2160~60Hz
OpenGL: renderer: Radeon RX Vega (VEGA10 DRM 3.27.0 4.20.0-0.rc1.git4.1.fc30.x86_64 LLVM 7.0.0) v: 4.5 Mesa 18.2.4
Network: Device-1: Intel I211 Gigabit Network driver: igb
Device-2: Realtek RTL8822BE 802.11a/b/g/n/ac WiFi adapter driver: r8822be
Drives: Local Storage: total: 11.36 TiB used: 5.93 TiB (52.2%)
Info: Processes: 455 Uptime: 16m Memory: 31.30 GiB used: 15.99 GiB (51.1%) Shell: bash inxi: 3.0.27
[ 3852.511166] gmc_v9_0_process_interrupt: 56 callbacks suppressed
[ 3852.511182] amdgpu 0000:0b:00.0: [mmhub] VMC page fault (src_id:0 ring:169 vmid:0 pasid:0, for process pid 0 thread pid 0)
[ 3852.511184] amdgpu 0000:0b:00.0: in page starting at address 0x000000401080c000 from 18
[ 3852.511186] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00040152
[ 3862.673344] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=72072, emitted seq=72074
[ 3862.673356] [drm] GPU recovery disabled.
[ 4044.170764] sysrq: SysRq : Show Blocked State
[ 4044.170959] task PC stack pid father
[ 4044.171026] kworker/u32:5 D10872 253 2 0x80000000
[ 4044.171060] Workqueue: events_unbound commit_work [drm_kms_helper]
[ 4044.171063] Call Trace:
[ 4044.171073] ? __schedule+0x2f3/0xb90
[ 4044.171077] ? __lock_acquire+0x279/0x1650
[ 4044.171085] ? dma_fence_default_wait+0x242/0x330
[ 4044.171089] schedule+0x2f/0x90
[ 4044.171092] schedule_timeout+0x31c/0x4f0
[ 4044.171096] ? find_held_lock+0x34/0xa0
[ 4044.171099] ? find_held_lock+0x34/0xa0
[ 4044.171104] ? mark_held_locks+0x57/0x80
[ 4044.171134] ? _raw_spin_unlock_irqrestore+0x4b/0x60
[ 4044.171140] ? dma_fence_default_wait+0x242/0x330
[ 4044.171143] dma_fence_default_wait+0x26e/0x330
[ 4044.171147] ? dma_fence_release+0x120/0x120
[ 4044.171153] dma_fence_wait_timeout+0x182/0x200
[ 4044.171160] reservation_object_wait_timeout_rcu+0x236/0x4e0
[ 4044.171263] amdgpu_dm_do_flip+0x112/0x380 [amdgpu]
[ 4044.171378] amdgpu_dm_atomic_commit_tail+0x6d0/0xd30 [amdgpu]
[ 4044.171386] ? _raw_spin_unlock_irq+0x29/0x40
[ 4044.171391] ? wait_for_completion_timeout+0x73/0x1a0
[ 4044.171408] commit_tail+0x3d/0x70 [drm_kms_helper]
[ 4044.171413] process_one_work+0x27d/0x600
[ 4044.171423] worker_thread+0x3c/0x390
[ 4044.171428] ? drain_workqueue+0x180/0x180
[ 4044.171433] kthread+0x120/0x140
[ 4044.171437] ? kthread_park+0x80/0x80
[ 4044.171442] ret_from_fork+0x27/0x50
[ 4044.172479] (time-dir) D13944 15221 1 0x00000000
[ 4044.172487] Call Trace:
[ 4044.172496] ? __schedule+0x2f3/0xb90
[ 4044.172501] ? prepare_to_wait_event+0xd2/0x180
[ 4044.172508] schedule+0x2f/0x90
[ 4044.172514] drm_sched_entity_flush+0x1df/0x1f0 [gpu_sched]
[ 4044.172518] ? finish_wait+0x80/0x80
[ 4044.172580] amdgpu_ctx_mgr_entity_flush+0x7c/0xc0 [amdgpu]
[ 4044.172637] amdgpu_flush+0x1f/0x30 [amdgpu]
[ 4044.172640] filp_close+0x34/0x70
[ 4044.172645] __x64_sys_close+0x1e/0x50
[ 4044.172649] do_syscall_64+0x60/0x1f0
[ 4044.172653] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 4044.172656] RIP: 0033:0x7f5a96622ec7
[ 4044.172662] Code: Bad RIP value.
[ 4044.172665] RSP: 002b:00007ffcce3d00e0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
[ 4044.172668] RAX: ffffffffffffffda RBX: 000000000000007c RCX: 00007f5a96622ec7
[ 4044.172671] RDX: 0000000000000000 RSI: 00007ffcce3d0180 RDI: 000000000000007c
[ 4044.172673] RBP: 000055d29a73aa60 R08: 000055d29a73b676 R09: 0000000000000000
[ 4044.172675] R10: 00007f5a965bbae0 R11: 0000000000000293 R12: 00007f5a95939750
[ 4044.172677] R13: 0000000000000000 R14: 0000000000000001 R15: 00007ffcce3d0180
[ 4057.229953] INFO: task kworker/u32:5:253 blocked for more than 120 seconds.
[ 4057.229957] Tainted: G WC 4.20.0-0.rc1.git4.1.fc30.x86_64 #1
[ 4057.229959] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4057.229962] kworker/u32:5 D10872 253 2 0x80000000
[ 4057.229979] Workqueue: events_unbound commit_work [drm_kms_helper]
[ 4057.229982] Call Trace:
[ 4057.229994] ? __schedule+0x2f3/0xb90
[ 4057.229998] ? __lock_acquire+0x279/0x1650
[ 4057.230006] ? dma_fence_default_wait+0x242/0x330
[ 4057.230010] schedule+0x2f/0x90
[ 4057.230013] schedule_timeout+0x31c/0x4f0
[ 4057.230017] ? find_held_lock+0x34/0xa0
[ 4057.230020] ? find_held_lock+0x34/0xa0
[ 4057.230025] ? mark_held_locks+0x57/0x80
[ 4057.230028] ? _raw_spin_unlock_irqrestore+0x4b/0x60
[ 4057.230034] ? dma_fence_default_wait+0x242/0x330
[ 4057.230037] dma_fence_default_wait+0x26e/0x330
[ 4057.230041] ? dma_fence_release+0x120/0x120
[ 4057.230047] dma_fence_wait_timeout+0x182/0x200
[ 4057.230052] reservation_object_wait_timeout_rcu+0x236/0x4e0
[ 4057.230134] amdgpu_dm_do_flip+0x112/0x380 [amdgpu]
[ 4057.230221] amdgpu_dm_atomic_commit_tail+0x6d0/0xd30 [amdgpu]
[ 4057.230228] ? _raw_spin_unlock_irq+0x29/0x40
[ 4057.230232] ? wait_for_completion_timeout+0x73/0x1a0
[ 4057.230249] commit_tail+0x3d/0x70 [drm_kms_helper]
[ 4057.230254] process_one_work+0x27d/0x600
[ 4057.230263] worker_thread+0x3c/0x390
[ 4057.230269] ? drain_workqueue+0x180/0x180
[ 4057.230272] kthread+0x120/0x140
[ 4057.230276] ? kthread_park+0x80/0x80
[ 4057.230281] ret_from_fork+0x27/0x50
[ 4057.230571]
Showing all locks held in the system:
[ 4057.230581] 1 lock held by khungtaskd/94:
[ 4057.230583] #0: 00000000a1fc4e6f (rcu_read_lock){....}, at: debug_show_all_locks+0x15/0x183
[ 4057.230596] 3 locks held by kworker/u32:5/253:
[ 4057.230597] #0: 00000000156505f1 ((wq_completion)"events_unbound"){+.+.}, at: process_one_work+0x1f3/0x600
[ 4057.230603] #1: 000000000d248f14 ((work_completion)(&state->commit_work)){+.+.}, at: process_one_work+0x1f3/0x600
[ 4057.230608] #2: 000000003df03870 (reservation_ww_class_mutex){+.+.}, at: amdgpu_dm_do_flip+0xd6/0x380 [amdgpu]
[ 4057.230700] 2 locks held by gnome-shell/2152:
[ 4057.230702] #0: 00000000a2cb2cbf (crtc_ww_class_acquire){+.+.}, at: drm_mode_cursor_common+0x95/0x220 [drm]
[ 4057.230721] #1: 00000000e86bda0d (crtc_ww_class_mutex){+.+.}, at: drm_modeset_lock+0x101/0x120 [drm]
[ 4057.230746] 5 locks held by Xwayland/2222:
[ 4057.230784] 1 lock held by htop/3225:
[ 4057.230848] 1 lock held by CPU 0/KVM/4333:
[ 4057.230989] 1 lock held by (time-dir)/15221:
[ 4057.230991] #0: 000000006ef8a6af (&mgr->lock){+.+.}, at: amdgpu_ctx_mgr_entity_flush+0x3c/0xc0 [amdgpu]
[ 4057.231068] =============================================
Attachment 142434, "dmesg":
dmesg.txt
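
For anyone triaging the attached dmesg, the ring timeout line in the log above carries two sequence numbers (signaled seq=72072, emitted seq=72074). A minimal sketch of pulling those out of a saved log follows. The names `find_ring_timeouts`, `TIMEOUT_RE` and `SAMPLE` are made up for illustration and are not part of any kernel or Mesa tooling. The pattern only assumes the line format quoted in this report and accepts the error marker with or without surrounding asterisks. Reading the difference between the two numbers as the count of fences still outstanding on the ring is an interpretation, not something the log states.

```python
import re

# Hypothetical helper: matches the amdgpu job-timeout line format quoted in
# this report. The error marker may appear as "ERROR" or "*ERROR*".
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm:amdgpu_job_timedout \[amdgpu\]\]\s+"
    r"(?:\*ERROR\*|ERROR)\s+ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def find_ring_timeouts(dmesg_text):
    """Return one dict per ring-timeout line found in dmesg_text."""
    results = []
    for match in TIMEOUT_RE.finditer(dmesg_text):
        signaled = int(match.group("signaled"))
        emitted = int(match.group("emitted"))
        results.append({
            "timestamp": float(match.group("ts")),
            "ring": match.group("ring"),
            "signaled": signaled,
            "emitted": emitted,
            # Assumption: emitted - signaled = fences not yet completed.
            "pending": emitted - signaled,
        })
    return results


# Two lines copied from the log in this report.
SAMPLE = (
    "[ 3862.673344] [drm:amdgpu_job_timedout [amdgpu]] ERROR ring sdma1 timeout, "
    "signaled seq=72072, emitted seq=72074\n"
    "[ 3862.673356] [drm] GPU recovery disabled.\n"
)

if __name__ == "__main__":
    for hit in find_ring_timeouts(SAMPLE):
        print(hit["ring"], hit["pending"])
```

To run it against the full attachment, read `dmesg.txt` into a string and pass it to `find_ring_timeouts` instead of `SAMPLE`.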