possible circular locking dependency detected, bisected to "drm/nouveau: Drop mutex_lock_nested for atomic"
When booting a v6.9 kernel build on a Razer Blade 2018 laptop, lockdep reports:
[ 29.686678] ======================================================
[ 29.686680] WARNING: possible circular locking dependency detected
[ 29.686681] 6.9.0 #1 Not tainted
[ 29.686683] ------------------------------------------------------
[ 29.686684] kworker/4:1/104 is trying to acquire lock:
[ 29.686686] ffff9b5d15ce8520 (&cli->mutex){+.+.}-{3:3}, at: nouveau_bo_move+0x1d5/0xb20 [nouveau]
[ 29.686764]
but task is already holding lock:
[ 29.686766] ffff9b5d0b41c1a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: ww_mutex_trylock+0x213/0x390
[ 29.686772]
which lock already depends on the new lock.
[ 29.686773]
the existing dependency chain (in reverse order) is:
[ 29.686774]
-> #1 (reservation_ww_class_mutex){+.+.}-{3:3}:
[ 29.686777] __ww_mutex_lock.constprop.0+0xe4/0x10c0
[ 29.686781] ww_mutex_lock+0x3c/0xa0
[ 29.686783] nouveau_bo_pin+0x43/0x2f0 [nouveau]
[ 29.686847] nouveau_channel_ctor+0x320/0x900 [nouveau]
[ 29.686906] nouveau_channel_new+0x3f/0x500 [nouveau]
[ 29.686964] nouveau_abi16_ioctl_channel_alloc+0x16b/0x490 [nouveau]
[ 29.687021] drm_ioctl_kernel+0xb8/0x120
[ 29.687025] drm_ioctl+0x303/0x5a0
[ 29.687026] nouveau_drm_ioctl+0x64/0xc0 [nouveau]
[ 29.687088] __x64_sys_ioctl+0x9c/0xe0
[ 29.687091] x64_sys_call+0xebb/0xf80
[ 29.687095] do_syscall_64+0x89/0x140
[ 29.687098] entry_SYSCALL_64_after_hwframe+0x4b/0x53
[ 29.687101]
-> #0 (&cli->mutex){+.+.}-{3:3}:
[ 29.687104] __lock_acquire+0x15bc/0x2750
[ 29.687105] lock_acquire+0xc9/0x300
[ 29.687107] __mutex_lock+0xbe/0x8a0
[ 29.687109] mutex_lock_nested+0x1b/0x30
[ 29.687112] nouveau_bo_move+0x1d5/0xb20 [nouveau]
[ 29.687172] ttm_bo_handle_move_mem+0xe2/0x1e0 [ttm]
[ 29.687180] ttm_mem_evict_first+0x43e/0x630 [ttm]
[ 29.687187] ttm_resource_manager_evict_all+0x9a/0x210 [ttm]
[ 29.687194] nouveau_do_suspend+0x6e/0x280 [nouveau]
[ 29.687255] nouveau_pmops_runtime_suspend+0x3f/0xb0 [nouveau]
[ 29.687315] pci_pm_runtime_suspend+0x67/0x1f0
[ 29.687318] __rpm_callback+0x49/0x160
[ 29.687321] rpm_callback+0x60/0x70
[ 29.687323] rpm_suspend+0xff/0x600
[ 29.687325] pm_runtime_work+0xc6/0xe0
[ 29.687328] process_one_work+0x222/0x760
[ 29.687330] worker_thread+0x193/0x370
[ 29.687332] kthread+0xf3/0x120
[ 29.687335] ret_from_fork+0x40/0x70
[ 29.687338] ret_from_fork_asm+0x11/0x20
[ 29.687341]
other info that might help us debug this:
[ 29.687342] Possible unsafe locking scenario:
[ 29.687343] CPU0 CPU1
[ 29.687344] ---- ----
[ 29.687345] lock(reservation_ww_class_mutex);
[ 29.687347] lock(&cli->mutex);
[ 29.687348] lock(reservation_ww_class_mutex);
[ 29.687350] lock(&cli->mutex);
[ 29.687352]
*** DEADLOCK ***
[ 29.687353] 3 locks held by kworker/4:1/104:
[ 29.687355] #0: ffff9b5d000bf748 ((wq_completion)pm){+.+.}-{0:0}, at: process_one_work+0x44a/0x760
[ 29.687360] #1: ffffaf994043fe48 ((work_completion)(&dev->power.work)){+.+.}-{0:0}, at: process_one_work+0x1e0/0x760
[ 29.687364] #2: ffff9b5d0b41c1a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: ww_mutex_trylock+0x213/0x390
[ 29.687369]
stack backtrace:
[ 29.687370] CPU: 4 PID: 104 Comm: kworker/4:1 Not tainted 6.9.0 #1
[ 29.687373] Hardware name: Razer Blade/DANA_MB, BIOS 01.01 08/31/2018
[ 29.687374] Workqueue: pm pm_runtime_work
[ 29.687378] Call Trace:
[ 29.687379] <TASK>
[ 29.687381] dump_stack_lvl+0x8d/0xe0
[ 29.687386] dump_stack+0x10/0x20
[ 29.687388] print_circular_bug+0x275/0x340
[ 29.687392] check_noncircular+0x14c/0x170
[ 29.687396] __lock_acquire+0x15bc/0x2750
[ 29.687398] ? nvkm_ioctl_mthd+0x5d/0xc0 [nouveau]
[ 29.687443] lock_acquire+0xc9/0x300
[ 29.687445] ? nouveau_bo_move+0x1d5/0xb20 [nouveau]
[ 29.687509] __mutex_lock+0xbe/0x8a0
[ 29.687512] ? nouveau_bo_move+0x1d5/0xb20 [nouveau]
[ 29.687574] ? nvif_vmm_map+0x87/0x140 [nouveau]
[ 29.687616] ? nouveau_bo_move+0x1d5/0xb20 [nouveau]
[ 29.687679] mutex_lock_nested+0x1b/0x30
[ 29.687681] ? mutex_lock_nested+0x1b/0x30
[ 29.687684] nouveau_bo_move+0x1d5/0xb20 [nouveau]
[ 29.687745] ? lock_is_held_type+0xa5/0x120
[ 29.687750] ttm_bo_handle_move_mem+0xe2/0x1e0 [ttm]
[ 29.687758] ttm_mem_evict_first+0x43e/0x630 [ttm]
[ 29.687765] ? find_held_lock+0x31/0x90
[ 29.687768] ? ttm_resource_manager_evict_all+0x10/0x210 [ttm]
[ 29.687777] ttm_resource_manager_evict_all+0x9a/0x210 [ttm]
[ 29.687785] ? remove_id_store+0x180/0x180
[ 29.687787] nouveau_do_suspend+0x6e/0x280 [nouveau]
[ 29.687849] nouveau_pmops_runtime_suspend+0x3f/0xb0 [nouveau]
[ 29.687912] pci_pm_runtime_suspend+0x67/0x1f0
[ 29.687915] __rpm_callback+0x49/0x160
[ 29.687917] ? ktime_get_mono_fast_ns+0x43/0xa0
[ 29.687921] rpm_callback+0x60/0x70
[ 29.687924] rpm_suspend+0xff/0x600
[ 29.687926] ? process_one_work+0x1e0/0x760
[ 29.687929] pm_runtime_work+0xc6/0xe0
[ 29.687932] process_one_work+0x222/0x760
[ 29.687936] worker_thread+0x193/0x370
[ 29.687938] ? apply_wqattrs_cleanup.part.0+0xc0/0xc0
[ 29.687941] kthread+0xf3/0x120
[ 29.687943] ? kthread_complete_and_exit+0x20/0x20
[ 29.687946] ret_from_fork+0x40/0x70
[ 29.687948] ? kthread_complete_and_exit+0x20/0x20
[ 29.687951] ret_from_fork_asm+0x11/0x20
[ 29.687955] </TASK>
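As far as I can read the chain above, the channel-allocation ioctl path pins a buffer object (taking its reservation_ww_class_mutex) while already holding &cli->mutex, whereas the runtime-suspend eviction path holds a BO's reservation_ww_class_mutex and then takes &cli->mutex inside nouveau_bo_move(), i.e. a classic ABBA inversion. Purely to illustrate what lockdep is flagging (hypothetical userspace pthread code, not driver code), the pattern boils down to:

/* Hypothetical illustration of the ABBA inversion above, not nouveau code.
 * "resv" stands in for reservation_ww_class_mutex, "cli" for &cli->mutex. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t resv = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t cli  = PTHREAD_MUTEX_INITIALIZER;

/* runtime-suspend eviction path: resv -> cli */
static void *evict_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&resv);
	sleep(1);			/* widen the race window */
	pthread_mutex_lock(&cli);
	puts("evict path got both locks");
	pthread_mutex_unlock(&cli);
	pthread_mutex_unlock(&resv);
	return NULL;
}

/* channel-allocation ioctl path: cli -> resv */
static void *chan_alloc_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&cli);
	sleep(1);
	pthread_mutex_lock(&resv);
	puts("channel alloc path got both locks");
	pthread_mutex_unlock(&resv);
	pthread_mutex_unlock(&cli);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, evict_path, NULL);
	pthread_create(&b, NULL, chan_alloc_path, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}

Built with cc -pthread, the two threads usually block on each other's second lock and never finish, which is exactly the scenario the CPU0/CPU1 table above describes.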
Bisected to:
commit 551620f2a3816397266dfd812cd8b3be89f14be4
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Fri Nov 27 17:35:28 2020 +0100
drm/nouveau: Drop mutex_lock_nested for atomic
Purely conjecture, but I think the original locking inversion with the
legacy page flip code between flipping and ttm's bo move function
shouldn't exist anymore with atomic: With atomic the bo pinning and
actual modeset commit is completely separated in the code paths.
This annotation was originally added in
commit 060810d7abaabcab282e062c595871d661561400
Author: Ben Skeggs <bskeggs@redhat.com>
Date: Mon Jul 8 14:15:51 2013 +1000
drm/nouveau: fix locking issues in page flipping paths
due to
commit b580c9e2b7ba5030a795aa2fb73b796523d65a78
Author: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Date: Thu Jun 27 13:48:18 2013 +0200
drm/nouveau: make flipping lockdep safe
Acked-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Dave Airlie <airlied@gmail.com>
Cc: nouveau@lists.freedesktop.org
Link: https://patchwork.freedesktop.org/patch/msgid/20201127163528.2221671-1-daniel.vetter@ffwll.ch
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 645e7091dffc..bc542ac4c4b6 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -774,7 +774,10 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int evict,
 		return ret;
 	}
 
-	mutex_lock_nested(&cli->mutex, SINGLE_DEPTH_NESTING);
+	if (drm_drv_uses_atomic_modeset(drm->dev))
+		mutex_lock(&cli->mutex);
+	else
+		mutex_lock_nested(&cli->mutex, SINGLE_DEPTH_NESTING);
 	ret = nouveau_fence_sync(nouveau_bo(bo), chan, true, ctx->interruptible);
 	if (ret == 0) {
 		ret = drm->ttm.move(chan, bo, &bo->mem, new_reg);
Reverting this commit makes the lockdep warning go away.
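For reference, the revert boils down to turning the conditional back into the single nested-lock call in nouveau_bo_move_m2mf() (a sketch of the reverse of the hunk quoted above; the surrounding context may have drifted a bit by v6.9):

-	if (drm_drv_uses_atomic_modeset(drm->dev))
-		mutex_lock(&cli->mutex);
-	else
-		mutex_lock_nested(&cli->mutex, SINGLE_DEPTH_NESTING);
+	mutex_lock_nested(&cli->mutex, SINGLE_DEPTH_NESTING);
 	ret = nouveau_fence_sync(nouveau_bo(bo), chan, true, ctx->interruptible);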