[CI][DRMTIP] All tests - dmesg-warn / incomplete - WARNING: possible circular locking dependency detected
@mupuf
Submitted by Martin Peres Assigned to Tvrtko Ursulin @tursulin
Link to original bug (#109385)
Description
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_196/fi-bsw-n3050/igt@perf_pmu@busy-double-start-vecs0.html
<6>
[227.644735] [IGT] perf_pmu: executing
<6>
[227.657613] [IGT] perf_pmu: starting subtest busy-double-start-vecs0
<4>
[230.164203]
<4>
[230.164216] ======================================================
<4>
[230.164224] WARNING: possible circular locking dependency detected
<4>
[230.164233] 5.0.0-rc2-g19fd97ec0d64-drmtip_196+ #1 Tainted: G U
<4>
[230.164241] ------------------------------------------------------
<4>
[230.164249] perf_pmu/2178 is trying to acquire lock:
<4>
[230.164257] 000000003ecd0ec4 (&mm->mmap_sem){++++}, at: __might_fault+0x39/0x90
<4>
[230.164275]
but task is already holding lock:
<4>
[230.164282] 0000000035968c21 (&cpuctx_mutex){+.+.}, at: perf_event_ctx_lock_nested+0xbc/0x1d0
<4>
[230.164297]
which lock already depends on the new lock.
<4>
[230.164306]
the existing dependency chain (in reverse order) is:
<4>
[230.164314]
-> #4 (&cpuctx_mutex){+.+.}:
<4>
[230.164326] perf_event_init_cpu+0x5a/0x90
<4>
[230.164335] perf_event_init+0x1ab/0x1db
<4>
[230.164346] start_kernel+0x339/0x4c0
<4>
[230.164355] secondary_startup_64+0xa4/0xb0
<4>
[230.164362]
-> #3 (pmus_lock){+.+.}:
<4>
[230.164373] perf_event_init_cpu+0x21/0x90
<4>
[230.164382] cpuhp_invoke_callback+0x9b/0xa10
<4>
[230.164390] _cpu_up+0xa2/0x140
<4>
[230.164398] do_cpu_up+0x91/0xc0
<4>
[230.164406] smp_init+0x5d/0xa9
<4>
[230.164413] kernel_init_freeable+0x16f/0x359
<4>
[230.164423] kernel_init+0x5/0x100
<4>
[230.164431] ret_from_fork+0x3a/0x50
<4>
[230.164437]
-> #2 (cpu_hotplug_lock.rw_sem){++++}:
<4>
[230.164450] stop_machine+0x12/0x30
<4>
[230.164568] bxt_vtd_ggtt_insert_entries__BKL+0x36/0x50 [i915]
<4>
[230.164655] ggtt_bind_vma+0x59/0x90 [i915]
<4>
[230.164744] i915_vma_bind+0xe7/0x2c0 [i915]
<4>
[230.164833] __i915_vma_do_pin+0x93/0xd90 [i915]
<4>
[230.164921] i915_gem_init+0x31b/0xa20 [i915]
<4>
[230.165002] i915_driver_load+0xd54/0x1590 [i915]
<4>
[230.165083] i915_pci_probe+0x29/0xa0 [i915]
<4>
[230.165093] pci_device_probe+0xa1/0x130
<4>
[230.165102] really_probe+0xf3/0x3e0
<4>
[230.165110] driver_probe_device+0x10a/0x120
<4>
[230.165118] __driver_attach+0xdb/0x100
<4>
[230.165125] bus_for_each_dev+0x74/0xc0
<4>
[230.165133] bus_add_driver+0x15f/0x250
<4>
[230.165141] driver_register+0x56/0xe0
<4>
[230.165148] do_one_initcall+0x58/0x2e0
<4>
[230.165157] do_init_module+0x56/0x1ea
<4>
[230.165164] load_module+0x2718/0x29f0
<4>
[230.165171] __se_sys_finit_module+0xd3/0xf0
<4>
[230.165179] do_syscall_64+0x55/0x190
<4>
[230.165187] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>
[230.165194]
-> #1 (&dev->struct_mutex){+.+.}:
<4>
[230.165284] i915_mutex_lock_interruptible+0x57/0x120 [i915]
<4>
[230.165372] i915_gem_fault+0x1ec/0x810 [i915]
<4>
[230.165380] __do_fault+0x2c/0xb0
<4>
[230.165387] __handle_mm_fault+0x98c/0xfa0
<4>
[230.165395] handle_mm_fault+0x196/0x3a0
<4>
[230.165404] __do_page_fault+0x246/0x500
<4>
[230.165412] page_fault+0x1e/0x30
<4>
[230.165417]
-> #0 (&mm->mmap_sem){++++}:
<4>
[230.165428] __might_fault+0x63/0x90
<4>
[230.165437] _copy_to_user+0x1e/0x70
<4>
[230.165445] perf_read+0x200/0x2a0
<4>
[230.165453] __vfs_read+0x31/0x190
<4>
[230.165461] vfs_read+0x9e/0x150
<4>
[230.165468] ksys_read+0x50/0xc0
<4>
[230.165475] do_syscall_64+0x55/0x190
<4>
[230.165483] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>
[230.165490]
other info that might help us debug this:
<4>
[230.165500] Chain exists of:
&mm->mmap_sem --> pmus_lock --> &cpuctx_mutex
<4>
[230.165514] Possible unsafe locking scenario:
<4>
[230.165521]        CPU0                    CPU1
<4>
[230.165528]        ----                    ----
<4>
[230.165534]   lock(&cpuctx_mutex);
<4>
[230.165540]                                lock(pmus_lock);
<4>
[230.165548]                                lock(&cpuctx_mutex);
<4>
[230.165556]   lock(&mm->mmap_sem);
<4>
[230.165562]
*** DEADLOCK ***
<4>
[230.165571] 1 lock held by perf_pmu/2178:
<4>
[230.165577] #0: 0000000035968c21 (&cpuctx_mutex){+.+.}, at: perf_event_ctx_lock_nested+0xbc/0x1d0
<4>
[230.165592]
stack backtrace:
<4>
[230.165601] CPU: 0 PID: 2178 Comm: perf_pmu Tainted: G U 5.0.0-rc2-g19fd97ec0d64-drmtip_196+ #1
<4>
[230.165613] Hardware name: /NUC5CPYB, BIOS PYBSWCEL.86A.0058.2016.1102.1842 11/02/2016
<4>
[230.165623] Call Trace:
<4>
[230.165632] dump_stack+0x67/0x9b
<4>
[230.165643] print_circular_bug.isra.16+0x1c8/0x2b0
<4>
[230.165652] __lock_acquire+0x183a/0x1b00
<4>
[230.165662] ? perf_event_update_sibling_time.part.2+0x30/0x30
<4>
[230.165672] ? smp_call_function_single+0xb4/0x150
<4>
[230.165681] ? __mutex_lock+0x8c/0x960
<4>
[230.165690] ? lock_acquire+0xa6/0x1c0
<4>
[230.165697] lock_acquire+0xa6/0x1c0
<4>
[230.165705] ? __might_fault+0x39/0x90
<4>
[230.165713] __might_fault+0x63/0x90
<4>
[230.165720] ? __might_fault+0x39/0x90
<4>
[230.165728] _copy_to_user+0x1e/0x70
<4>
[230.165736] perf_read+0x200/0x2a0
<4>
[230.165745] ? __fd_install+0xbc/0x2c0
<4>
[230.165754] __vfs_read+0x31/0x190
<4>
[230.165762] ? lock_acquire+0xa6/0x1c0
<4>
[230.165771] ? __task_pid_nr_ns+0xb9/0x1f0
<4>
[230.165779] vfs_read+0x9e/0x150
<4>
[230.165787] ksys_read+0x50/0xc0
<4>
[230.165795] do_syscall_64+0x55/0x190
<4>
[230.165803] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>
[230.165812] RIP: 0033:0x7ff01ebc534e
<4>
[230.165820] Code: 00 00 00 00 48 8b 15 71 8c 20 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 0f 1f 40 00 8b 05 ba d0 20 00 85 c0 75 16 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5a f3 c3 0f 1f 84 00 00 00 00 00 41 54 55 49
<4>
[230.165840] RSP: 002b:00007fffe9340778 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
<4>
[230.165851] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff01ebc534e
<4>
[230.165860] RDX: 0000000000000010 RSI: 00007fffe93407a0 RDI: 0000000000000007
<4>
[230.165868] RBP: 00007fffe93407c0 R08: 0000000000000000 R09: 0000000000000036
<4>
[230.165877] R10: 00000000ffffffca R11: 0000000000000246 R12: 000055b7a6f08480
<4>
[230.165886] R13: 00007fffe93423d0 R14: 0000000000000000 R15: 0000000000000000
<6>
[231.176301] [IGT] perf_pmu: exiting, ret=0
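
For anyone reading the splat above, here is a rough sketch of the ordering problem lockdep is reporting. It is not kernel code and not how lockdep is implemented; it only models the chain entries #4 to #1 as "held -> acquired" edges (my reading of the trace, so treat the per-edge comments as assumptions) and checks whether the new acquisition in #0 closes a cycle. The names `EXISTING_EDGES`, `NEW_EDGE`, `find_path` and `closes_cycle` are made up for this illustration.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str]  # (lock already held, lock acquired while holding it)

# Orderings already recorded, read off chain entries #1..#4 of the report.
EXISTING_EDGES: List[Edge] = [
    ("&mm->mmap_sem", "&dev->struct_mutex"),            # #1: i915_gem_fault in the page fault path
    ("&dev->struct_mutex", "cpu_hotplug_lock.rw_sem"),  # #2: stop_machine via ggtt_bind_vma
    ("cpu_hotplug_lock.rw_sem", "pmus_lock"),           # #3: perf_event_init_cpu from cpuhp callback
    ("pmus_lock", "&cpuctx_mutex"),                     # #4: perf_event_init_cpu
]

# The acquisition that triggered the warning (#0): perf_read holds
# cpuctx_mutex and _copy_to_user may fault, taking mmap_sem.
NEW_EDGE: Edge = ("&cpuctx_mutex", "&mm->mmap_sem")


def find_path(edges: List[Edge], src: str, dst: str) -> Optional[List[str]]:
    """Depth-first search for a chain of recorded orderings from src to dst."""
    graph: Dict[str, List[str]] = {}
    for held, acquired in edges:
        graph.setdefault(held, []).append(acquired)

    stack = [(src, [src])]
    seen = set()
    while stack:
        node, path = stack.pop()
        if node == dst:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt in graph.get(node, []):
            stack.append((nxt, path + [nxt]))
    return None


def closes_cycle(edges: List[Edge], new_edge: Edge) -> Optional[List[str]]:
    """Return the existing chain that new_edge would turn into a cycle, if any."""
    held, acquired = new_edge
    # A cycle appears if the lock being acquired already leads back to the held lock.
    return find_path(edges, acquired, held)


if __name__ == "__main__":
    chain = closes_cycle(EXISTING_EDGES, NEW_EDGE)
    if chain:
        print(" --> ".join(chain + [chain[0]]))
```

Dropping any single edge from `EXISTING_EDGES` makes `closes_cycle` return `None`, which is the same as saying that breaking any one of these orderings would silence the warning.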