[bsw/bxt] igt@ - circular locking between mmap_sem <-> cpuhotplug via perf+fault
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_465/fi-apl-guc/igt@perf_pmu@busy-no-semaphores-rcs0.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_465/fi-apl-guc/igt@gem_exec_whisper@basic-fds-priority-all.html
<4> [91.489801] ======================================================
<4> [91.489806] WARNING: possible circular locking dependency detected
<4> [91.489813] 5.6.0-rc7-g68b152390f91-drmtip_465+ #1 Tainted: G U
<4> [91.489820] ------------------------------------------------------
<4> [91.489825] gem_exec_whispe/1073 is trying to acquire lock:
<4> [91.489831] ffffa14cef0b0158 (&mm->mmap_sem#2){++++}, at: __might_fault+0x39/0x90
<4> [91.489846]
but task is already holding lock:
<4> [91.489851] ffffcba77fc1a1e8 (&cpuctx_mutex){+.+.}, at: perf_event_ctx_lock_nested+0xc3/0x1e0
<4> [91.489862]
which lock already depends on the new lock.
<4> [91.489868]
the existing dependency chain (in reverse order) is:
<4> [91.489875]
-> #3 (&cpuctx_mutex){+.+.}:
<4> [91.489884] __mutex_lock+0x9a/0x9c0
<4> [91.489890] perf_event_init_cpu+0xa4/0x140
<4> [91.489898] perf_event_init+0x1c9/0x1f9
<4> [91.489904] start_kernel+0x362/0x4eb
<4> [91.489910] secondary_startup_64+0xb6/0xc0
<4> [91.489915]
-> #2 (pmus_lock){+.+.}:
<4> [91.489922] __mutex_lock+0x9a/0x9c0
<4> [91.489928] perf_event_init_cpu+0x6b/0x140
<4> [91.489934] cpuhp_invoke_callback+0x9b/0x9d0
<4> [91.489940] _cpu_up+0xa2/0x140
<4> [91.489945] do_cpu_up+0x61/0xa0
<4> [91.489950] smp_init+0x57/0x96
<4> [91.489955] kernel_init_freeable+0x87/0x1dc
<4> [91.489961] kernel_init+0x5/0x100
<4> [91.489966] ret_from_fork+0x3a/0x50
<4> [91.489971]
-> #1 (cpu_hotplug_lock.rw_sem){++++}:
<4> [91.489979] cpus_read_lock+0x34/0xd0
<4> [91.489985] stop_machine+0x12/0x30
<4> [91.490092] bxt_vtd_ggtt_insert_entries__BKL+0x36/0x50 [i915]
<4> [91.490176] ggtt_bind_vma+0x31/0x50 [i915]
<4> [91.490270] __vma_bind+0x26/0x40 [i915]
<4> [91.490348] fence_work+0x1c/0x88 [i915]
<4> [91.490426] fence_notify+0x78/0xf8 [i915]
<4> [91.490504] __i915_sw_fence_complete+0x81/0x250 [i915]
<4> [91.490599] i915_vma_pin+0x21e/0x1190 [i915]
<4> [91.490693] i915_gem_object_ggtt_pin+0xb0/0x270 [i915]
<4> [91.490784] vm_fault_gtt+0x12f/0x990 [i915]
<4> [91.490797] __do_fault+0x45/0xf8
<4> [91.490802] __handle_mm_fault+0xaf2/0x10c0
<4> [91.490808] handle_mm_fault+0x154/0x340
<4> [91.490814] do_page_fault+0x3ca/0x6f0
<4> [91.490820] page_fault+0x34/0x40
<4> [91.490824]
-> #0 (&mm->mmap_sem#2){++++}:
<4> [91.490833] __lock_acquire+0x1328/0x15d0
<4> [91.490839] lock_acquire+0xa7/0x1c0
<4> [91.490844] __might_fault+0x63/0x90
<4> [91.490851] _copy_to_user+0x1e/0x80
<4> [91.490856] perf_read+0x200/0x2b0
<4> [91.490861] vfs_read+0x96/0x160
<4> [91.490866] ksys_read+0x9f/0xe0
<4> [91.490871] do_syscall_64+0x4f/0x240
<4> [91.490877] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [91.490883]
other info that might help us debug this:
<4> [91.490890] Chain exists of:
&mm->mmap_sem#2 --> pmus_lock --> &cpuctx_mutex
<4> [91.490901] Possible unsafe locking scenario:
<4> [91.490906]        CPU0                    CPU1
<4> [91.490911]        ----                    ----
<4> [91.490915]   lock(&cpuctx_mutex);
<4> [91.490919]                                lock(pmus_lock);
<4> [91.490925]                                lock(&cpuctx_mutex);
<4> [91.490931]   lock(&mm->mmap_sem#2);
<4> [91.490935]
*** DEADLOCK ***
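
Collapsing the intermediate locks (cpu_hotplug_lock, pmus_lock), the report reduces to a classic AB-BA inversion: perf_read() holds cpuctx_mutex and may fault (needing mmap_sem) inside copy_to_user(), while the i915 GTT fault handler holds mmap_sem and reaches cpuctx_mutex through stop_machine() in bxt_vtd_ggtt_insert_entries__BKL(). Below is a minimal userspace sketch of that pattern, with pthread mutexes standing in for the kernel locks; the names mirror the splat, but this is an illustration of the inversion only, not a reproducer of the bug.

  /* Userspace analogue of the inversion lockdep reports above.
   * Two threads take the same pair of mutexes in opposite orders;
   * the names mirror the kernel locks in the splat. */
  #include <pthread.h>
  #include <stdio.h>

  static pthread_mutex_t cpuctx_mutex = PTHREAD_MUTEX_INITIALIZER;
  static pthread_mutex_t mmap_sem     = PTHREAD_MUTEX_INITIALIZER;

  /* perf_read() path: holds cpuctx_mutex, then copy_to_user()
   * may fault and need mmap_sem (__might_fault). */
  static void *reader(void *arg)
  {
          (void)arg;
          pthread_mutex_lock(&cpuctx_mutex);
          pthread_mutex_lock(&mmap_sem);
          pthread_mutex_unlock(&mmap_sem);
          pthread_mutex_unlock(&cpuctx_mutex);
          return NULL;
  }

  /* GTT fault path: holds mmap_sem, then stop_machine() drags in
   * cpu_hotplug_lock -> pmus_lock -> cpuctx_mutex. */
  static void *faulter(void *arg)
  {
          (void)arg;
          pthread_mutex_lock(&mmap_sem);
          pthread_mutex_lock(&cpuctx_mutex);
          pthread_mutex_unlock(&cpuctx_mutex);
          pthread_mutex_unlock(&mmap_sem);
          return NULL;
  }

  int main(void)
  {
          pthread_t a, b;

          pthread_create(&a, NULL, reader, NULL);
          pthread_create(&b, NULL, faulter, NULL);
          pthread_join(a, NULL);  /* may never return: AB-BA deadlock */
          pthread_join(b, NULL);
          printf("no deadlock this run\n");
          return 0;
  }

Depending on timing the two threads either complete or deadlock, which is exactly the hazard lockdep flags ahead of time. Breaking the cycle means either not reaching stop_machine() (and hence cpu_hotplug_lock) from the bxt VT-d GGTT bind path while under mmap_sem, or not faulting under cpuctx_mutex in perf_read(); the splat itself does not indicate which end should change.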