igt@xe_exec_fault_mode@many* - abort - WARNING: possible circular locking dependency detected
Stdout
Using IGT_SRANDOM=1718168397 for randomisation
Opened device: /dev/dri/card0
Starting subtest: many-execqueues-bindexecqueue-userptr-invalidate-race
Subtest many-execqueues-bindexecqueue-userptr-invalidate-race: SUCCESS (0.102s)
This test caused an abort condition: Lockdep not active
/proc/lockdep_stats contents:
lock-classes: 2385 [max: 8192]
direct dependencies: 22754 [max: 524288]
indirect dependencies: 133143
all direct dependencies: 509474
dependency chains: 42524 [max: 524288]
dependency chain hlocks used: 165773 [max: 2621440]
dependency chain hlocks lost: 0
in-hardirq chains: 217
in-softirq chains: 524
in-process chains: 41783
stack-trace entries: 220538 [max: 524288]
number of stack traces: 10036
number of stack hash chains: 7513
combined max dependencies: 487211504
hardirq-safe locks: 83
hardirq-unsafe locks: 1690
softirq-safe locks: 192
softirq-unsafe locks: 1600
irq-safe locks: 205
irq-unsafe locks: 1690
hardirq-read-safe locks: 4
hardirq-read-unsafe locks: 292
softirq-read-safe locks: 8
softirq-read-unsafe locks: 287
irq-read-safe locks: 8
irq-read-unsafe locks: 292
uncategorized locks: 312
unused locks: 1
max locking depth: 16
max bfs queue depth: 310
max lock class index: 2384
chain lookup misses: 42838
chain lookup hits: 183856121
cyclic checks: 42163
redundant checks: 0
redundant links: 0
find-mask forwards checks: 7302
find-mask backwards checks: 3874
hardirq on events: 105809009
hardirq off events: 105809002
redundant hardirq ons: 38
redundant hardirq offs: 6
softirq on events: 154693
softirq off events: 154693
redundant softirq ons: 0
redundant softirq offs: 0
debug_locks: 0
zapped classes: 3
zapped lock chains: 139
large chain blocks: 1
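Note on the abort: the "debug_locks: 0" line above is what "Lockdep not active" refers to. Once lockdep emits its first report (the circular-dependency warning in the Dmesg section below) it calls debug_locks_off() and performs no further checking, and the runner treats that state as an abort condition. As an illustration only (not the actual igt_runner code), a minimal check against /proc/lockdep_stats could look like this:

/*
 * Illustrative check only, assuming the runner merely parses
 * /proc/lockdep_stats; this is not the actual igt_runner implementation.
 */
#include <stdio.h>

/* Returns 1 if lockdep has turned itself off, 0 if it is active, -1 on error. */
static int lockdep_turned_off(void)
{
	char line[256];
	int val = -1;
	FILE *f = fopen("/proc/lockdep_stats", "r");

	if (!f)
		return -1;

	while (fgets(line, sizeof(line), f)) {
		/* The file prints one " key: value" pair per line. */
		if (sscanf(line, " debug_locks: %d", &val) == 1)
			break;
	}
	fclose(f);

	return val < 0 ? -1 : !val;
}

int main(void)
{
	switch (lockdep_turned_off()) {
	case 1:
		puts("Lockdep not active");
		break;
	case 0:
		puts("Lockdep active");
		break;
	default:
		puts("Lockdep not available");
	}
	return 0;
}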
Stderr
Starting subtest: many-execqueues-bindexecqueue-userptr-invalidate-race
Subtest many-execqueues-bindexecqueue-userptr-invalidate-race: SUCCESS (0.102s)
Dmesg
<6> [93.931902] Console: switching to colour dummy device 80x25
<6> [93.931968] [IGT] xe_exec_fault_mode: executing
<6> [93.934106] [IGT] xe_exec_fault_mode: starting subtest many-execqueues-bindexecqueue-userptr-invalidate-race
<7> [93.955759] xe 0000:00:02.0: [drm:pf_queue_work_func [xe]]
ASID: 121
VFID: 0
PDATA: 0x0490
Faulted Address: 0x00000000001a0000
FaultType: 0
AccessType: 0
FaultLevel: 4
EngineClass: 0
EngineInstance: 0
<7> [93.955856] xe 0000:00:02.0: [drm:pf_queue_work_func [xe]] Fault response: Unsuccessful -22
<7> [93.955921] xe 0000:00:02.0: [drm:pf_queue_work_func [xe]]
ASID: 121
VFID: 0
PDATA: 0x0c90
Faulted Address: 0x00000000001a1000
FaultType: 0
AccessType: 0
FaultLevel: 4
EngineClass: 0
EngineInstance: 0
<7> [93.955982] xe 0000:00:02.0: [drm:pf_queue_work_func [xe]] Fault response: Unsuccessful -22
<6> [93.956128] xe 0000:00:02.0: [drm] GT0: Engine reset: engine_class=rcs, logical_mask: 0x1, guc_id=2
<4> [93.970246]
<4> [93.970263] ======================================================
<4> [93.970268] WARNING: possible circular locking dependency detected
<4> [93.970273] 6.10.0-rc3-xe #1 Tainted: G U
<4> [93.970278] ------------------------------------------------------
<4> [93.970281] kworker/u37:4/999 is trying to acquire lock:
<4> [93.970286] ffff8881044960d8 (&mm->mmap_lock){++++}-{3:3}, at: xe_hmm_userptr_populate_range+0x20b/0x610 [xe]
<4> [93.970441]
but task is already holding lock:
<4> [93.970445] ffff8881252965a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_exec_lock_obj+0x4b/0x2e0 [drm_exec]
<4> [93.970457]
which lock already depends on the new lock.
<4> [93.970462]
the existing dependency chain (in reverse order) is:
<4> [93.970469]
-> #2 (reservation_ww_class_mutex){+.+.}-{3:3}:
<4> [93.970475] __ww_mutex_lock.constprop.0+0xc1/0x14e0
<4> [93.970488] ww_mutex_lock+0x3c/0x110
<4> [93.970492] dma_resv_lockdep+0x180/0x330
<4> [93.970507] do_one_initcall+0x61/0x3d0
<4> [93.970518] kernel_init_freeable+0x3f5/0x6b0
<4> [93.970530] kernel_init+0x1b/0x170
<4> [93.970539] ret_from_fork+0x39/0x60
<4> [93.970547] ret_from_fork_asm+0x1a/0x30
<4> [93.970551]
-> #1 (reservation_ww_class_acquire){+.+.}-{0:0}:
<4> [93.970557] dma_resv_lockdep+0x154/0x330
<4> [93.970563] do_one_initcall+0x61/0x3d0
<4> [93.970567] kernel_init_freeable+0x3f5/0x6b0
<4> [93.970572] kernel_init+0x1b/0x170
<4> [93.970576] ret_from_fork+0x39/0x60
<4> [93.970579] ret_from_fork_asm+0x1a/0x30
<4> [93.970583]
-> #0 (&mm->mmap_lock){++++}-{3:3}:
<4> [93.970588] __lock_acquire+0x1690/0x3070
<4> [93.970601] lock_acquire+0xd9/0x300
<4> [93.970605] down_read+0x43/0x1b0
<4> [93.970609] xe_hmm_userptr_populate_range+0x20b/0x610 [xe]
<4> [93.970705] xe_vma_userptr_pin_pages+0x13d/0x190 [xe]
<4> [93.970784] __xe_vma_op_execute+0x4b4/0x6c0 [xe]
<4> [93.970862] ops_execute+0xdd/0x1b0 [xe]
<4> [93.970938] xe_vma_rebind+0xd8/0x2e0 [xe]
<4> [93.971017] handle_vma_pagefault.isra.0+0x160/0x550 [xe]
<4> [93.971078] pf_queue_work_func+0x37a/0x450 [xe]
<4> [93.971137] process_scheduled_works+0x3aa/0x750
<4> [93.971146] worker_thread+0x14f/0x2f0
<4> [93.971150] kthread+0xf5/0x130
<4> [93.971155] ret_from_fork+0x39/0x60
<4> [93.971158] ret_from_fork_asm+0x1a/0x30
<4> [93.971162]
other info that might help us debug this:
<4> [93.971166] Chain exists of:
&mm->mmap_lock --> reservation_ww_class_acquire --> reservation_ww_class_mutex
<4> [93.971176] Possible unsafe locking scenario:
<4> [93.971180]        CPU0                    CPU1
<4> [93.971184]        ----                    ----
<4> [93.971187]   lock(reservation_ww_class_mutex);
<4> [93.971190]                                lock(reservation_ww_class_acquire);
<4> [93.971195]                                lock(reservation_ww_class_mutex);
<4> [93.971200]   rlock(&mm->mmap_lock);
<4> [93.971203]
*** DEADLOCK ***
<4> [93.971207] 5 locks held by kworker/u37:4/999:
<4> [93.971211] #0: ffff8881257b2948 ((wq_completion)xe_gt_page_fault_work_queue){+.+.}-{0:0}, at: process_scheduled_works+0x5c6/0x750
<4> [93.971222] #1: ffffc90001b2fe48 ((work_completion)(&gt->usm.pf_queue[i].worker)){+.+.}-{0:0}, at: process_scheduled_works+0x366/0x750
<4> [93.971233] #2: ffff888131d8ec80 (&vm->lock){++++}-{3:3}, at: pf_queue_work_func+0x33a/0x450 [xe]
<4> [93.971299] #3: ffffc90001b2fce8 (reservation_ww_class_acquire){+.+.}-{0:0}, at: handle_vma_pagefault.isra.0+0xd9/0x550 [xe]
<4> [93.971362] #4: ffff8881252965a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_exec_lock_obj+0x4b/0x2e0 [drm_exec]
<4> [93.971370]
stack backtrace:
<4> [93.971376] CPU: 6 PID: 999 Comm: kworker/u37:4 Tainted: G U 6.10.0-rc3-xe #1
<4> [93.971384] Hardware name: Intel Corporation Lunar Lake Client Platform/LNL-M LP5 RVP1, BIOS LNLMFWI1.R00.3130.D83.2404031315 04/03/2024
<4> [93.971392] Workqueue: xe_gt_page_fault_work_queue pf_queue_work_func [xe] (xe_gt_page_fault_work_q)
<4> [93.971457] Call Trace:
<4> [93.971462] <TASK>
<4> [93.971468] dump_stack_lvl+0x9b/0xf0
<4> [93.971477] dump_stack+0x10/0x20
<4> [93.971482] print_circular_bug.isra.0+0x2d2/0x410
<4> [93.971487] check_noncircular+0x155/0x170
<4> [93.971493] __lock_acquire+0x1690/0x3070
<4> [93.971500] lock_acquire+0xd9/0x300
<4> [93.971504] ? xe_hmm_userptr_populate_range+0x20b/0x610 [xe]
<4> [93.971588] down_read+0x43/0x1b0
<4> [93.971592] ? xe_hmm_userptr_populate_range+0x20b/0x610 [xe]
<4> [93.971672] xe_hmm_userptr_populate_range+0x20b/0x610 [xe]
<4> [93.971750] ? lock_acquire+0xd9/0x300
<4> [93.971758] xe_vma_userptr_pin_pages+0x13d/0x190 [xe]
<4> [93.971836] __xe_vma_op_execute+0x4b4/0x6c0 [xe]
<4> [93.971914] ops_execute+0xdd/0x1b0 [xe]
<4> [93.971990] ? xe_vm_ops_add_rebind+0xaa/0xe0 [xe]
<4> [93.972068] xe_vma_rebind+0xd8/0x2e0 [xe]
<4> [93.972144] ? ww_mutex_lock+0x3c/0x110
<4> [93.972148] ? ww_mutex_lock+0x3c/0x110
<4> [93.972152] ? drm_exec_lock_obj+0x6c/0x2e0 [drm_exec]
<4> [93.972157] ? xe_vm_lock_vma+0x2b/0x60 [xe]
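Note on the splat: lockdep already knows the ordering &mm->mmap_lock --> reservation_ww_class_acquire --> reservation_ww_class_mutex (frames #1 and #2 above show the reservation classes being primed at boot from dma_resv_lockdep()), while frame #0 shows the xe page-fault worker taking mmap_lock for read in xe_hmm_userptr_populate_range() while already holding a reservation_ww_class_mutex acquired through drm_exec_lock_obj(), i.e. the reverse order. As an illustration only, a minimal user-space sketch of that inversion pattern, with hypothetical stand-in locks (mmap_lock_stub, resv_lock_stub) rather than the actual xe/drm code:

/*
 * Hypothetical stand-in locks, not the xe driver code: mmap_lock_stub plays
 * the role of mm->mmap_lock (an rwsem) and resv_lock_stub the role of a
 * reservation_ww_class_mutex (dma-resv lock).
 */
#include <pthread.h>
#include <stdio.h>

static pthread_rwlock_t mmap_lock_stub = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t resv_lock_stub = PTHREAD_MUTEX_INITIALIZER;

/*
 * Established order: mmap_lock first, then the reservation lock. Taken for
 * write here so the inversion is directly realizable with two threads; in
 * the kernel chain the existing edge uses a read lock and an actual hang
 * additionally needs a pending mmap_lock writer, but lockdep flags the
 * reversed ordering either way.
 */
static void *established_order(void *arg)
{
	(void)arg;
	pthread_rwlock_wrlock(&mmap_lock_stub);
	pthread_mutex_lock(&resv_lock_stub);
	puts("mmap_lock -> reservation lock");
	pthread_mutex_unlock(&resv_lock_stub);
	pthread_rwlock_unlock(&mmap_lock_stub);
	return NULL;
}

/*
 * Reversed order, mirroring frame #0 above: the page-fault worker already
 * holds a reservation lock (drm_exec_lock_obj) when
 * xe_hmm_userptr_populate_range() takes mmap_lock for read.
 */
static void *pagefault_worker_order(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&resv_lock_stub);
	pthread_rwlock_rdlock(&mmap_lock_stub);
	puts("reservation lock -> mmap_lock");
	pthread_rwlock_unlock(&mmap_lock_stub);
	pthread_mutex_unlock(&resv_lock_stub);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	/* Run the two orders concurrently; they can block on each other forever. */
	pthread_create(&a, NULL, established_order, NULL);
	pthread_create(&b, NULL, pagefault_worker_order, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}

Lockdep only needs to observe each ordering once to report the cycle; the two paths do not have to race at runtime, which is why the subtest itself still finishes with SUCCESS while the runner aborts on the now-disabled lockdep.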