igt@xe_exec_fault_mode@many* - abort - WARNING: possible circular locking dependency detected
Stdout
Using IGT_SRANDOM=1718168397 for randomisation
Opened device: /dev/dri/card0
Starting subtest: many-execqueues-bindexecqueue-userptr-invalidate-race
Subtest many-execqueues-bindexecqueue-userptr-invalidate-race: SUCCESS (0.102s)
This test caused an abort condition: Lockdep not active
/proc/lockdep_stats contents:
lock-classes: 2385 [max: 8192]
direct dependencies: 22754 [max: 524288]
indirect dependencies: 133143
all direct dependencies: 509474
dependency chains: 42524 [max: 524288]
dependency chain hlocks used: 165773 [max: 2621440]
dependency chain hlocks lost: 0
in-hardirq chains: 217
in-softirq chains: 524
in-process chains: 41783
stack-trace entries: 220538 [max: 524288]
number of stack traces: 10036
number of stack hash chains: 7513
combined max dependencies: 487211504
hardirq-safe locks: 83
hardirq-unsafe locks: 1690
softirq-safe locks: 192
softirq-unsafe locks: 1600
irq-safe locks: 205
irq-unsafe locks: 1690
hardirq-read-safe locks: 4
hardirq-read-unsafe locks: 292
softirq-read-safe locks: 8
softirq-read-unsafe locks: 287
irq-read-safe locks: 8
irq-read-unsafe locks: 292
uncategorized locks: 312
unused locks: 1
max locking depth: 16
max bfs queue depth: 310
max lock class index: 2384
chain lookup misses: 42838
chain lookup hits: 183856121
cyclic checks: 42163
redundant checks: 0
redundant links: 0
find-mask forwards checks: 7302
find-mask backwards checks: 3874
hardirq on events: 105809009
hardirq off events: 105809002
redundant hardirq ons: 38
redundant hardirq offs: 6
softirq on events: 154693
softirq off events: 154693
redundant softirq ons: 0
redundant softirq offs: 0
debug_locks: 0
zapped classes: 3
zapped lock chains: 139
large chain blocks: 1
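Note on the abort: the "debug_locks: 0" line above is what "Lockdep not active" refers to. Once lockdep emits its first report (the circular-dependency warning in the Dmesg section below) it calls debug_locks_off() and performs no further checking, and the runner treats that state as an abort condition. As an illustration only (not the actual igt_runner code), a minimal check against /proc/lockdep_stats could look like this:

/*
 * Illustrative check only, assuming the runner merely parses
 * /proc/lockdep_stats; this is not the actual igt_runner implementation.
 */
#include <stdio.h>

/* Returns 1 if lockdep has turned itself off, 0 if it is active, -1 on error. */
static int lockdep_turned_off(void)
{
	char line[256];
	int val = -1;
	FILE *f = fopen("/proc/lockdep_stats", "r");

	if (!f)
		return -1;

	while (fgets(line, sizeof(line), f)) {
		/* The file prints one " key: value" pair per line. */
		if (sscanf(line, " debug_locks: %d", &val) == 1)
			break;
	}
	fclose(f);

	return val < 0 ? -1 : !val;
}

int main(void)
{
	switch (lockdep_turned_off()) {
	case 1:
		puts("Lockdep not active");
		break;
	case 0:
		puts("Lockdep active");
		break;
	default:
		puts("Lockdep not available");
	}
	return 0;
}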
Stderr
Starting subtest: many-execqueues-bindexecqueue-userptr-invalidate-race
Subtest many-execqueues-bindexecqueue-userptr-invalidate-race: SUCCESS (0.102s)
Dmesg
<6> [93.931902] Console: switching to colour dummy device 80x25
<6> [93.931968] [IGT] xe_exec_fault_mode: executing
<6> [93.934106] [IGT] xe_exec_fault_mode: starting subtest many-execqueues-bindexecqueue-userptr-invalidate-race
<7> [93.955759] xe 0000:00:02.0: [drm:pf_queue_work_func [xe]]
ASID: 121
VFID: 0
PDATA: 0x0490
Faulted Address: 0x00000000001a0000
FaultType: 0
AccessType: 0
FaultLevel: 4
EngineClass: 0
EngineInstance: 0
<7> [93.955856] xe 0000:00:02.0: [drm:pf_queue_work_func [xe]] Fault response: Unsuccessful -22
<7> [93.955921] xe 0000:00:02.0: [drm:pf_queue_work_func [xe]]
ASID: 121
VFID: 0
PDATA: 0x0c90
Faulted Address: 0x00000000001a1000
FaultType: 0
AccessType: 0
FaultLevel: 4
EngineClass: 0
EngineInstance: 0
<7> [93.955982] xe 0000:00:02.0: [drm:pf_queue_work_func [xe]] Fault response: Unsuccessful -22
<6> [93.956128] xe 0000:00:02.0: [drm] GT0: Engine reset: engine_class=rcs, logical_mask: 0x1, guc_id=2
<4> [93.970246]
<4> [93.970263] ======================================================
<4> [93.970268] WARNING: possible circular locking dependency detected
<4> [93.970273] 6.10.0-rc3-xe #1 Tainted: G U
<4> [93.970278] ------------------------------------------------------
<4> [93.970281] kworker/u37:4/999 is trying to acquire lock:
<4> [93.970286] ffff8881044960d8 (&mm->mmap_lock){++++}-{3:3}, at: xe_hmm_userptr_populate_range+0x20b/0x610 [xe]
<4> [93.970441]
but task is already holding lock:
<4> [93.970445] ffff8881252965a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_exec_lock_obj+0x4b/0x2e0 [drm_exec]
<4> [93.970457]
which lock already depends on the new lock.
<4> [93.970462]
the existing dependency chain (in reverse order) is:
<4> [93.970469]
-> #2 (reservation_ww_class_mutex){+.+.}-{3:3}:
<4> [93.970475] __ww_mutex_lock.constprop.0+0xc1/0x14e0
<4> [93.970488] ww_mutex_lock+0x3c/0x110
<4> [93.970492] dma_resv_lockdep+0x180/0x330
<4> [93.970507] do_one_initcall+0x61/0x3d0
<4> [93.970518] kernel_init_freeable+0x3f5/0x6b0
<4> [93.970530] kernel_init+0x1b/0x170
<4> [93.970539] ret_from_fork+0x39/0x60
<4> [93.970547] ret_from_fork_asm+0x1a/0x30
<4> [93.970551]
-> #1 (reservation_ww_class_acquire){+.+.}-{0:0}:
<4> [93.970557] dma_resv_lockdep+0x154/0x330
<4> [93.970563] do_one_initcall+0x61/0x3d0
<4> [93.970567] kernel_init_freeable+0x3f5/0x6b0
<4> [93.970572] kernel_init+0x1b/0x170
<4> [93.970576] ret_from_fork+0x39/0x60
<4> [93.970579] ret_from_fork_asm+0x1a/0x30
<4> [93.970583]
-> #0 (&mm->mmap_lock){++++}-{3:3}:
<4> [93.970588] __lock_acquire+0x1690/0x3070
<4> [93.970601] lock_acquire+0xd9/0x300
<4> [93.970605] down_read+0x43/0x1b0
<4> [93.970609] xe_hmm_userptr_populate_range+0x20b/0x610 [xe]
<4> [93.970705] xe_vma_userptr_pin_pages+0x13d/0x190 [xe]
<4> [93.970784] __xe_vma_op_execute+0x4b4/0x6c0 [xe]
<4> [93.970862] ops_execute+0xdd/0x1b0 [xe]
<4> [93.970938] xe_vma_rebind+0xd8/0x2e0 [xe]
<4> [93.971017] handle_vma_pagefault.isra.0+0x160/0x550 [xe]
<4> [93.971078] pf_queue_work_func+0x37a/0x450 [xe]
<4> [93.971137] process_scheduled_works+0x3aa/0x750
<4> [93.971146] worker_thread+0x14f/0x2f0
<4> [93.971150] kthread+0xf5/0x130
<4> [93.971155] ret_from_fork+0x39/0x60
<4> [93.971158] ret_from_fork_asm+0x1a/0x30
<4> [93.971162]
other info that might help us debug this:
<4> [93.971166] Chain exists of:
&mm->mmap_lock --> reservation_ww_class_acquire --> reservation_ww_class_mutex
<4> [93.971176] Possible unsafe locking scenario:
<4> [93.971180]        CPU0                    CPU1
<4> [93.971184]        ----                    ----
<4> [93.971187]   lock(reservation_ww_class_mutex);
<4> [93.971190]                                lock(reservation_ww_class_acquire);
<4> [93.971195]                                lock(reservation_ww_class_mutex);
<4> [93.971200]   rlock(&mm->mmap_lock);
<4> [93.971203]
*** DEADLOCK ***
<4> [93.971207] 5 locks held by kworker/u37:4/999:
<4> [93.971211] #0: ffff8881257b2948 ((wq_completion)xe_gt_page_fault_work_queue){+.+.}-{0:0}, at: process_scheduled_works+0x5c6/0x750
<4> [93.971222] #1: ffffc90001b2fe48 ((work_completion)(&gt->usm.pf_queue[i].worker)){+.+.}-{0:0}, at: process_scheduled_works+0x366/0x750
<4> [93.971233] #2: ffff888131d8ec80 (&vm->lock){++++}-{3:3}, at: pf_queue_work_func+0x33a/0x450 [xe]
<4> [93.971299] #3: ffffc90001b2fce8 (reservation_ww_class_acquire){+.+.}-{0:0}, at: handle_vma_pagefault.isra.0+0xd9/0x550 [xe]
<4> [93.971362] #4: ffff8881252965a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_exec_lock_obj+0x4b/0x2e0 [drm_exec]
<4> [93.971370]
stack backtrace:
<4> [93.971376] CPU: 6 PID: 999 Comm: kworker/u37:4 Tainted: G U 6.10.0-rc3-xe #1
<4> [93.971384] Hardware name: Intel Corporation Lunar Lake Client Platform/LNL-M LP5 RVP1, BIOS LNLMFWI1.R00.3130.D83.2404031315 04/03/2024
<4> [93.971392] Workqueue: xe_gt_page_fault_work_queue pf_queue_work_func [xe] (xe_gt_page_fault_work_q)
<4> [93.971457] Call Trace:
<4> [93.971462] <TASK>
<4> [93.971468] dump_stack_lvl+0x9b/0xf0
<4> [93.971477] dump_stack+0x10/0x20
<4> [93.971482] print_circular_bug.isra.0+0x2d2/0x410
<4> [93.971487] check_noncircular+0x155/0x170
<4> [93.971493] __lock_acquire+0x1690/0x3070
<4> [93.971500] lock_acquire+0xd9/0x300
<4> [93.971504] ? xe_hmm_userptr_populate_range+0x20b/0x610 [xe]
<4> [93.971588] down_read+0x43/0x1b0
<4> [93.971592] ? xe_hmm_userptr_populate_range+0x20b/0x610 [xe]
<4> [93.971672] xe_hmm_userptr_populate_range+0x20b/0x610 [xe]
<4> [93.971750] ? lock_acquire+0xd9/0x300
<4> [93.971758] xe_vma_userptr_pin_pages+0x13d/0x190 [xe]
<4> [93.971836] __xe_vma_op_execute+0x4b4/0x6c0 [xe]
<4> [93.971914] ops_execute+0xdd/0x1b0 [xe]
<4> [93.971990] ? xe_vm_ops_add_rebind+0xaa/0xe0 [xe]
<4> [93.972068] xe_vma_rebind+0xd8/0x2e0 [xe]
<4> [93.972144] ? ww_mutex_lock+0x3c/0x110
<4> [93.972148] ? ww_mutex_lock+0x3c/0x110
<4> [93.972152] ? drm_exec_lock_obj+0x6c/0x2e0 [drm_exec]
<4> [93.972157] ? xe_vm_lock_vma+0x2b/0x60 [xe]
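Note on the splat: lockdep already knows the ordering &mm->mmap_lock --> reservation_ww_class_acquire --> reservation_ww_class_mutex (frames #1 and #2 above show the reservation classes being primed at boot from dma_resv_lockdep()), while frame #0 shows the xe page-fault worker taking mmap_lock for read in xe_hmm_userptr_populate_range() while already holding a reservation_ww_class_mutex acquired through drm_exec_lock_obj(), i.e. the reverse order. As an illustration only, a minimal user-space sketch of that inversion pattern, with hypothetical stand-in locks (mmap_lock_stub, resv_lock_stub) rather than the actual xe/drm code:

/*
 * Hypothetical stand-in locks, not the xe driver code: mmap_lock_stub plays
 * the role of mm->mmap_lock (an rwsem) and resv_lock_stub the role of a
 * reservation_ww_class_mutex (dma-resv lock).
 */
#include <pthread.h>
#include <stdio.h>

static pthread_rwlock_t mmap_lock_stub = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t resv_lock_stub = PTHREAD_MUTEX_INITIALIZER;

/*
 * Established order: mmap_lock first, then the reservation lock. Taken for
 * write here so the inversion is directly realizable with two threads; in
 * the kernel chain the existing edge uses a read lock and an actual hang
 * additionally needs a pending mmap_lock writer, but lockdep flags the
 * reversed ordering either way.
 */
static void *established_order(void *arg)
{
	(void)arg;
	pthread_rwlock_wrlock(&mmap_lock_stub);
	pthread_mutex_lock(&resv_lock_stub);
	puts("mmap_lock -> reservation lock");
	pthread_mutex_unlock(&resv_lock_stub);
	pthread_rwlock_unlock(&mmap_lock_stub);
	return NULL;
}

/*
 * Reversed order, mirroring frame #0 above: the page-fault worker already
 * holds a reservation lock (drm_exec_lock_obj) when
 * xe_hmm_userptr_populate_range() takes mmap_lock for read.
 */
static void *pagefault_worker_order(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&resv_lock_stub);
	pthread_rwlock_rdlock(&mmap_lock_stub);
	puts("reservation lock -> mmap_lock");
	pthread_rwlock_unlock(&mmap_lock_stub);
	pthread_mutex_unlock(&resv_lock_stub);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	/* Run the two orders concurrently; they can block on each other forever. */
	pthread_create(&a, NULL, established_order, NULL);
	pthread_create(&b, NULL, pagefault_worker_order, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}

Lockdep only needs to observe each ordering once to report the cycle; the two paths do not have to race at runtime, which is why the subtest itself still finishes with SUCCESS while the runner aborts on the now-disabled lockdep.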