igt@i915_suspend@sysfs-reader - incomplete - is trying to acquire lock at: nvme_dev_disable, but task is already holding lock at: process_one_work
https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8105/shard-tglb7/igt@i915_suspend@sysfs-reader.html
https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8105/shard-tglb7/pstore8-1668532430_Panic_1.txt
<4>[ 374.803813] nvme nvme0: I/O 834 (I/O Cmd) QID 8 timeout, aborting
<4>[ 401.107815] nvme nvme0: I/O 655 QID 6 timeout, reset controller
<4>[ 401.107953]
<4>[ 401.107973] ======================================================
<4>[ 401.108019] WARNING: possible circular locking dependency detected
<4>[ 401.108037] 6.1.0-rc5-CI_DRM_12382-gcb7486469341+ #1 Not tainted
<4>[ 401.108056] ------------------------------------------------------
<4>[ 401.108074] kworker/5:1H/191 is trying to acquire lock:
<4>[ 401.108090] ffff888104312340 (&dev->shutdown_lock){+.+.}-{3:3}, at: nvme_dev_disable+0x32/0x570
<4>[ 401.108134]
<4>[ 401.108134] but task is already holding lock:
<4>[ 401.108152] ffffc900003c7e78 ((work_completion)(&q->timeout_work)){+.+.}-{0:0}, at: process_one_work+0x1eb/0x5b0
<4>[ 401.108189]
<4>[ 401.108189] which lock already depends on the new lock.
<4>[ 401.108189]
<4>[ 401.108214]
<4>[ 401.108214] the existing dependency chain (in reverse order) is:
<4>[ 401.108235]
<4>[ 401.108235] -> #2 ((work_completion)(&q->timeout_work)){+.+.}-{0:0}:
<4>[ 401.108261] lock_acquire+0xd3/0x310
<4>[ 401.108278] __flush_work+0x77/0x4e0
<4>[ 401.108294] __cancel_work_timer+0x14e/0x1f0
<4>[ 401.108311] nvme_sync_io_queues+0x2f/0x50
<4>[ 401.108332] nvme_sync_queues+0x9/0x30
<4>[ 401.108349] nvme_reset_work+0x63/0x10f0
<4>[ 401.108366] process_one_work+0x272/0x5b0
<4>[ 401.108387] worker_thread+0x37/0x370
<4>[ 401.108403] kthread+0xed/0x120
<4>[ 401.108418] ret_from_fork+0x1f/0x30
<4>[ 401.108438]
<4>[ 401.108438] -> #1 (&ctrl->namespaces_rwsem){++++}-{3:3}:
<4>[ 401.108465] lock_acquire+0xd3/0x310
<4>[ 401.108483] down_read+0x39/0x140
<4>[ 401.108503] nvme_start_freeze+0x1d/0x50
<4>[ 401.108523] nvme_dev_disable+0x451/0x570
<4>[ 401.108543] nvme_suspend+0x4c/0x160
<4>[ 401.108561] pci_pm_suspend+0x6b/0x150
<4>[ 401.108582] dpm_run_callback+0x5d/0x250
<4>[ 401.108605] __device_suspend+0x143/0x590
<4>[ 401.108622] async_suspend+0x15/0x90
<4>[ 401.108646] async_run_entry_fn+0x28/0x130
<4>[ 401.108666] process_one_work+0x272/0x5b0
<4>[ 401.108685] worker_thread+0x37/0x370
<4>[ 401.108733] kthread+0xed/0x120
<4>[ 401.108748] ret_from_fork+0x1f/0x30
<4>[ 401.108766]
<4>[ 401.108766] -> #0 (&dev->shutdown_lock){+.+.}-{3:3}:
<4>[ 401.108792] validate_chain+0xb3d/0x2000
<4>[ 401.108811] __lock_acquire+0x5a4/0xb70
<4>[ 401.108829] lock_acquire+0xd3/0x310
<4>[ 401.108845] __mutex_lock+0x97/0xf10
<4>[ 401.108864] nvme_dev_disable+0x32/0x570
<4>[ 401.108884] nvme_timeout.cold.78+0xe8/0x1d5
<4>[ 401.108909] blk_mq_check_expired+0x5a/0x90
<4>[ 401.108931] bt_iter+0x7e/0x90
<4>[ 401.108951] blk_mq_queue_tag_busy_iter+0x3d6/0x650
<4>[ 401.108973] blk_mq_timeout_work+0xd5/0x250
<4>[ 401.108992] process_one_work+0x272/0x5b0
<4>[ 401.109012] worker_thread+0x37/0x370
<4>[ 401.109030] kthread+0xed/0x120
<4>[ 401.109046] ret_from_fork+0x1f/0x30
<4>[ 401.109064]
<4>[ 401.109064] other info that might help us debug this:
<4>[ 401.109064]
<4>[ 401.110930] Chain exists of:
<4>[ 401.110930] &dev->shutdown_lock --> &ctrl->namespaces_rwsem --> (work_completion)(&q->timeout_work)
<4>[ 401.110930]
<4>[ 401.113420] Possible unsafe locking scenario:
<4>[ 401.113420]
<4>[ 401.114578]        CPU0                    CPU1
<4>[ 401.115153]        ----                    ----
<4>[ 401.115717]   lock((work_completion)(&q->timeout_work));
<4>[ 401.116278]                                lock(&ctrl->namespaces_rwsem);
<4>[ 401.116844]                                lock((work_completion)(&q->timeout_work));
<4>[ 401.117424]   lock(&dev->shutdown_lock);
<4>[ 401.117992]
<4>[ 401.117992] *** DEADLOCK ***
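
The cycle lockdep reports can be read from the chain above: the suspend/reset path establishes &dev->shutdown_lock -> &ctrl->namespaces_rwsem -> (work_completion)(&q->timeout_work), while the block timeout worker already "holds" the running work item and then calls nvme_dev_disable(), which wants shutdown_lock, closing the loop. As a reading aid only, here is a minimal user-space sketch of that inversion using plain pthreads; the mutex names are hypothetical stand-ins for the kernel objects, and flush_work()/the running work item are modelled as a mutex. This is not the driver code, just the shape of the cycle.

/* lockdep-cycle sketch: compile with `cc -pthread sketch.c` */
#include <pthread.h>

static pthread_mutex_t shutdown_lock   = PTHREAD_MUTEX_INITIALIZER; /* stands in for &dev->shutdown_lock */
static pthread_mutex_t namespaces_lock = PTHREAD_MUTEX_INITIALIZER; /* stands in for &ctrl->namespaces_rwsem */
static pthread_mutex_t timeout_work    = PTHREAD_MUTEX_INITIALIZER; /* stands in for (work_completion)(&q->timeout_work) */

/*
 * Path 1 (suspend/reset side): shutdown_lock is taken first, the
 * namespaces lock under it, and the timeout work is then waited on
 * (modelled here as taking its mutex), giving the ordering
 * shutdown_lock -> namespaces -> work.
 */
static void *suspend_path(void *arg)
{
	pthread_mutex_lock(&shutdown_lock);
	pthread_mutex_lock(&namespaces_lock);
	pthread_mutex_lock(&timeout_work);      /* models flush/cancel of the timeout work */
	pthread_mutex_unlock(&timeout_work);
	pthread_mutex_unlock(&namespaces_lock);
	pthread_mutex_unlock(&shutdown_lock);
	return NULL;
}

/*
 * Path 2 (timeout side): the timeout work is already running (work item
 * "held"), and the timeout handler then wants shutdown_lock, i.e.
 * work -> shutdown_lock, which closes the cycle.
 */
static void *timeout_path(void *arg)
{
	pthread_mutex_lock(&timeout_work);      /* models the running timeout work item */
	pthread_mutex_lock(&shutdown_lock);
	pthread_mutex_unlock(&shutdown_lock);
	pthread_mutex_unlock(&timeout_work);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	/* Running both paths concurrently can deadlock, which is what lockdep warns about. */
	pthread_create(&a, NULL, suspend_path, NULL);
	pthread_create(&b, NULL, timeout_path, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}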