Yeah, another lockdep blow-up:
3594 [ 142.974109] =====================================================
3595 [ 142.980229] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
3596 [ 142.986870] 5.19.0-xe+ #3276 Tainted: G W
3597 [ 142.992208] -----------------------------------------------------
3598 [ 142.998327] cat/1693 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
3599 [ 143.004533] ffff88810c7c73e0 (&sa_manager->lock){+.+.}-{2:2}, at: __drm_suballoc_free+0xb2/0x150
3600 [ 143.013370]
3601 and this task is already holding:
3602 [ 143.019229] ffff8881356abbf8 (&fence->lock){-.-.}-{2:2}, at: dma_fence_signal+0x1e/0x70
3603 [ 143.027269] which would create a new lock dependency:
3604 [ 143.032343] (&fence->lock){-.-.}-{2:2} -> (&sa_manager->lock){+.+.}-{2:2}
3605 [ 143.039248]
3606 but this new dependency connects a HARDIRQ-irq-safe lock:
3607 [ 143.047196] (&fence->lock){-.-.}-{2:2}
3608 [ 143.047197]
3609 ... which became HARDIRQ-irq-safe at:
3610 [ 143.057260] lock_acquire+0xd3/0x310
3611 [ 143.060946] _raw_spin_lock_irqsave+0x33/0x50
3612 [ 143.065411] dma_fence_signal+0x1e/0x70
3613 [ 143.069352] drm_sched_job_done.isra.14+0x78/0x1c0
3614 [ 143.074256] dma_fence_signal_timestamp_locked+0x96/0x1a0
3615 [ 143.079767] hw_fence_irq_run_cb+0x1ef/0x2a0 [xe]
3616 [ 143.084599] irq_work_single+0x3c/0x90
3617 [ 143.088454] irq_work_run_list+0x28/0x40
3618 [ 143.092485] irq_work_run+0x26/0x40
3619 [ 143.096082] __sysvec_irq_work+0x3e/0x1d0
3620 [ 143.100199] sysvec_irq_work+0x85/0xb0
3621 [ 143.104052] asm_sysvec_irq_work+0x16/0x20
3622 [ 143.108258] cpuidle_enter_state+0x104/0x4a0
3623 [ 143.112638] cpuidle_enter+0x24/0x40
3624 [ 143.116321] do_idle+0x22a/0x250
3625 [ 143.119658] cpu_startup_entry+0x14/0x20
3626 [ 143.123686] start_secondary+0x10f/0x130
3627 [ 143.127717] secondary_startup_64_no_verify+0xce/0xdb
3628 [ 143.132877]
3629 to a HARDIRQ-irq-unsafe lock:
3630 [ 143.138385] (&sa_manager->lock){+.+.}-{2:2}
3631 [ 143.138387]
3632 ... which became HARDIRQ-irq-unsafe at:
3633 [ 143.149056] ...
3634 [ 143.149057] lock_acquire+0xd3/0x310
3635 [ 143.154506] _raw_spin_lock+0x2a/0x40
3636 [ 143.158275] drm_suballoc_tryalloc+0x21/0x60
3637 [ 143.162654] drm_suballoc_new+0x102/0x230
3638 [ 143.166775] xe_bb_new+0x5a/0xc0 [xe]
3639 [ 143.170551] xe_migrate_clear+0x1f3/0x700 [xe]
3640 [ 143.175120] xe_bo_move+0x2bb/0x820 [xe]
3641 [ 143.179155] ttm_bo_handle_move_mem+0xb1/0x140
3642 [ 143.183706] ttm_bo_validate+0xeb/0x180
3643 [ 143.187650] ttm_bo_init_reserved+0xec/0x1d0
3644 [ 143.192031] __xe_bo_create_locked+0x168/0x270 [xe]
3645 [ 143.197022] xe_bo_create_locked+0x47/0x1e0 [xe]
3646 [ 143.201753] xe_gem_create_ioctl+0xbd/0x2e0 [xe]
3647 [ 143.206483] drm_ioctl_kernel+0xb0/0x140
3648 [ 143.210511] drm_ioctl+0x205/0x3d0
3649 [ 143.214018] __x64_sys_ioctl+0x6e/0xb0
3650 [ 143.217873] do_syscall_64+0x37/0x90
3651 [ 143.221555] entry_SYSCALL_64_after_hwframe+0x63/0xcd
3652 [ 143.226717]
3653 other info that might help us debug this:
3654
3655 [ 143.234752] Possible interrupt unsafe locking scenario:
3656
3657 [ 143.241568]        CPU0                    CPU1
3658 [ 143.246120]        ----                    ----
3659 [ 143.250672]   lock(&sa_manager->lock);
3660 [ 143.254443]                                local_irq_disable();
3661 [ 143.260389]                                lock(&fence->lock);
3662 [ 143.266247]                                lock(&sa_manager->lock);
3663 [ 143.272539]   <Interrupt>
3664 [ 143.275174]     lock(&fence->lock);
3665 [ 143.278682]
3666 *** DEADLOCK ***
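
Decoding it: dma_fence_signal() grabs fence->lock from hardirq context (the irq_work / hw_fence_irq_run_cb path above), and __drm_suballoc_free() then takes sa_manager->lock while still holding fence->lock. Meanwhile the allocation path (drm_suballoc_tryalloc() from the gem create ioctl) takes sa_manager->lock with a plain spin_lock(), which is what makes it HARDIRQ-unsafe and gives lockdep the safe -> unsafe ordering to complain about. Rough sketch of the pattern and of one possible way out (make every sa_manager->lock acquisition irq-safe); all the names below are illustrative stand-ins, not the actual drm_suballoc/xe code:

/*
 * Minimal sketch, not the real driver code: if sa_manager->lock can be
 * taken under fence->lock (which is acquired from hardirq context via
 * dma_fence_signal()), then every acquisition of sa_manager->lock has
 * to disable interrupts, otherwise the ioctl path can be interrupted
 * while holding it and deadlock against the fence signalling path.
 */
#include <linux/spinlock.h>

struct sa_manager {
	spinlock_t lock;
	/* ... suballocator state ... */
};

/* Allocation path (process context, e.g. the gem create ioctl above). */
static void sa_alloc_path(struct sa_manager *sa)
{
	unsigned long flags;

	/*
	 * A plain spin_lock() here is what marks the lock HARDIRQ-unsafe;
	 * the irqsave variant keeps the fence->lock -> sa_manager->lock
	 * dependency legal.
	 */
	spin_lock_irqsave(&sa->lock, flags);
	/* ... carve out a suballocation ... */
	spin_unlock_irqrestore(&sa->lock, flags);
}

/* Free path, called under fence->lock from the fence signalling irq work. */
static void sa_free_path(struct sa_manager *sa)
{
	unsigned long flags;

	spin_lock_irqsave(&sa->lock, flags);
	/* ... return the suballocation to the pool ... */
	spin_unlock_irqrestore(&sa->lock, flags);
}

The other option would be to not call into the suballocator under fence->lock at all, e.g. defer the free out of the signalling path.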