[GT216][Linux 5.16.4] Long stalls in dma_fence_default_wait
Games (for example Red Eclipse) start, but they soon stall in D state (uninterruptible sleep) with a call stack like:
[<0>] dma_fence_default_wait+0x1a4/0x240
[<0>] dma_fence_wait_timeout+0xdd/0x100
[<0>] dma_resv_wait_timeout+0x91/0xf0
[<0>] ttm_bo_wait+0x39/0x60 [ttm]
[<0>] ttm_bo_move_accel_cleanup+0x8b/0x390 [ttm]
[<0>] nouveau_bo_move+0x427/0x820 [nouveau]
[<0>] ttm_bo_handle_move_mem+0x8d/0x190 [ttm]
[<0>] ttm_mem_evict_first+0x25f/0x470 [ttm]
[<0>] ttm_bo_mem_space+0x248/0x2a0 [ttm]
[<0>] ttm_bo_validate+0x90/0x130 [ttm]
[<0>] ttm_bo_init_reserved+0x1d1/0x260 [ttm]
[<0>] ttm_bo_init+0x5a/0xd0 [ttm]
[<0>] nouveau_bo_init+0x6e/0x80 [nouveau]
[<0>] nouveau_gem_new+0x80/0xe0 [nouveau]
[<0>] nouveau_gem_ioctl_new+0x55/0x100 [nouveau]
[<0>] drm_ioctl_kernel+0xb0/0x140 [drm]
[<0>] drm_ioctl+0x220/0x3e0 [drm]
[<0>] nouveau_drm_ioctl+0x55/0xa0 [nouveau]
[<0>] __x64_sys_ioctl+0x82/0xb0
[<0>] do_syscall_64+0x3b/0xc0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
It's a Debian bullseye system. The stall was pretty much immediate with the stock 5.10 kernel, and it still happens after a couple of minutes under 5.15 and 5.16. Otherwise the machine stays responsive: I can SSH in and kill the stuck processes, which eventually die (minutes later).
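For anyone trying to reproduce this, a trace like the one above can be collected over SSH with a small script along the following lines. This is only a minimal sketch, assuming a Linux /proc filesystem and root privileges (reading /proc/<pid>/task/<tid>/stack normally requires them); the function names `parse_stat_state`, `d_state_tasks` and `kernel_stack` are made up for illustration and are not part of any existing tool.

```python
import os


def parse_stat_state(stat_line):
    """Return (comm, state) from the contents of a /proc/.../stat file.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the LAST ')' in the line.
    """
    head, _, tail = stat_line.rpartition(")")
    comm = head.split("(", 1)[1]
    state = tail.split()[0]
    return comm, state


def d_state_tasks(proc="/proc"):
    """Yield (pid, tid, comm) for every thread in state D."""
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue
        task_dir = os.path.join(proc, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat"),
                          errors="replace") as f:
                    comm, state = parse_stat_state(f.read())
            except (OSError, IndexError):
                continue  # thread vanished or stat line was unparsable
            if state == "D":
                yield int(pid), int(tid), comm


def kernel_stack(pid, tid, proc="/proc"):
    """Return the kernel stack of one thread, or a note if unreadable."""
    path = os.path.join(proc, str(pid), "task", str(tid), "stack")
    try:
        with open(path) as f:
            return f.read()
    except OSError as exc:
        return f"(stack unavailable: {exc})\n"


if __name__ == "__main__":
    for pid, tid, comm in d_state_tasks():
        print(f"pid={pid} tid={tid} comm={comm}")
        print(kernel_stack(pid, tid))
```

The script walks individual threads rather than only processes, since in a multithreaded game the thread blocked in the fence wait is not necessarily the main one.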
Edited by Ferenc Wágner