[LNL] ttm_bo_delayed_delete timeout when running Vulkan CTS
On LNL, when I run the Vulkan CTS tests, after a while I may get these error messages:
[ 246.798612] INFO: task kworker/u36:0:135 blocked for more than 122 seconds.
[ 246.805829] Tainted: G W 6.11.0pz+ #98
[ 246.811668] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.819699] task:kworker/u36:0 state:D stack:0 pid:135 tgid:135 ppid:2 flags:0x00004000
[ 246.819712] Workqueue: ttm ttm_bo_delayed_delete [ttm]
[ 246.819736] Call Trace:
[ 246.819740] <TASK>
[ 246.819749] __schedule+0x3b4/0xac0
[ 246.819761] schedule+0x27/0xf0
[ 246.819764] schedule_timeout+0x12f/0x160
[ 246.819770] dma_fence_default_wait+0x1d8/0x250
[ 246.819779] ? __pfx_dma_fence_default_wait_cb+0x10/0x10
[ 246.819782] dma_fence_wait_timeout+0x108/0x140
[ 246.819786] dma_resv_wait_timeout+0xcc/0x1c0
[ 246.819789] ttm_bo_delayed_delete+0x2a/0x80 [ttm]
[ 246.819796] process_one_work+0x17b/0x330
[ 246.819801] worker_thread+0x2ce/0x3f0
[ 246.819802] ? __pfx_worker_thread+0x10/0x10
[ 246.819804] kthread+0xcf/0x100
[ 246.819807] ? __pfx_kthread+0x10/0x10
[ 246.819809] ret_from_fork+0x31/0x50
[ 246.819814] ? __pfx_kthread+0x10/0x10
[ 246.819816] ret_from_fork_asm+0x1a/0x30
[ 246.819821] </TASK>
[ 246.819861] INFO: task kworker/u36:1:1157 blocked for more than 122 seconds.
[ 246.827127] Tainted: G W 6.11.0pz+ #98
[ 246.832969] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.840999] task:kworker/u36:1 state:D stack:0 pid:1157 tgid:1157 ppid:2 flags:0x00004000
[ 246.841006] Workqueue: ttm ttm_bo_delayed_delete [ttm]
[ 246.841014] Call Trace:
[ 246.841018] <TASK>
[ 246.841021] __schedule+0x3b4/0xac0
[ 246.841025] schedule+0x27/0xf0
[ 246.841027] schedule_timeout+0x12f/0x160
[ 246.841030] dma_fence_default_wait+0x1d8/0x250
[ 246.841032] ? __pfx_dma_fence_default_wait_cb+0x10/0x10
[ 246.841035] dma_fence_wait_timeout+0x108/0x140
[ 246.841037] dma_resv_wait_timeout+0xcc/0x1c0
[ 246.841039] ttm_bo_delayed_delete+0x2a/0x80 [ttm]
[ 246.841045] process_one_work+0x17b/0x330
[ 246.841047] worker_thread+0x2ce/0x3f0
[ 246.841048] ? __pfx_worker_thread+0x10/0x10
[ 246.841050] kthread+0xcf/0x100
[ 246.841052] ? __pfx_kthread+0x10/0x10
[ 246.841054] ret_from_fork+0x31/0x50
[ 246.841055] ? __pfx_kthread+0x10/0x10
[ 246.841057] ret_from_fork_asm+0x1a/0x30
[ 246.841060] </TASK>
[ 246.841066] INFO: task kworker/u36:2:1226 blocked for more than 122 seconds.
[ 246.848306] Tainted: G W 6.11.0pz+ #98
[ 246.854153] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.862221] task:kworker/u36:2 state:D stack:0 pid:1226 tgid:1226 ppid:2 flags:0x00004000
[ 246.862225] Workqueue: ttm ttm_bo_delayed_delete [ttm]
[ 246.862232] Call Trace:
[ 246.862234] <TASK>
[ 246.862236] __schedule+0x3b4/0xac0
[ 246.862239] schedule+0x27/0xf0
[ 246.862242] schedule_timeout+0x12f/0x160
[ 246.862245] dma_fence_default_wait+0x1d8/0x250
[ 246.862247] ? __pfx_dma_fence_default_wait_cb+0x10/0x10
[ 246.862250] dma_fence_wait_timeout+0x108/0x140
[ 246.862253] dma_resv_wait_timeout+0xcc/0x1c0
[ 246.862255] ttm_bo_delayed_delete+0x2a/0x80 [ttm]
[ 246.862261] process_one_work+0x17b/0x330
[ 246.862263] worker_thread+0x2ce/0x3f0
[ 246.862264] ? __pfx_worker_thread+0x10/0x10
[ 246.862266] kthread+0xcf/0x100
[ 246.862268] ? __pfx_kthread+0x10/0x10
[ 246.862270] ret_from_fork+0x31/0x50
[ 246.862272] ? __pfx_kthread+0x10/0x10
[ 246.862274] ret_from_fork_asm+0x1a/0x30
[ 246.862276] </TASK>
[ 246.862279] INFO: task kworker/u36:4:1229 blocked for more than 122 seconds.
[ 246.869521] Tainted: G W 6.11.0pz+ #98
[ 246.875362] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.883391] task:kworker/u36:4 state:D stack:0 pid:1229 tgid:1229 ppid:2 flags:0x00004000
[ 246.883396] Workqueue: ttm ttm_bo_delayed_delete [ttm]
[ 246.883402] Call Trace:
[ 246.883405] <TASK>
[ 246.883408] __schedule+0x3b4/0xac0
[ 246.883411] schedule+0x27/0xf0
[ 246.883413] schedule_timeout+0x12f/0x160
[ 246.883416] dma_fence_default_wait+0x1d8/0x250
[ 246.883418] ? __pfx_dma_fence_default_wait_cb+0x10/0x10
[ 246.883421] dma_fence_wait_timeout+0x108/0x140
[ 246.883424] dma_resv_wait_timeout+0xcc/0x1c0
[ 246.883426] ttm_bo_delayed_delete+0x2a/0x80 [ttm]
[ 246.883431] process_one_work+0x17b/0x330
[ 246.883433] worker_thread+0x2ce/0x3f0
[ 246.883434] ? __pfx_worker_thread+0x10/0x10
[ 246.883436] kthread+0xcf/0x100
[ 246.883437] ? __pfx_kthread+0x10/0x10
[ 246.883439] ret_from_fork+0x31/0x50
[ 246.883441] ? __pfx_kthread+0x10/0x10
[ 246.883443] ret_from_fork_asm+0x1a/0x30
[ 246.883445] </TASK>
[ 246.883447] INFO: task kworker/u36:5:1230 blocked for more than 122 seconds.
[ 246.890686] Tainted: G W 6.11.0pz+ #98
[ 246.896529] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.904559] task:kworker/u36:5 state:D stack:0 pid:1230 tgid:1230 ppid:2 flags:0x00004000
[ 246.904563] Workqueue: ttm ttm_bo_delayed_delete [ttm]
[ 246.904570] Call Trace:
[ 246.904572] <TASK>
[ 246.904574] __schedule+0x3b4/0xac0
[ 246.904577] schedule+0x27/0xf0
[ 246.904579] schedule_timeout+0x12f/0x160
[ 246.904581] dma_fence_default_wait+0x1d8/0x250
[ 246.904584] ? __pfx_dma_fence_default_wait_cb+0x10/0x10
[ 246.904586] dma_fence_wait_timeout+0x108/0x140
[ 246.904590] dma_resv_wait_timeout+0xcc/0x1c0
[ 246.904592] ttm_bo_delayed_delete+0x2a/0x80 [ttm]
[ 246.904597] process_one_work+0x17b/0x330
[ 246.904598] worker_thread+0x2ce/0x3f0
[ 246.904600] ? __pfx_worker_thread+0x10/0x10
[ 246.904602] kthread+0xcf/0x100
[ 246.904603] ? __pfx_kthread+0x10/0x10
[ 246.904605] ret_from_fork+0x31/0x50
[ 246.904607] ? __pfx_kthread+0x10/0x10
[ 246.904609] ret_from_fork_asm+0x1a/0x30
[ 246.904612] </TASK>
[ 246.904701] INFO: task kworker/u36:6:1231 blocked for more than 122 seconds.
[ 246.911882] Tainted: G W 6.11.0pz+ #98
[ 246.917620] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.925523] task:kworker/u36:6 state:D stack:0 pid:1231 tgid:1231 ppid:2 flags:0x00004000
[ 246.925532] Workqueue: ttm ttm_bo_delayed_delete [ttm]
[ 246.925554] Call Trace:
[ 246.925557] <TASK>
[ 246.925563] __schedule+0x3b4/0xac0
[ 246.925572] schedule+0x27/0xf0
[ 246.925574] schedule_timeout+0x12f/0x160
[ 246.925577] dma_fence_default_wait+0x1d8/0x250
[ 246.925583] ? __pfx_dma_fence_default_wait_cb+0x10/0x10
[ 246.925586] dma_fence_wait_timeout+0x108/0x140
[ 246.925588] dma_resv_wait_timeout+0xcc/0x1c0
[ 246.925590] ttm_bo_delayed_delete+0x2a/0x80 [ttm]
[ 246.925595] process_one_work+0x17b/0x330
[ 246.925598] worker_thread+0x2ce/0x3f0
[ 246.925600] ? __pfx_worker_thread+0x10/0x10
[ 246.925600] kthread+0xcf/0x100
[ 246.925602] ? __pfx_kthread+0x10/0x10
[ 246.925603] ret_from_fork+0x31/0x50
[ 246.925608] ? __pfx_kthread+0x10/0x10
[ 246.925609] ret_from_fork_asm+0x1a/0x30
[ 246.925613] </TASK>
[ 246.925618] INFO: task kworker/u36:8:1233 blocked for more than 123 seconds.
[ 246.932768] Tainted: G W 6.11.0pz+ #98
[ 246.938542] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.946445] task:kworker/u36:8 state:D stack:0 pid:1233 tgid:1233 ppid:2 flags:0x00004000
[ 246.946448] Workqueue: ttm ttm_bo_delayed_delete [ttm]
[ 246.946450] Call Trace:
[ 246.946451] <TASK>
[ 246.946452] __schedule+0x3b4/0xac0
[ 246.946454] schedule+0x27/0xf0
[ 246.946455] schedule_timeout+0x12f/0x160
[ 246.946456] dma_fence_default_wait+0x1d8/0x250
[ 246.946458] ? __pfx_dma_fence_default_wait_cb+0x10/0x10
[ 246.946459] dma_fence_wait_timeout+0x108/0x140
[ 246.946461] dma_resv_wait_timeout+0xcc/0x1c0
[ 246.946462] ttm_bo_delayed_delete+0x2a/0x80 [ttm]
[ 246.946463] process_one_work+0x17b/0x330
[ 246.946464] worker_thread+0x2ce/0x3f0
[ 246.946465] ? __pfx_worker_thread+0x10/0x10
[ 246.946466] kthread+0xcf/0x100
[ 246.946467] ? __pfx_kthread+0x10/0x10
[ 246.946467] ret_from_fork+0x31/0x50
[ 246.946468] ? __pfx_kthread+0x10/0x10
[ 246.946469] ret_from_fork_asm+0x1a/0x30
[ 246.946471] </TASK>
[ 246.946473] INFO: task kworker/u36:9:1234 blocked for more than 123 seconds.
[ 246.953609] Tainted: G W 6.11.0pz+ #98
[ 246.959325] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.967227] task:kworker/u36:9 state:D stack:0 pid:1234 tgid:1234 ppid:2 flags:0x00004000
[ 246.967229] Workqueue: ttm ttm_bo_delayed_delete [ttm]
[ 246.967231] Call Trace:
[ 246.967231] <TASK>
[ 246.967232] __schedule+0x3b4/0xac0
[ 246.967233] schedule+0x27/0xf0
[ 246.967235] schedule_timeout+0x12f/0x160
[ 246.967236] dma_fence_default_wait+0x1d8/0x250
[ 246.967238] ? __pfx_dma_fence_default_wait_cb+0x10/0x10
[ 246.967239] dma_fence_wait_timeout+0x108/0x140
[ 246.967240] dma_resv_wait_timeout+0xcc/0x1c0
[ 246.967241] ttm_bo_delayed_delete+0x2a/0x80 [ttm]
[ 246.967243] process_one_work+0x17b/0x330
[ 246.967244] worker_thread+0x2ce/0x3f0
[ 246.967244] ? __pfx_worker_thread+0x10/0x10
[ 246.967245] kthread+0xcf/0x100
[ 246.967246] ? __pfx_kthread+0x10/0x10
[ 246.967247] ret_from_fork+0x31/0x50
[ 246.967247] ? __pfx_kthread+0x10/0x10
[ 246.967248] ret_from_fork_asm+0x1a/0x30
[ 246.967250] </TASK>
[ 246.967251] INFO: task kworker/u36:10:1235 blocked for more than 123 seconds.
[ 246.974476] Tainted: G W 6.11.0pz+ #98
[ 246.980191] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.988131] task:kworker/u36:10 state:D stack:0 pid:1235 tgid:1235 ppid:2 flags:0x00004000
[ 246.988133] Workqueue: ttm ttm_bo_delayed_delete [ttm]
[ 246.988134] Call Trace:
[ 246.988135] <TASK>
[ 246.988136] __schedule+0x3b4/0xac0
[ 246.988137] schedule+0x27/0xf0
[ 246.988139] schedule_timeout+0x12f/0x160
[ 246.988140] dma_fence_default_wait+0x1d8/0x250
[ 246.988141] ? __pfx_dma_fence_default_wait_cb+0x10/0x10
[ 246.988142] dma_fence_wait_timeout+0x108/0x140
[ 246.988144] dma_resv_wait_timeout+0xcc/0x1c0
[ 246.988145] ttm_bo_delayed_delete+0x2a/0x80 [ttm]
[ 246.988146] process_one_work+0x17b/0x330
[ 246.988147] worker_thread+0x2ce/0x3f0
[ 246.988148] ? __pfx_worker_thread+0x10/0x10
[ 246.988148] kthread+0xcf/0x100
[ 246.988149] ? __pfx_kthread+0x10/0x10
[ 246.988150] ret_from_fork+0x31/0x50
[ 246.988151] ? __pfx_kthread+0x10/0x10
[ 246.988152] ret_from_fork_asm+0x1a/0x30
[ 246.988153] </TASK>
[ 246.988157] INFO: task kworker/u36:11:1236 blocked for more than 123 seconds.
[ 246.995405] Tainted: G W 6.11.0pz+ #98
[ 247.001143] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 247.009042] task:kworker/u36:11 state:D stack:0 pid:1236 tgid:1236 ppid:2 flags:0x00004000
[ 247.009044] Workqueue: ttm ttm_bo_delayed_delete [ttm]
[ 247.009046] Call Trace:
[ 247.009047] <TASK>
[ 247.009047] __schedule+0x3b4/0xac0
[ 247.009049] schedule+0x27/0xf0
[ 247.009050] schedule_timeout+0x12f/0x160
[ 247.009052] dma_fence_default_wait+0x1d8/0x250
[ 247.009053] ? __pfx_dma_fence_default_wait_cb+0x10/0x10
[ 247.009054] dma_fence_wait_timeout+0x108/0x140
[ 247.009056] dma_resv_wait_timeout+0xcc/0x1c0
[ 247.009057] ttm_bo_delayed_delete+0x2a/0x80 [ttm]
[ 247.009058] process_one_work+0x17b/0x330
[ 247.009059] worker_thread+0x2ce/0x3f0
[ 247.009060] ? __pfx_worker_thread+0x10/0x10
[ 247.009060] kthread+0xcf/0x100
[ 247.009061] ? __pfx_kthread+0x10/0x10
[ 247.009062] ret_from_fork+0x31/0x50
[ 247.009063] ? __pfx_kthread+0x10/0x10
[ 247.009064] ret_from_fork_asm+0x1a/0x30
[ 247.009065] </TASK>
[ 247.009066] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings
[ 250.616923] xe 0000:00:02.0: [drm] GT0: Timedout job: seqno=4294967169, lrc_seqno=4294967169, guc_id=78, flags=0x0 in deqp-vk [2048]
[ 250.617241] xe 0000:00:02.0: [drm] GT0: Timedout job: seqno=4294967264, lrc_seqno=4294967264, guc_id=69, flags=0x4 in no process [-1]
[ 250.618588] xe 0000:00:02.0: [drm] GT0: Timedout job: seqno=4294967265, lrc_seqno=4294967265, guc_id=69, flags=0x4 in no process [-1]
[ 250.618762] xe 0000:00:02.0: [drm] GT0: Timedout job: seqno=10336, lrc_seqno=10336, guc_id=63, flags=0x4 in no process [-1]
[ 250.619034] xe 0000:00:02.0: [drm] GT0: Timedout job: seqno=328, lrc_seqno=328, guc_id=71, flags=0x4 in no process [-1]
[ 250.659478] xe 0000:00:02.0: [drm] GT0: Timedout job: seqno=10974, lrc_seqno=10974, guc_id=13, flags=0x4 in no process [-1]
[ 250.659727] xe 0000:00:02.0: [drm] GT0: Timedout job: seqno=10975, lrc_seqno=10975, guc_id=13, flags=0x4 in no process [-1]
[ 250.659741] xe 0000:00:02.0: [drm] GT0: Timedout job: seqno=10976, lrc_seqno=10976, guc_id=13, flags=0x4 in no process [-1]
[ 250.659750] xe 0000:00:02.0: [drm] GT0: Timedout job: seqno=10977, lrc_seqno=10977, guc_id=13, flags=0x4 in no process [-1]
[ 250.659757] xe 0000:00:02.0: [drm] GT0: Timedout job: seqno=10978, lrc_seqno=10978, guc_id=13, flags=0x4 in no process [-1]
[ 250.659763] xe 0000:00:02.0: [drm] GT0: Timedout job: seqno=10979, lrc_seqno=10979, guc_id=13, flags=0x4 in no process [-1]
[ 250.659769] xe 0000:00:02.0: [drm] GT0: Timedout job: seqno=10980, lrc_seqno=10980, guc_id=13, flags=0x4 in no process [-1]
[ 315.794509] xe 0000:00:02.0: [drm] GT0: Engine reset: engine_class=rcs, logical_mask: 0x1, guc_id=53
[ 315.794726] xe 0000:00:02.0: [drm] GT0: Timedout job: seqno=4294967170, lrc_seqno=4294967170, guc_id=53, flags=0x0 in deqp-vk [4571]
[ 315.795190] xe 0000:00:02.0: [drm] Xe device coredump has been created
[ 315.795192] xe 0000:00:02.0: [drm] Check your /sys/class/drm/card0/device/devcoredump/data
[ 369.240445] xe 0000:00:02.0: [drm] GT0: Engine reset: engine_class=rcs, logical_mask: 0x1, guc_id=80
[ 369.240540] xe 0000:00:02.0: [drm] GT0: Timedout job: seqno=4294967170, lrc_seqno=4294967170, guc_id=80, flags=0x0 in deqp-vk [6487]
How to reproduce the issue:
- install VK-GL-CTS:
git clone https://github.com/KhronosGroup/VK-GL-CTS.git
cd VK-GL-CTS
python3 external/fetch_sources.py
mkdir build
cd build
cmake -G Ninja -S .. -B .
ninja deqp-vk
cd ..
cd build/external/vulkancts/modules/vulkan
- install
https://gitlab.freedesktop.org/mesa/deqp-runner
cargo install deqp-runner
- Adapt my
test-sparse
script so it points to where you downloaded each thing, then run it:
przanoni@przanoni-mobl:~$ cat bin/test-sparse
#!/bin/bash
deqp_runner="$HOME/.cargo/bin/deqp-runner"
results_dir="$HOME/deqp-runner-results"
caselist_file="$results_dir/sparse-tests.txt"
vkglcts_dir="$HOME/git/VK-GL-CTS"
test_files_dir="$vkglcts_dir/external/vulkancts/mustpass/main/vk-default"
deqp_vk="$vkglcts_dir/build/external/vulkancts/modules/vulkan/deqp-vk"
mkdir -p $results_dir
if [ ! -x "$deqp_runner" ]; then
echo "deqp-runner not found"
echo "please run \"cargo install deqp-runner\""
exit 1
fi
if [ ! -f "$caselist_file" ]; then
echo "creating caselist file"
grep -h sparse $test_files_dir/*.txt > $caselist_file
fi
time wm $deqp_runner run \
--deqp $deqp_vk \
--caselist $caselist_file \
--timeout 180 \
--output $results_dir/run-$(date +%Y-%m-%d-%H%M)