Mesa CI: NAVI10 hangs when running VKCTS on Linux 6.1
When I tried to upgrade from Linux 5.17 to Linux 6.1, I consistently hit the following error in vkcts-navi10-valve:
Pass: 420020, Warn: 6, Skip: 703974, Duration: 13:26, Remaining: 5:10
Pass: 420762, Warn: 6, Skip: 705232, Duration: 13:28, Remaining: 5:09
Pass: 422106, Warn: 6, Skip: 707388, Duration: 13:31, Remaining: 5:06
Pass: 423433, Warn: 6, Skip: 709561, Duration: 13:33, Remaining: 5:04
[ 836.419044] __vm_enough_memory: pid: 13324, comm: deqp-vk, no enough memory for the allocation
[ 836.423830] __vm_enough_memory: pid: 13324, comm: deqp-vk, no enough memory for the allocation
[ 836.428547] __vm_enough_memory: pid: 13324, comm: deqp-vk, no enough memory for the allocation
[ 836.433501] __vm_enough_memory: pid: 13324, comm: deqp-vk, no enough memory for the allocation
[ 836.441080] __vm_enough_memory: pid: 13324, comm: deqp-vk, no enough memory for the allocation
[ 836.446027] __vm_enough_memory: pid: 13324, comm: deqp-vk, no enough memory for the allocation
[ 836.450874] __vm_enough_memory: pid: 13324, comm: deqp-vk, no enough memory for the allocation
[ 836.455586] __vm_enough_memory: pid: 13324, comm: deqp-vk, no enough memory for the allocation
[ 836.462915] __vm_enough_memory: pid: 13324, comm: deqp-vk, no enough memory for the allocation
[ 836.467691] __vm_enough_memory: pid: 13324, comm: deqp-vk, no enough memory for the allocation
[ 846.438942] [drm:amdgpu_job_timedout] *ERROR* ring gfx_0.0.0 timeout, signaled seq=4437597, emitted seq=4437599
+1811.840s: Matched the following patterns: session_reboot
[ 846.444141] [drm:amdgpu_job_timedout] *ERROR* Process information: process deqp-vk pid 10723 thread deqp-vk pid 10723
Source: https://gitlab.freedesktop.org/mesa/mesa/-/jobs/33233495#L1785
Looking at the monitoring (1 Hz poll rate), RAM usage is ~50%, while VRAM and GTT usage are also well under 50%. I tried enabling overcommit, but either it failed to apply or it did not help.
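To tell "failed to apply" apart from "did not help", the overcommit mode and the kernel's commit accounting can be read back on the test machine. The following is only a minimal sketch, not something that is part of the CI setup: the helper names `parse_meminfo` and `commit_headroom_kb` are made up for illustration, and it assumes the standard `/proc/sys/vm/overcommit_memory` and `/proc/meminfo` files are readable on the DUT.

```python
from pathlib import Path

# Meaning of /proc/sys/vm/overcommit_memory values, per the kernel docs.
OVERCOMMIT_MODES = {0: "heuristic", 1: "always", 2: "never (strict)"}


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most values are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def commit_headroom_kb(info):
    """Return CommitLimit - Committed_AS in kB.

    A negative value means more memory is committed than the limit.
    The limit is only enforced in strict mode (overcommit_memory=2).
    """
    return info["CommitLimit"] - info["Committed_AS"]


if __name__ == "__main__":
    mode_path = Path("/proc/sys/vm/overcommit_memory")
    meminfo_path = Path("/proc/meminfo")
    if mode_path.exists() and meminfo_path.exists():
        mode = int(mode_path.read_text().strip())
        info = parse_meminfo(meminfo_path.read_text())
        print(f"overcommit_memory = {mode} ({OVERCOMMIT_MODES.get(mode, 'unknown')})")
        print(f"CommitLimit       = {info['CommitLimit']} kB")
        print(f"Committed_AS      = {info['Committed_AS']} kB")
        print(f"headroom          = {commit_headroom_kb(info)} kB")
    else:
        print("procfs overcommit files not found; is this a Linux machine?")
```

Running this right before and during the deqp-vk run would show whether the mode actually changed, and whether Committed_AS grows far beyond what the 1 Hz RAM usage graph suggests.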
I'll need to investigate why this is happening and get it fixed, as we definitely want to be able to run on the latest Linux kernel!