Doing multiple vm_binds in the same ioctl makes a dEQP test fail
Hello
In Mesa, when we need to do multiple vm_bind operations, we currently issue one bind per ioctl. The following piece of code is from src/intel/vulkan/anv_sparse.c:
   /* FIXME: here we were supposed to issue a single vm_bind ioctl by calling
    * vm_bind(device, num_binds, binds), but for an unknown reason some
    * shader-related tests fail when we do that, so work around it for now.
    */
   for (int b = 0; b < num_binds; b++) {
      int rc = device->kmd_backend->vm_bind(device, 1, &binds[b]);
      if (rc) {
         ret = vk_error(device, VK_ERROR_OUT_OF_DEVICE_MEMORY);
         break;
      }
   }
With the code above it works. But if we change it so it does a single ioctl with num_binds on it, tests start failing.
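To make the two call patterns concrete, here is a minimal, self-contained sketch that contrasts them against a mock backend. Everything in it (`mock_bind`, `mock_device`, `mock_vm_bind`, and the two helper functions) is a hypothetical stand-in invented for illustration. It is not the real anv code and it does not talk to xe.ko. It only shows that both patterns hand the same binds to the backend and differ in how many times the backend is entered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT the real anv or xe types. */
struct mock_bind {
    uint64_t address;
    uint64_t size;
    uint64_t bo_offset;
};

struct mock_device {
    int ioctl_count; /* how many times the mock backend was entered */
    int binds_seen;  /* total bind operations the backend received */
};

/* Mock backend: each call stands for one vm_bind ioctl carrying num_binds. */
int mock_vm_bind(struct mock_device *dev, int num_binds,
                 const struct mock_bind *binds)
{
    (void)binds;
    dev->ioctl_count++;
    dev->binds_seen += num_binds;
    return 0;
}

/* Current workaround: one bind per ioctl, stop at the first error. */
int bind_one_per_ioctl(struct mock_device *dev, int num_binds,
                       const struct mock_bind *binds)
{
    for (int b = 0; b < num_binds; b++) {
        int rc = mock_vm_bind(dev, 1, &binds[b]);
        if (rc)
            return rc;
    }
    return 0;
}

/* Intended behavior: all binds in a single ioctl. */
int bind_single_ioctl(struct mock_device *dev, int num_binds,
                      const struct mock_bind *binds)
{
    return mock_vm_bind(dev, num_binds, binds);
}
```

With four binds, the first helper enters the mock backend four times and the second enters it once. In the real driver, the second pattern is the one that makes the tests fail.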
For example, the dEQP test dEQP-VK.sparse_resources.shader_intrinsics.3d_sparse_fetch.rgba16ui.256_256_16 fails.
Here's the log when we run it:
przanoni@deegeetoo:~/git/VK-GL-CTS/build/external/vulkancts/modules/vulkan$ INTEL_DEBUG=sparse wm ./deqp-vk -n dEQP-VK.sparse_resources.shader_intrinsics.3d_sparse_fetch.rgba16ui.256_256_16
Writing test log into TestResults.qpa
dEQP Core git-a13cbc8559935c85201c975cbc2587e7dd5ea0f1 (0xa13cbc85) starting..
target implementation = 'Default'
Test case 'dEQP-VK.sparse_resources.shader_intrinsics.3d_sparse_fetch.rgba16ui.256_256_16'..
[ bind ] bo:---- res_offset:00000000 size:00c00000 mem_offset:00000000 addr:fffffffeff3d0000
miptail first_lod:4 size:65536 offset:11796480 stride:12582912
miptail first_lod:4 size:65536 offset:11796480 stride:12582912
[sparse submission, buffers:0 opaque_images:1 images:1 waits:0 signals:1]
[ bind ] bo:0014 res_offset:00b40000 size:00010000 mem_offset:00000000 addr:fffffffefff10000
=== [../../src/intel/vulkan/anv_sparse.c:618] [anv_sparse_bind_image_memory] BEGIN
--> mip_level:0 array_layer:0
aspect:0x1 plane:0
binding offset: [0, 0, 0] extent: [256, 256, 16]
[ bind ] bo:0012 res_offset:00000000 size:00080000 mem_offset:00000000 addr:fffffffeff3d0000
=== doing single ioctl
=== [../../src/intel/vulkan/anv_sparse.c:759] [anv_sparse_bind_image_memory] END num_binds:1
=== [../../src/intel/vulkan/anv_sparse.c:618] [anv_sparse_bind_image_memory] BEGIN
--> mip_level:2 array_layer:0
aspect:0x1 plane:0
binding offset: [0, 0, 0] extent: [64, 64, 4]
[ bind ] bo:0013 res_offset:00840000 size:00020000 mem_offset:00000000 addr:fffffffeffc10000
[ bind ] bo:0013 res_offset:008c0000 size:00020000 mem_offset:00020000 addr:fffffffeffc90000
[ bind ] bo:0013 res_offset:00940000 size:00020000 mem_offset:00040000 addr:fffffffeffd10000
[ bind ] bo:0013 res_offset:009c0000 size:00020000 mem_offset:00060000 addr:fffffffeffd90000
=== doing single ioctl
=== [../../src/intel/vulkan/anv_sparse.c:759] [anv_sparse_bind_image_memory] END num_binds:4
anv_free_sparse_bindings: address:0xfffffffeff3d0000 size:0x00c00000
[unbind] bo:---- res_offset:00000000 size:00c00000 mem_offset:00000000 addr:fffffffeff3d0000
Fail (Failed)
As you can see, when the 4 binds for bo 0013 are issued in a single ioctl, the test fails.
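For reference, the layout of those four binds can be checked with a little arithmetic on the values printed in the log. The sketch below is only a reading aid. The struct and function names are made up for illustration, and the base address is taken from the first `[ bind ]` line of the log (the bo:---- bind covering the whole resource). It shows that each bind address equals the resource base plus res_offset, and that the four ranges are contiguous in the BO but separated by gaps in the virtual address space. Whether that layout is related to the failure is not established here.

```c
#include <assert.h>
#include <stdint.h>

/* Base address of the sparse resource, from the first [ bind ] log line. */
#define RESOURCE_BASE 0xfffffffeff3d0000ULL

/* One [ bind ] line from the log (hypothetical struct, for illustration). */
struct logged_bind {
    uint64_t res_offset;
    uint64_t size;
    uint64_t mem_offset;
    uint64_t addr;
};

/* The four binds for bo 0013, copied from the log above. */
const struct logged_bind bo13_binds[4] = {
    { 0x00840000, 0x00020000, 0x00000000, 0xfffffffeffc10000ULL },
    { 0x008c0000, 0x00020000, 0x00020000, 0xfffffffeffc90000ULL },
    { 0x00940000, 0x00020000, 0x00040000, 0xfffffffeffd10000ULL },
    { 0x009c0000, 0x00020000, 0x00060000, 0xfffffffeffd90000ULL },
};

/* Returns 1 if the logged address equals base + res_offset. */
int addr_matches_offset(const struct logged_bind *b)
{
    return b->addr == RESOURCE_BASE + b->res_offset;
}

/* Gap in virtual address space between the end of a and the start of b. */
uint64_t va_gap(const struct logged_bind *a, const struct logged_bind *b)
{
    return b->addr - (a->addr + a->size);
}

/* Gap in BO memory between the end of a and the start of b. */
uint64_t mem_gap(const struct logged_bind *a, const struct logged_bind *b)
{
    return b->mem_offset - (a->mem_offset + a->size);
}
```

For every consecutive pair, `va_gap` is 0x60000 and `mem_gap` is 0, so a single ioctl here carries four non-adjacent virtual ranges backed by one contiguous 0x80000 region of the BO.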
I gathered the dmesg logs (fail1.txt and pass1.txt) and preprocessed them with:
#!/bin/bash
cat fail1.txt | cut -d']' -f2- > fail2.txt
cat fail2.txt | sed "s/pid=2164/pid=XXXX/" > fail3.txt
cat fail3.txt | sed "s/engine 00000000280e837c/engine XXXX/" > fail4.txt
cat fail4.txt | sed "s/engine 0000000099c02972/engine YYYY/" > fail5.txt
cat pass1.txt | cut -d']' -f2- > pass2.txt
cat pass2.txt | sed "s/pid=2258/pid=XXXX/" > pass3.txt
cat pass3.txt | sed "s/engine 000000003895979e/engine XXXX/" > pass4.txt
cat pass4.txt | sed "s/engine 0000000074512d06/engine YYYY/" > pass5.txt
diff -Nrup fail5.txt pass5.txt > diff.patch
This produced a readable diff.patch, which should highlight how xe.ko handles the two paths differently.
See the files attached:
I'd be happy to try debug kernel patches and report the results.
I ran all this on DG2/Alchemist, but I can also reproduce the issue on TGL.
Thanks, Paulo