zink: ctx.resident_defs is read without being set first
After enabling MSAA on Sparse for Anv (MR !27306 (merged)) I'm getting some new Zink failures. Both KHR-GL46.sparse_texture2_tests.SparseTexture2Lookup and KHR-GL46.sparse_texture_clamp_tests.SparseTextureClampLookupResidency are giving me the same problem: they crash, but the crash message varies depending on what's read from uninitialized memory.
Sometimes I get:
Test case 'KHR-GL46.sparse_texture2_tests.SparseTexture2Lookup'..
glcts: ../../src/gallium/drivers/zink/nir_to_spirv/nir_to_spirv.c:3145: emit_is_sparse_texels_resident: Assertion `ctx->resident_defs[index] != 0' failed.
But sometimes I get:
Test case 'KHR-GL46.sparse_texture2_tests.SparseTexture2Lookup'..
MESA: error: ../../src/vulkan/runtime/vk_nir.c:60: SPIR-V offset 2348: SPIR-V parsing FAILED:
In file ../../src/compiler/spirv/vtn_private.h:730
SPIR-V id 32702 is out-of-bounds
2348 bytes into the SPIR-V binary
SPIR-V parsing FAILED:
In file ../../src/compiler/spirv/vtn_private.h:730
SPIR-V id 32702 is out-of-bounds
2348 bytes into the SPIR-V binary
Valgrind gives me:
==48424== Thread 14 glcts:zcfq1:
==48424== Conditional jump or move depends on uninitialised value(s)
==48424== at 0x6F83E15: emit_is_sparse_texels_resident (nir_to_spirv.c:3145)
==48424== by 0x6F84A72: emit_intrinsic (nir_to_spirv.c:3475)
==48424== by 0x6F871F7: emit_block (nir_to_spirv.c:4080)
==48424== by 0x6F877BB: emit_cf_list (nir_to_spirv.c:4194)
==48424== by 0x6F8753E: emit_if (nir_to_spirv.c:4143)
==48424== by 0x6F877DB: emit_cf_list (nir_to_spirv.c:4198)
==48424== by 0x6F89FA9: nir_to_spirv (nir_to_spirv.c:4811)
==48424== by 0x6E16C1A: compile_module (zink_compiler.c:3740)
==48424== by 0x6E19F47: zink_shader_compile (zink_compiler.c:3943)
==48424== by 0x6F64134: precompile_compute_job (zink_program.c:1342)
==48424== by 0x62D22C0: util_queue_thread_func (u_queue.c:309)
==48424== by 0x62F458F: impl_thrd_routine (threads_posix.c:67)
==48424== Uninitialised value was created by a heap allocation
==48424== at 0x4B12808: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==48424== by 0x62CC9ED: ralloc_size (ralloc.c:118)
==48424== by 0x62CCD97: ralloc_array_size (ralloc.c:223)
==48424== by 0x6F89CE0: nir_to_spirv (nir_to_spirv.c:4777)
==48424== by 0x6E16C1A: compile_module (zink_compiler.c:3740)
==48424== by 0x6E19F47: zink_shader_compile (zink_compiler.c:3943)
==48424== by 0x6F64134: precompile_compute_job (zink_program.c:1342)
==48424== by 0x62D22C0: util_queue_thread_func (u_queue.c:309)
==48424== by 0x62F458F: impl_thrd_routine (threads_posix.c:67)
==48424== by 0x51753EB: start_thread (pthread_create.c:444)
==48424== by 0x51F596F: clone (clone.S:100)
Upon analyzing the code, I see that we create the ctx.resident_defs array using ralloc_array_size (instead of rzalloc_array_size) and then we try to read the array in emit_is_sparse_texels_resident(). We never seem to go through extract_sparse_load(), which seems to be the only function that actually writes something to this array. Changing the operation to rzalloc_array_size makes our error message consistent.
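To illustrate why a zero-initializing allocation makes the failure deterministic, here is a minimal standalone sketch. It is not zink code: calloc() stands in for rzalloc_array_size() so the sketch builds without Mesa's ralloc, and the names alloc_defs_zeroed and lookup_def are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for a zero-initializing allocator such as rzalloc_array_size().
 * calloc() is used only so this sketch builds without Mesa's ralloc. */
uint32_t *alloc_defs_zeroed(size_t count)
{
    return calloc(count, sizeof(uint32_t));
}

/* Hypothetical reader: returns the stored id for a def index.
 * With a zeroed table, a slot that was never written always reads as 0,
 * so a check like "defs[index] != 0" fails the same way on every run. */
uint32_t lookup_def(const uint32_t *defs, size_t count, size_t index)
{
    assert(index < count);
    return defs[index];
}
```

With a malloc-style allocation instead, the never-written slot would hold whatever bytes happened to be on the heap, which would match the varying symptoms above (sometimes the assertion, sometimes an out-of-bounds SPIR-V id).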
I can see that emit_tex() is called for the index that crashes later (by printing tex->def.index, it's always 22 for me), but tex->is_sparse is false so we don't end up calling extract_sparse_load().
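The mismatch described above can be modeled in a few lines. This is a simplified sketch under my own assumptions, not the actual zink implementation: struct model_ctx, model_emit_tex and model_has_resident_def are hypothetical names, and the table is a fixed-size array rather than a ralloc allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_DEFS 64

/* Simplified model, not zink code: one table slot per def index. */
struct model_ctx {
    uint32_t resident_defs[MAX_DEFS];
};

/* Models the texture path: the slot is written only when is_sparse is set. */
void model_emit_tex(struct model_ctx *ctx, unsigned index, bool is_sparse,
                    uint32_t spirv_id)
{
    assert(index < MAX_DEFS);
    if (is_sparse)
        ctx->resident_defs[index] = spirv_id;
}

/* Models the residency query: it only succeeds if the slot was written. */
bool model_has_resident_def(const struct model_ctx *ctx, unsigned index)
{
    return index < MAX_DEFS && ctx->resident_defs[index] != 0;
}
```

In this model, a texture instruction processed with is_sparse == false leaves its slot empty, so a later residency query on the same index finds nothing, which is the situation I'm seeing for index 22.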
Simply removing the if (tex->is_sparse) line so that we unconditionally call extract_sparse_load() doesn't solve the problem: it leads to a message that says Type mismatch for SPIR-V value %68.
If there's anything else I could try/print/investigate, please feel free to tell me.
Thanks,
Paulo