anv: hw descriptor state mismatch in supertuxkart
Strap on your crash helmet, this one's painful.
I've found some kind of weird corner case where some of the hardware states related to descriptors (`3DSTATE_CONSTANT_VS`, `flush_descriptor_sets()`) get out of sync with the command buffer state or something. The method for reproducing is simple:
- clone the debug branch I've made for this ticket (https://gitlab.freedesktop.org/zmike/mesa/-/commits/watkart) and build zink+anv like usual
- run `MESA_LOADER_DRIVER_OVERRIDE=zink supertuxkart --track="ravenbridge_mansion" -R`
- wait for craziness
What seems to be happening is that zink's caching of descriptor sets, combined with its weird outboard compute cmdbuf, is screwing up the hardware states. The debug branch above is hacked to use up to 2 descriptor sets: one for samplers (only in GFX cmdbufs) and one for everything else (compute will always use one set in this branch). Compared to `HEAD~2`, there are noticeable rendering regressions that aren't reproducible on RADV or lavapipe.
This is a tough problem, so I've added a bunch of debugging facilities to the branch to aid with testing, all in the form of environment variables which can be toggled to change runtime behavior:
- `ZINK_ONE_SET` forces zink to go back to using a single descriptor set at all times while using all the same codepaths; this restores previous (good) behavior, though it also doesn't actually do much reusing of descriptor sets
- `ZINK_ALWAYS_UPDATE` effectively disables caching, forcing `vkUpdateDescriptorSets` to be called even if no descriptors have changed; this has no effect other than to prove that updating doesn't resolve the issue
- `ZINK_NO_COMPUTE` disables compute extension support, forcing supertuxkart to use a different renderer; the good behavior is restored in this case, which proves that (a) the caching/reuse is not an issue and (b) this is likely somehow triggered by the compute batch's existence
Furthermore, the branch will print to stdout all the binding values for sampler descriptors along with the descriptor set. Using `diff` on the outputs will reveal that they are identical save for the descriptor set. Similarly, all shader output is identical with and without `ZINK_ONE_SET`.
Red herrings:
- don't bother checking validation errors; there's a bunch of them, but none are new in the commit triggering the issue (`HEAD~1`) or related to the issue
- barriers seem good, and I've tried jamming in tons of manual ones to verify just for hahas
- fencing is also fine, as this branch forces an explicit fence for every single scanout frame
Solutions I've found (not actual solutions, but ones which mitigate/resolve the issue):
- forcing `cmd_buffer->state.descriptors_dirty = VK_SHADER_STAGE_VERTEX_BIT | VK_SHADER_STAGE_FRAGMENT_BIT` on no-op graphics pipeline update (i.e., `old_pipeline == new_pipeline`)
- forcing `cmd_buffer->state.descriptors_dirty = VK_SHADER_STAGE_VERTEX_BIT | VK_SHADER_STAGE_FRAGMENT_BIT` on descriptor set binding
- forcing `VK_SHADER_STAGE_VERTEX_BIT` in this block from `genX_cmd_buffer.c` mitigates the issue somewhat but doesn't fully resolve it:
```c
/* We emit the binding tables and sampler tables first, then emit push
 * constants and then finally emit binding table and sampler table
 * pointers. It has to happen in this order, since emitting the binding
 * tables may change the push constants (in case of storage images). After
 * emitting push constants, on SKL+ we have to emit the corresponding
 * 3DSTATE_BINDING_TABLE_POINTER_* for the push constants to take effect.
 */
uint32_t dirty = 0;
if (descriptors_dirty) {
   dirty = flush_descriptor_sets(cmd_buffer,
                                 &cmd_buffer->state.gfx.base,
                                 descriptors_dirty,
                                 pipeline->shaders,
                                 ARRAY_SIZE(pipeline->shaders));
   cmd_buffer->state.descriptors_dirty &= ~dirty;
   /* wat */
   dirty |= VK_SHADER_STAGE_VERTEX_BIT;
}
```
I think that's everything I know at this point. I'm on GEN11 in case that matters.