Skip to content

RFC: v3dv: fixing issues with const defined inside a loop, used outside later

So we have been trying to fix this CTS test: dEQP-VK.graphicsfuzz.cov-loop-condition-clamp-vec-of-ones

It is failing because on v3dv we don't support constant ubo indices. But that test is in fact using constant ubo indices. The problem is that after one of the executions of loop unroll those const become non-const. Specifically nir_to_lcssa moves them out as phis, and the const become somewhat lost on the pile of phis.

Initially we thought that the problem was that nir_opt_cse was moving the definition of the const inside one loop, when the definition is used later out of the loop. We created issue #6051 (closed), but quoting @cwabbott0 from that issue:

I don't think CSE should be working around quirks like that. In most other places we assume that load_const instructions can be placed freely anywhere with no cost and it's up to the backend to sort it out, because the backend is probably doing other optimizations/lowerings with immediates anyway.

So @tarceri suggested then to try to use the nir_to_lcssa pass used at nir_opt_loop_unroll but skipping invariants. I have a wip/hacky implementation in this branch:

https://gitlab.freedesktop.org/apinheiro/mesa/-/tree/wip/apinheiro/nir-csaa-loop-unrolling

In addition to be still dirty, and still have some regressions, just trying the lowering skipping invariants is not enough. As the loop is cloned, the invariants that are skipped got outdated, so we need to update their definition. For that my implementation uses the remap_table used while cloning the loop.

But then I realized that this is somewhat pointless, because as far as I understand, one of the purposes of nir_to_lcsaa is avoiding this kind of remapping post-cleanup.

I also realized that for anv, the final nir shader is still getting the constant ubo index (I initially assumed that this worked with anv as it supported non-constant ubo indices). So I started to check if the problem is that we were missing any lowering.

First, if we remove the calls to nir_opt_if or nir_opt_remove_phis or nir_opt_trivial_continues on the vulkan frontend, we got it working (we don't need to remove all of them, just one of them).

I thought to find the solution when we added a call to nir_opt_gcm on the backend. That is the first commit in this series.

But then I decided to do some cleanup on how we call the lowerings (as I found several stuff to improve while working on this), and found that the test regress if we add some other optimizations on the frontend. Second commit on this series is a "safe cleanup", without getting the test regress.

The last commit shows the regression I'm mentioning. That one just adds a call to loop unroll on the Vulkan frontend. This is done by several other Vulkan drivers. Right now on v3d/v3dv we do that on the backend as we had some compile strategies that could override it. Even if we don't plan to change that, I added it as a example.

So, we have a kind of solution, but seems like really fragile, as any change of how we call the optimizations could make it fail again. At this point I'm not sure if the problem is that we should follow a specific order on how those optimizations are called, or even if we should call some of the optimizations we are calling before/after loop unroll, or if we should add a new lowering.

Any feedback is welcome.

FWIW, this is the shaderdb stats (we can get them when compiling shaders on Vulkan too) of all the options that got that specific test fixed:

  • skip invariants on nir_to_lcssa:

SHADER-DB-fb82ee5229f061ee141c055c9ddcb1efecfdf75b - MESA_SHADER_FRAGMENT shader: 98 inst, 4 threads, 1 loops, 16 uniforms, 8 max-temps, 0:0 spills:fills, 1 sfu-stalls, 99 inst-and-stalls, 34 nops

  • Remove nir_opt_remove_phis call on the frontend:

SHADER-DB-fb82ee5229f061ee141c055c9ddcb1efecfdf75b - MESA_SHADER_FRAGMENT shader: 99 inst, 4 threads, 2 loops, 17 uniforms, 9 max-temps, 0:0 spills:fills, 1 sfu-stalls, 100 inst-and-stalls, 31 nops

  • Remove nir_opt_if call from the frontend:

SHADER-DB-fb82ee5229f061ee141c055c9ddcb1efecfdf75b - MESA_SHADER_FRAGMENT shader: 107 inst, 4 threads, 2 loops, 19 uniforms, 9 max-temps, 0:0 spills:fills, 1 sfu-stalls, 108 inst-and-stalls, 37 nops

  • Remove nir_opt_trivial_continues call from the frontend:

SHADER-DB-fb82ee5229f061ee141c055c9ddcb1efecfdf75b - MESA_SHADER_FRAGMENT shader: 109 inst, 4 threads, 2 loops, 19 uniforms, 9 max-temps, 0:0 spills:fills, 1 sfu-stalls, 110 inst-and-stalls, 34 nops

  • Add one call to nir_opt_gcm on the backend:

SHADER-DB-fb82ee5229f061ee141c055c9ddcb1efecfdf75b - MESA_SHADER_FRAGMENT shader: 74 inst, 4 threads, 1 loops, 12 uniforms, 9 max-temps, 0:0 spills:fills, 0 sfu-stalls, 74 inst-and-stalls, 17 nops

Edited by Alejandro Piñeiro

Merge request reports