nir: Fix an out-of-ssa bug when both convergent and divergent uses of a value appear are caught in the same phi webs
In #8037 (closed), we tracked a GPU hang in Rise of the Tomb raider down to what appears to be a bug in NIR's out-of-SSA pass when handling two intertwined values with mixed convergence status (one has convergent storage, and the other has divergent storage). For example, where one is a simple loop counter and the other is a divergent value calculated based on that, but mixing in shader inputs or loads.
fossilize-replay --pipeline-hash 862bd592e1e72941 fossils/steam-native/rise_of_the_tomb_raider_g2.foz
with INTEL_DEBUG=gs demonstrates the issue. If you look at the shader dumps Illia posted in #8037 (closed), and dwdiff -c
the dumps with and without nir_divergence_analysis
, there's a clear difference: in this loop, out-of-SSA is creating an extra copy of the register (r28
) which is never used again. Additionally, the dead copy is the only one getting the incremented loop counter, so it becomes an infinite loop.
That shader is complicated. You can partly see the bug in how resolve_parallel_copies
handles this simple Piglit test:
loop {
block block_1:
/* preds: block_0 block_4 */
vec1 32 con ssa_7 = ilt32 ssa_4, r1
/* succs: block_2 block_3 */
if ssa_7 {
block block_2:
/* preds: block_1 */
break
/* succs: block_5 */
} else {
block block_3:
/* preds: block_1 */
/* succs: block_4 */
}
block block_4:
/* preds: block_3 */
con r2 = iadd r1, ssa_3
div r0 = mov r1
con r3 = mov r1
con r1 = mov r2
/* succs: block_1 */
}
Here, a new r3
copy of the convergent value is created and never used again. This is wrong. In this test, the extra value is harmless and the program works out anyway. It doesn't become an infinite loop like in the original Rise of the Tomb Raider shader—I haven't quite been able to reproduce non-termination yet in a simple test.
This MR has a proposed fix, which is to mark a value's predecessor as "ready for copying" if it needs to be filled and has a location, regardless of whether divergence matches. We can't assign it a location, but if it has one, it should be ready anyway. This seems to keep the shell game working as intended...
+@gfxstrand +@cwabbott0 +@daniel-schuermann (my sincere apologies for asking you to page this back in...)