v3d: use nir_opt_sink
This seems to help a lot with reducing register pressure, which leads to less spilling. As per the shader-db stats below, we gain 10 shaders in shader-db, and in the UE4 shooter demo the large compute shader used for histogram computations goes from 48/50 spills/fills down to 8/15.
total instructions in shared programs: 14072341 -> 14062334 (-0.07%)
instructions in affected programs: 1996685 -> 1986678 (-0.50%)
helped: 3038
HURT: 2432
Instructions are helped.
total uniforms in shared programs: 3797720 -> 3794523 (-0.08%)
uniforms in affected programs: 191711 -> 188514 (-1.67%)
helped: 831
HURT: 449
Uniforms are helped.
total max-temps in shared programs: 2340632 -> 2335124 (-0.24%)
max-temps in affected programs: 113632 -> 108124 (-4.85%)
helped: 2728
HURT: 436
Max-temps are helped.
total spills in shared programs: 6050 -> 5931 (-1.97%)
spills in affected programs: 2869 -> 2750 (-4.15%)
helped: 14
HURT: 4
total fills in shared programs: 13970 -> 13371 (-4.29%)
fills in affected programs: 8831 -> 8232 (-6.78%)
helped: 14
HURT: 4
total inst-and-stalls in shared programs: 14103668 -> 14093712 (-0.07%)
inst-and-stalls in affected programs: 2004035 -> 1994079 (-0.50%)
helped: 3009
HURT: 2426
Inst-and-stalls are helped.
LOST: 0
GAINED: 10
I thought we would get even better results by doing a nir_opt_move afterwards, like we do for UBO loads, but that actually hurts. This may need more investigation: the VIR produced seems to be better, but the QPU scheduler then ends up producing more instructions for some reason.
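
For intuition on why sinking reduces register pressure, here is a toy model. It is not Mesa's nir_opt_sink, and every name in it (toy_instr, max_live, sink_leaves, toy_peak_before_after) is hypothetical. It assumes a straight-line program in which only source-less instructions (think constants or loads) are moved down to just before their first use, and it measures the peak number of simultaneously live values before and after.

```c
#include <assert.h>

#define MAX_SRCS 2

/* Toy SSA instruction: defines one value (named by id) and reads up to
 * two earlier values, named by the id of their defining instruction.
 * A source of -1 means "unused". Hypothetical type, not a NIR struct. */
struct toy_instr {
    int id;
    int srcs[MAX_SRCS];
};

/* Peak number of values that are live across any program point. */
static int max_live(const struct toy_instr *prog, int n)
{
    int peak = 0;
    for (int p = 0; p < n; p++) {
        int live = 0;
        /* A value is live after p if defined at or before p and read later. */
        for (int d = 0; d <= p; d++) {
            int used_later = 0;
            for (int u = p + 1; u < n && !used_later; u++)
                for (int s = 0; s < MAX_SRCS; s++)
                    if (prog[u].srcs[s] == prog[d].id)
                        used_later = 1;
            live += used_later;
        }
        if (live > peak)
            peak = live;
    }
    return peak;
}

/* Move every source-less instruction to just before its first user.
 * Walking bottom-up means instructions shifted into slot i were
 * already visited, so nothing is processed twice. */
static void sink_leaves(struct toy_instr *prog, int n)
{
    for (int i = n - 1; i >= 0; i--) {
        if (prog[i].srcs[0] != -1 || prog[i].srcs[1] != -1)
            continue;
        int first_use = -1;
        for (int u = i + 1; u < n && first_use < 0; u++)
            for (int s = 0; s < MAX_SRCS; s++)
                if (prog[u].srcs[s] == prog[i].id)
                    first_use = u;
        if (first_use < 0)
            continue; /* never read: leave it where it is */
        assert(first_use > i);
        struct toy_instr moved = prog[i];
        for (int k = i; k < first_use - 1; k++)
            prog[k] = prog[k + 1];
        prog[first_use - 1] = moved;
    }
}

/* Example: four constants defined up front, then a chain of adds. */
static void toy_peak_before_after(int *before, int *after)
{
    struct toy_instr prog[] = {
        {0, {-1, -1}}, {1, {-1, -1}}, {2, {-1, -1}}, {3, {-1, -1}},
        {4, {0, 1}},   /* e = a + b */
        {5, {4, 2}},   /* f = e + c */
        {6, {5, 3}},   /* g = f + d */
    };
    int n = (int)(sizeof(prog) / sizeof(prog[0]));
    *before = max_live(prog, n);
    sink_leaves(prog, n);
    *after = max_live(prog, n);
}
```

Calling toy_peak_before_after() on this example reports a lower peak after sinking, since each constant is defined right before the add that consumes it instead of being held live from the top of the program.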