broadcom/compiler: choose compile strategy with lowest spilling
Until now, we only allowed spilling as a last resort, in the last 2 strategies. However, in some cases earlier strategies may produce fewer spills if we allow them to spill.
Likewise, the fallback scheduler can sometimes produce fewer spills than the 2-thread strategy with optimizations disabled.
With this change, we allow all of our 2-thread strategies to spill. Instead of choosing the first strategy that succeeds, we choose one that does not spill or, failing that, the one with the least amount of spilling.
Note that this may significantly increase compile times. This is addressed by the last patch in this series, which avoids rebuilding the interference graph after each spill.
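The selection rule can be sketched in C as follows. This is a minimal illustration, not the actual v3d code: struct strategy, struct compile_result, choose_strategy() and the table-driven fake_compile() stand-in are hypothetical names. The sketch assumes that strategies are tried in order, that each one either fails or reports a spill count, and that a strategy not allowed to spill fails instead of spilling.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one compile strategy. */
struct strategy {
    const char *name;
    int threads;        /* e.g. 4 or 2 */
    bool allow_spills;  /* true for the 2-thread strategies */
};

/* Hypothetical result of compiling with one strategy. */
struct compile_result {
    bool ok;     /* false if register allocation failed */
    int spills;  /* number of spills when ok */
};

typedef struct compile_result (*compile_fn)(const struct strategy *s, void *data);

/* Returns the index of the chosen strategy, or -1 if none succeeded. */
static int
choose_strategy(const struct strategy *strategies, size_t count,
                compile_fn compile, void *data)
{
    assert(strategies != NULL || count == 0);

    int best = -1;
    int best_spills = INT_MAX;

    for (size_t i = 0; i < count; i++) {
        struct compile_result r = compile(&strategies[i], data);
        if (!r.ok)
            continue;

        /* A spill-free result cannot be beaten, so stop here. */
        if (r.spills == 0)
            return (int)i;

        /* Otherwise keep the strategy with the fewest spills so far;
         * on a tie the earlier strategy wins. */
        if (r.spills < best_spills) {
            best = (int)i;
            best_spills = r.spills;
        }
    }
    return best;
}

/* Example-only stand-in for a real compile: the outcome of each strategy is
 * looked up in a table, where a negative value means "RA failed". */
struct fake_outcomes {
    const struct strategy *base;  /* array the strategies come from */
    const int *spills;            /* one entry per strategy */
};

static struct compile_result
fake_compile(const struct strategy *s, void *data)
{
    const struct fake_outcomes *o = data;
    assert(o != NULL && s >= o->base);

    int spills = o->spills[s - o->base];
    /* A strategy that may not spill fails instead of spilling. */
    if (spills < 0 || (spills > 0 && !s->allow_spills))
        return (struct compile_result){ .ok = false, .spills = 0 };
    return (struct compile_result){ .ok = true, .spills = spills };
}
```

In this sketch, a shader that compiles without spilling still stops at the first successful strategy; only shaders that spill pay for compiling the remaining strategies, which is where the extra compile time mentioned below comes from.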
Some shader-db numbers:
total instructions in shared programs: 13355795 -> 13320768 (-0.26%)
instructions in affected programs: 2395387 -> 2360360 (-1.46%)
helped: 3149
HURT: 4219
total threads in shared programs: 413882 -> 413826 (-0.01%)
threads in affected programs: 118 -> 62 (-47.46%)
helped: 1
HURT: 29
total uniforms in shared programs: 3756072 -> 3705509 (-1.35%)
uniforms in affected programs: 339272 -> 288709 (-14.90%)
helped: 2417
HURT: 114
total max-temps in shared programs: 2352861 -> 2365565 (0.54%)
max-temps in affected programs: 99326 -> 112030 (12.79%)
helped: 175
HURT: 2315
total spills in shared programs: 12310 -> 4241 (-65.55%)
spills in affected programs: 11677 -> 3608 (-69.10%)
helped: 196
HURT: 26
total fills in shared programs: 23625 -> 6115 (-74.12%)
fills in affected programs: 22701 -> 5191 (-77.13%)
helped: 209
HURT: 13
total sfu-stalls in shared programs: 32422 -> 34381 (6.04%)
sfu-stalls in affected programs: 5272 -> 7231 (37.16%)
helped: 961
HURT: 1076
total inst-and-stalls in shared programs: 13388217 -> 13355149 (-0.25%)
inst-and-stalls in affected programs: 2421564 -> 2388496 (-1.37%)
helped: 3162
HURT: 4256
Total CPU time (seconds): 9904.67 -> 9087.98 (-8.25%)
While we get better overall instruction counts, more shaders are hurt than helped. This is mostly because we had a hack to batch-spill uniforms (which we added to reduce compile times at the expense of possibly worse spilling) that we no longer have, since spilling is now fast. It seems that in some cases, even though it caused worse spilling, this hack led to better QPU code! I think this is because more uniform spills mean shorter lifespans for the uniforms, which favors assigning them to accumulators during RA, which in turn favors QPU instruction merges and thus smaller QPU programs; it is a bit of a chain accident... We might want to use this information to revisit how we schedule our uniforms and constants, to see if we can reduce their lifespans and gain this back. I think this is also why a few shaders now drop to 2 threads (longer uniform lifespans increase register pressure, as shown by the max-temps stat).
There is a massive reduction in spills, thanks to the fact that we now choose the best spilling strategy, and, despite compiling more strategies, also a significant reduction in overall CPU time, thanks to faster spilling.