WIP: mesa/st: Turn on OptimizeForAOS on non-scalar NIR backends.
Based on !14200 (merged), alternative to !14247 (closed). I think we should land this instead.
On i965 vec4 hardware (most of crocus), this lets the VS matrix multiplies
happen in parallel as independent DP4s to each dest channel, rather than a
serialized set of MADs with approximately the same instruction count.
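
To illustrate the difference, here is a minimal C sketch of the two ways a
4x4 matrix times vec4 multiply can be lowered. It is not Mesa code: the
function names (mul_mad, mul_dp4, dp4, transpose4) and the array layout are
hypothetical, chosen only to show the dependency structure. The MAD form
reuses its accumulator on every step, while each DP4 writes one destination
channel and depends on none of the others.

```c
/* Illustrative sketch only; names and layout are assumptions, not Mesa API. */

/* MAD form: the matrix is held as four columns. Each vector step reads the
 * result of the previous one, so the MUL + 3 MADs form a serial chain. */
static void mul_mad(float col[4][4], const float v[4], float out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = col[0][i] * v[0];               /* MUL */
    for (int c = 1; c < 4; c++)
        for (int i = 0; i < 4; i++)
            out[i] = col[c][i] * v[c] + out[i];  /* MAD, needs previous out */
}

/* One DP4: a four-component dot product producing a single scalar. */
static float dp4(const float a[4], const float b[4])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

/* DP4 form: the matrix is held as four rows (the transpose of the column
 * layout). Each destination channel is its own DP4 with no dependency on
 * the other three, so the four can be issued back to back. */
static void mul_dp4(float row[4][4], const float v[4], float out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = dp4(row[i], v);
}

/* Helper to turn the column layout into the row layout. */
static void transpose4(float in[4][4], float out[4][4])
{
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++)
            out[r][c] = in[c][r];
}
```

Both forms take four vector instructions per multiply, which matches the
"approximately the same instruction count" above; the difference is only in
how much of that work has to wait on earlier results.
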
This should fix a perf regression from the crocus transition. The original
commit reported: "Improves performance in Lightsmark by 1.01131% +/-
0.162069% (n = 10) on a Haswell GT2 system."
i915g:
total instructions in shared programs: 396828 -> 396831 (<.01%)
instructions in affected programs: 159 -> 162 (1.89%)
r300:
total instructions in shared programs: 1226783 -> 1228308 (0.12%)
instructions in affected programs: 61920 -> 63445 (2.46%)
total temps in shared programs: 195902 -> 195850 (-0.03%)
temps in affected programs: 2393 -> 2341 (-2.17%)
hsw:
total instructions in shared programs: 8163635 -> 8154150 (-0.12%)
instructions in affected programs: 174076 -> 164591 (-5.45%)
Edited by Emma Anholt