mesa/st: Turn on OptimizeForAOS on non-scalar NIR backends.

On i965 vec4 hardware (most of crocus), this lets the VS matrix multiplies
happen in parallel as independent DP4s to each dest channel, rather than a
serialized set of MADs with approximately the same instruction count.
This should fix a perf regression from the crocus transition. From the
original commit: "Improves performance in Lightsmark by 1.01131% +/-
0.162069% (n = 10) on a Haswell GT2 system."
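
For illustration, the two shapes of a mat4 * vec4 multiply described above
can be sketched in plain C. This is not Mesa code: the function names and
the flat row-major matrix layout are assumptions made for the example, and
each loop body stands in for one vec4 instruction.

```c
/* Hypothetical sketch, not Mesa API. m is a 4x4 matrix stored row-major
 * in a flat array: element (row r, column c) is m[4 * r + c]. */

/* MAD form: one MUL followed by three MADs. Each MAD reads the result
 * of the previous instruction, so the four instructions form a serial
 * dependency chain. */
static void mat4_mul_vec4_mad(const float m[16], const float v[4],
                              float out[4])
{
   for (int r = 0; r < 4; r++)
      out[r] = m[4 * r + 0] * v[0];               /* MUL */

   for (int c = 1; c < 4; c++) {
      for (int r = 0; r < 4; r++)
         out[r] = m[4 * r + c] * v[c] + out[r];   /* MAD on prior result */
   }
}

/* 4-component dot product, standing in for a single DP4 instruction. */
static float dp4(const float a[4], const float b[4])
{
   return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

/* DP4 form: four DP4s, one per destination channel. No DP4 reads the
 * result of another, so they can be issued independently. */
static void mat4_mul_vec4_dp4(const float m[16], const float v[4],
                              float out[4])
{
   for (int r = 0; r < 4; r++)
      out[r] = dp4(&m[4 * r], v);
}
```

Both forms take four vec4 instructions; the difference is only whether
each instruction has to wait on the previous one.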

i915g:
total instructions in shared programs: 396828 -> 396831 (<.01%)
instructions in affected programs: 159 -> 162 (1.89%)

r300:
total instructions in shared programs: 1226783 -> 1228308 (0.12%)
instructions in affected programs: 61920 -> 63445 (2.46%)
total temps in shared programs: 195902 -> 195850 (-0.03%)
temps in affected programs: 2393 -> 2341 (-2.17%)

hsw:
total instructions in shared programs: 8163635 -> 8154150 (-0.12%)
instructions in affected programs: 174076 -> 164591 (-5.45%)

Reviewed-by: Dave Airlie <airlied@redhat.com>