pan/mdg: Use replicated dot products
The hardware dot products replicate the result to all components. Taking advantage of that behaviour lets us save piles of moves at the cost of increased register pressure. I'm not sure if this is a good tradeoff... but that's not my call to make anymore.
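As a rough illustration of the tradeoff, here is a toy C model, not the Midgard backend code. The vec4 type, the function names, and the move counting are all hypothetical and exist only for this sketch. It assumes a scalar-style dot product writes a single component, so splatting the result across a vec4 takes explicit moves, while a replicated dot product fills every component at once and ties up the whole register.

```c
#include <assert.h>

/* Hypothetical 4-component register, standing in for a vector register. */
typedef struct { float c[4]; } vec4;

/* Plain dot product over the first n components (n = 2, 3 or 4). */
static float dot_n(const vec4 *a, const vec4 *b, int n)
{
    assert(n >= 2 && n <= 4);
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += a->c[i] * b->c[i];
    return sum;
}

/* Scalar model (assumption): only component 0 of dst is written. */
static void dot_scalar(vec4 *dst, const vec4 *a, const vec4 *b, int n)
{
    dst->c[0] = dot_n(a, b, n);
}

/* Replicated model: the same result is written to all four components. */
static void dot_replicated(vec4 *dst, const vec4 *a, const vec4 *b, int n)
{
    float d = dot_n(a, b, n);
    for (int i = 0; i < 4; i++)
        dst->c[i] = d;
}

/* Build vec4(dot(a, b)) from the scalar model.
 * Returns the number of explicit moves this toy model needed. */
static int splat_via_moves(vec4 *dst, const vec4 *a, const vec4 *b, int n)
{
    vec4 tmp = {{0.0f, 0.0f, 0.0f, 0.0f}};
    int moves = 0;

    dot_scalar(&tmp, a, b, n);
    for (int i = 0; i < 4; i++) {
        dst->c[i] = tmp.c[0];   /* one move per destination component */
        moves++;
    }
    return moves;
}

/* Build vec4(dot(a, b)) from the replicated model: no moves needed,
 * but the result occupies every component of dst. */
static int splat_via_replication(vec4 *dst, const vec4 *a, const vec4 *b, int n)
{
    dot_replicated(dst, a, b, n);
    return 0;
}
```

Both helpers produce the same vec4. The difference in this sketch is only in how many moves are counted and in how many components the dot product itself writes.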
This came up while working on nir_lower_vec_to_regs (the new register intrinsics version of nir_lower_vec_to_moves) and trying to understand why Midgard's shader-db results were so different from Haswell.
total instructions in shared programs: 1518422 -> 1495574 (-1.50%)
instructions in affected programs: 723243 -> 700395 (-3.16%)
helped: 4550
HURT: 23
Instructions are helped.
total bundles in shared programs: 646941 -> 645253 (-0.26%)
bundles in affected programs: 179986 -> 178298 (-0.94%)
helped: 2049
HURT: 668
Bundles are helped.
total quadwords in shared programs: 1134727 -> 1122859 (-1.05%)
quadwords in affected programs: 511333 -> 499465 (-2.32%)
helped: 4189
HURT: 170
Quadwords are helped.
total registers in shared programs: 90619 -> 91017 (0.44%)
registers in affected programs: 11216 -> 11614 (3.55%)
helped: 576
HURT: 877
Registers are HURT.
total threads in shared programs: 55563 -> 55536 (-0.05%)
threads in affected programs: 822 -> 795 (-3.28%)
helped: 158
HURT: 239
Inconclusive result (value mean confidence interval includes 0).
total spills in shared programs: 1386 -> 1447 (4.40%)
spills in affected programs: 196 -> 257 (31.12%)
helped: 23
HURT: 15
total fills in shared programs: 5159 -> 5145 (-0.27%)
fills in affected programs: 1096 -> 1082 (-1.28%)
helped: 29
HURT: 13