nir/algebraic: Reassociate fadd into fmul in DP4-like pattern

This extends the optimization from commit 09705747 ("nir/algebraic:
Reassociate fadd into fmul in DPH-like pattern") to a chain of 4 ffmas
for a DP4-style pattern. Moving the fadd to the other end of the
sequence allows it to be fused with the fmul there into an ffma.
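
As a rough illustration of the reassociation, the sketch below models the
before and after expressions as nested tuples. The tuple encoding, the
operand names a..i, and the helpers evaluate() and count_ops() are
assumptions made for this example only; the actual search/replace pattern
and its conditions live in nir_opt_algebraic.py and may differ. Exact
fractions are used so the two forms compare equal; with real floats the
results can differ in rounding.

```python
from fractions import Fraction

# Hypothetical tuple encoding, loosely modeled on NIR algebraic patterns.
def evaluate(expr, env):
    """Evaluate a nested-tuple expression using the values in env."""
    if isinstance(expr, str):
        return env[expr]
    op, *args = expr
    vals = [evaluate(arg, env) for arg in args]
    if op == 'fmul':
        return vals[0] * vals[1]
    if op == 'fadd':
        return vals[0] + vals[1]
    if op == 'ffma':
        return vals[0] * vals[1] + vals[2]
    raise ValueError(f"unknown op: {op}")

def count_ops(expr):
    """Count the ALU operations in a nested-tuple expression."""
    if isinstance(expr, str):
        return 0
    return 1 + sum(count_ops(arg) for arg in expr[1:])

# Assumed shape before: the add sits at the outer end of the chain,
# leaving a bare fmul at the inner end.
before = ('fadd',
          ('ffma', 'a', 'b',
           ('ffma', 'c', 'd',
            ('ffma', 'e', 'f',
             ('fmul', 'g', 'h')))),
          'i')

# Assumed shape after: the add is moved to the inner end and fused
# with the fmul, giving a chain of 4 ffmas.
after = ('ffma', 'a', 'b',
         ('ffma', 'c', 'd',
          ('ffma', 'e', 'f',
           ('ffma', 'g', 'h', 'i'))))

# Arbitrary exact test values for the nine operands.
env = {name: Fraction(n + 1, 7) for n, name in enumerate('abcdefghi')}
```

Under these assumptions, count_ops(before) is 5 and count_ops(after) is 4,
while evaluate() returns the same exact value for both forms.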

fossil-db results from Alchemist:

Totals:
Instrs: 158544142 -> 158490516 (-0.03%); split: -0.04%, +0.00%
Subgroup size: 7808912 -> 7808920 (+0.00%); split: +0.00%, -0.00%
Cycle count: 17859550672 -> 17859491966 (-0.00%); split: -0.01%, +0.01%
Spill count: 84652 -> 84494 (-0.19%); split: -0.37%, +0.18%
Fill count: 160728 -> 160623 (-0.07%); split: -0.29%, +0.23%
Scratch Memory Size: 4278272 -> 4272128 (-0.14%); split: -0.29%, +0.14%
Max live registers: 32411695 -> 32409789 (-0.01%); split: -0.01%, +0.00%
Max dispatch width: 5627856 -> 5627920 (+0.00%); split: +0.00%, -0.00%
Non SSA regs after NIR: 185359099 -> 185307703 (-0.03%); split: -0.03%, +0.00%

Totals from 16378 (2.56% of 640872) affected shaders:
Instrs: 9818723 -> 9765097 (-0.55%); split: -0.58%, +0.04%
Subgroup size: 194056 -> 194064 (+0.00%); split: +0.01%, -0.01%
Cycle count: 294967108 -> 294908402 (-0.02%); split: -0.58%, +0.56%
Spill count: 10088 -> 9930 (-1.57%); split: -3.09%, +1.53%
Fill count: 24738 -> 24633 (-0.42%); split: -1.90%, +1.48%
Scratch Memory Size: 439296 -> 433152 (-1.40%); split: -2.80%, +1.40%
Max live registers: 1297204 -> 1295298 (-0.15%); split: -0.22%, +0.07%
Max dispatch width: 133232 -> 133296 (+0.05%); split: +0.14%, -0.10%
Non SSA regs after NIR: 11999084 -> 11947688 (-0.43%); split: -0.43%, +0.00%