NIR: Do better for flrp on platforms that lack an flrp instruction
This is basically a resend of an earlier series that I sent to the mailing list.
Rather than trying to do everything with only local information in nir_opt_algebraic, this series adds a new optimization pass. This new pass looks at how various parameters of a nir_op_flrp are used in other nir_op_flrp instructions to make better choices.
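For readers who don't have the flrp expansions in their head, here is a rough sketch in plain C of why usage in other flrp instructions matters. It is not code from the series. It assumes the usual definition flrp(x, y, t) = x * (1 - t) + y * t, and all of the function names are made up for illustration. The point is only that the same flrp can be expanded in more than one way, and that when several flrp instructions share an interpolant, one expansion lets (1 - t) be computed once.

```c
#include <assert.h>

/* Illustrative only: these helpers are hypothetical and are not part of
 * Mesa or of this series.
 */

/* Reference expansion: x * (1 - t) + y * t. */
static inline float
flrp_ref(float x, float y, float t)
{
   return x * (1.0f - t) + y * t;
}

/* Alternative expansion: x + t * (y - x).  Same value mathematically,
 * but with one subtract, one multiply, and one add.
 */
static inline float
flrp_sub_mul_add(float x, float y, float t)
{
   return x + t * (y - x);
}

/* Two flrps that share the same t.  With the reference expansion the
 * (1 - t) term is computed once and reused by both results.
 */
static inline void
flrp_pair_shared_t(const float x[2], const float y[2], float t, float out[2])
{
   const float one_minus_t = 1.0f - t;

   out[0] = x[0] * one_minus_t + y[0] * t;
   out[1] = x[1] * one_minus_t + y[1] * t;
}

/* Small self-check using values that are exact in binary floating point. */
static inline void
flrp_self_check(void)
{
   const float x[2] = { 2.0f, 0.0f };
   const float y[2] = { 10.0f, 4.0f };
   float out[2];

   flrp_pair_shared_t(x, y, 0.25f, out);

   assert(flrp_ref(2.0f, 10.0f, 0.25f) == flrp_sub_mul_add(2.0f, 10.0f, 0.25f));
   assert(out[0] == flrp_ref(x[0], y[0], 0.25f));
   assert(out[1] == flrp_ref(x[1], y[1], 0.25f));
}
```

Which expansion is cheaper for a given flrp depends on what else in the shader can reuse the intermediate values, and that is not something a purely local algebraic rule can see.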
The results across the whole series are shown below. At least for Intel GPUs, this series is a significant improvement. There are 4 extra loops and 5 lost SIMD16 shaders on Iron Lake, but the trade-off is an extra 52 (+47 overall) SIMD16 shaders on both Iron Lake and GM45. NOTE: On GM45 each shader is either SIMD8 or SIMD16, so the lost and gained counts reported below will always be the same. I had to look at the actual output from report.py to get the real count.
It also shouldn't break the whole universe for non-Intel GPUs.
I have another series waiting to go out that improves LRP and FMA generation for all of the Intel platforms that support LRP and FMA. That series caused a bunch of regressions on the non-LRP Intel platforms, so this series needs to land first.
Iron Lake
total instructions in shared programs: 8200374 -> 8099710 (-1.23%)
instructions in affected programs: 4586432 -> 4485768 (-2.19%)
helped: 20118
HURT: 1064
helped stats (abs) min: 1 max: 155 x̄: 5.07 x̃: 3
helped stats (rel) min: 0.11% max: 86.96% x̄: 2.83% x̃: 1.89%
HURT stats (abs) min: 1 max: 8 x̄: 1.25 x̃: 1
HURT stats (rel) min: 0.08% max: 6.52% x̄: 1.17% x̃: 0.97%
95% mean confidence interval for instructions value: -4.83 -4.67
95% mean confidence interval for instructions %-change: -2.67% -2.58%
Instructions are helped.
total cycles in shared programs: 187531554 -> 187056200 (-0.25%)
cycles in affected programs: 104021478 -> 103546124 (-0.46%)
helped: 19884
HURT: 1517
helped stats (abs) min: 2 max: 930 x̄: 24.60 x̃: 12
helped stats (rel) min: <.01% max: 94.58% x̄: 1.22% x̃: 0.56%
HURT stats (abs) min: 2 max: 208 x̄: 9.14 x̃: 6
HURT stats (rel) min: <.01% max: 4.11% x̄: 0.34% x̃: 0.15%
95% mean confidence interval for cycles value: -22.67 -21.76
95% mean confidence interval for cycles %-change: -1.15% -1.08%
Cycles are helped.
total loops in shared programs: 856 -> 860 (0.47%)
loops in affected programs: 0 -> 4
helped: 0
HURT: 4
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.00% max: 0.00% x̄: 0.00% x̃: 0.00%
95% mean confidence interval for loops value: 1.00 1.00
95% mean confidence interval for loops %-change: 0.00% 0.00%
Loops are HURT.
LOST: 5
GAINED: 52
GM45
total instructions in shared programs: 5004357 -> 4951782 (-1.05%)
instructions in affected programs: 2440172 -> 2387597 (-2.15%)
helped: 10341
HURT: 537
helped stats (abs) min: 1 max: 155 x̄: 5.15 x̃: 3
helped stats (rel) min: 0.11% max: 86.96% x̄: 2.78% x̃: 1.85%
HURT stats (abs) min: 1 max: 8 x̄: 1.25 x̃: 1
HURT stats (rel) min: 0.08% max: 6.25% x̄: 1.15% x̃: 0.96%
95% mean confidence interval for instructions value: -4.94 -4.72
95% mean confidence interval for instructions %-change: -2.65% -2.52%
Instructions are helped.
total cycles in shared programs: 127487388 -> 127192972 (-0.23%)
cycles in affected programs: 63823238 -> 63528822 (-0.46%)
helped: 10260
HURT: 825
helped stats (abs) min: 2 max: 930 x̄: 29.50 x̃: 12
helped stats (rel) min: <.01% max: 93.33% x̄: 1.30% x̃: 0.59%
HURT stats (abs) min: 2 max: 206 x̄: 10.03 x̃: 6
HURT stats (rel) min: <.01% max: 4.11% x̄: 0.34% x̃: 0.14%
95% mean confidence interval for cycles value: -27.29 -25.83
95% mean confidence interval for cycles %-change: -1.22% -1.13%
Cycles are helped.
total loops in shared programs: 635 -> 637 (0.31%)
loops in affected programs: 0 -> 2
helped: 0
HURT: 2
total fills in shared programs: 93 -> 94 (1.08%)
fills in affected programs: 81 -> 82 (1.23%)
helped: 0
HURT: 1
LOST: 57
GAINED: 57