Performance slowdown by ~20% in Unigine Sanctuary with nir_to_tgsi
@lorn10 complained in #5765 (closed) that the Sanctuary benchmark (https://benchmark.unigine.com/sanctuary) is slow, and indeed there is a difference between the nir_to_tgsi path and the old GLSL path.
FPS:
- default nir_to_tgsi path: 8.8
- RADEON_DEBUG=use_tgsi: 10.9
Both are averages of 3 runs, and the results are very stable.
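As a quick sanity check of the "~20%" in the title, here is a minimal sketch that computes the relative change from the two FPS averages above. The helper name `relative_change` is made up for illustration and is not part of any Mesa tooling.

```python
def relative_change(baseline: float, new: float) -> float:
    """Return the change from baseline to new, in percent."""
    return (new - baseline) / baseline * 100.0


# FPS averages quoted above (3 runs each).
tgsi_fps = 10.9  # RADEON_DEBUG=use_tgsi (baseline)
ntt_fps = 8.8    # default nir_to_tgsi path

slowdown = relative_change(tgsi_fps, ntt_fps)
print(f"FPS change with nir_to_tgsi: {slowdown:.1f}%")
```

With these numbers the default path comes out roughly 19% slower than the use_tgsi baseline.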
I dumped the shaders; here are the results, with RADEON_DEBUG=use_tgsi as the baseline and the default nir_to_tgsi path as the new state:
instructions in affected programs: 10295 -> 8840 (-14.13%)
helped: 143
HURT: 7
helped stats (abs) min: 1 max: 33 x̄: 10.30 x̃: 7
helped stats (rel) min: 2.33% max: 27.59% x̄: 14.45% x̃: 15.03%
HURT stats (abs) min: 2 max: 4 x̄: 2.57 x̃: 2
HURT stats (rel) min: 5.63% max: 9.09% x̄: 7.30% x̃: 7.69%
95% mean confidence interval for instructions value: -11.18 -8.22
95% mean confidence interval for instructions %-change: -14.55% -12.32%
Instructions are helped.
total vinst in shared programs: 2106 -> 1922 (-8.74%)
vinst in affected programs: 1833 -> 1649 (-10.04%)
helped: 58
HURT: 5
helped stats (abs) min: 1 max: 7 x̄: 3.33 x̃: 3
helped stats (rel) min: 2.78% max: 33.33% x̄: 14.41% x̃: 14.29%
HURT stats (abs) min: 1 max: 3 x̄: 1.80 x̃: 2
HURT stats (rel) min: 1.69% max: 42.86% x̄: 10.68% x̃: 3.57%
95% mean confidence interval for vinst value: -3.46 -2.38
95% mean confidence interval for vinst %-change: -15.01% -9.84%
Vinst are helped.
total sinst in shared programs: 1163 -> 945 (-18.74%)
sinst in affected programs: 1126 -> 908 (-19.36%)
helped: 66
HURT: 3
helped stats (abs) min: 1 max: 10 x̄: 3.35 x̃: 3
helped stats (rel) min: 5.00% max: 50.00% x̄: 22.60% x̃: 20.00%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 11.11% max: 11.11% x̄: 11.11% x̃: 11.11%
95% mean confidence interval for sinst value: -3.72 -2.60
95% mean confidence interval for sinst %-change: -24.11% -18.16%
Sinst are helped.
total flowcontrol in shared programs: 44 -> 44 (0.00%)
flowcontrol in affected programs: 44 -> 44 (0.00%)
helped: 2
HURT: 4
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 10.00% max: 10.00% x̄: 10.00% x̃: 10.00%
95% mean confidence interval for flowcontrol value: -1.63 1.63
95% mean confidence interval for flowcontrol %-change: -86.28% 32.95%
Inconclusive result (value mean confidence interval includes 0).
total tex in shared programs: 258 -> 258 (0.00%)
tex in affected programs: 0 -> 0
helped: 0
HURT: 0
total presub in shared programs: 126 -> 55 (-56.35%)
presub in affected programs: 108 -> 37 (-65.74%)
helped: 58
HURT: 1
helped stats (abs) min: 1 max: 3 x̄: 1.24 x̃: 1
helped stats (rel) min: 20.00% max: 100.00% x̄: 84.60% x̃: 100.00%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.00% max: 0.00% x̄: 0.00% x̃: 0.00%
95% mean confidence interval for presub value: -1.37 -1.04
95% mean confidence interval for presub %-change: -90.42% -75.91%
Presub are helped.
total omod in shared programs: 38 -> 0
omod in affected programs: 38 -> 0
helped: 26
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 1.46 x̃: 1
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for omod value: -1.91 -1.02
95% mean confidence interval for omod %-change: -100.00% -100.00%
Omod are helped.
total temps in shared programs: 1280 -> 1368 (6.88%)
temps in affected programs: 909 -> 997 (9.68%)
helped: 52
HURT: 61
helped stats (abs) min: 1 max: 4 x̄: 1.37 x̃: 1
helped stats (rel) min: 9.09% max: 44.44% x̄: 19.78% x̃: 15.48%
HURT stats (abs) min: 1 max: 5 x̄: 2.61 x̃: 3
HURT stats (rel) min: 6.25% max: 100.00% x̄: 33.08% x̃: 27.27%
95% mean confidence interval for temps value: 0.35 1.21
95% mean confidence interval for temps %-change: 3.23% 14.27%
Temps are HURT.
total lits in shared programs: 70 -> 108 (54.29%)
lits in affected programs: 34 -> 72 (111.76%)
helped: 0
HURT: 26
HURT stats (abs) min: 1 max: 4 x̄: 1.46 x̃: 1
HURT stats (rel) min: 33.33% max: 400.00% x̄: 135.90% x̃: 100.00%
95% mean confidence interval for lits value: 1.02 1.91
95% mean confidence interval for lits %-change: 88.50% 183.30%
Lits are HURT.
LOST: 0
GAINED: 0
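To cross-check the percentages in the report above, the "X in shared/affected programs: A -> B" lines can be parsed and the relative change recomputed. This is only a sketch that assumes the line format exactly as pasted here; `parse_totals` and the regular expression are hypothetical helpers, not part of the tool that produced the report.

```python
import re

# Matches e.g. "total temps in shared programs: 1280 -> 1368 (6.88%)"
# and "temps in affected programs: 909 -> 997 (9.68%)".
LINE_RE = re.compile(
    r"^(?:total )?(\w+) in (shared|affected) programs: (\d+) -> (\d+)"
)


def parse_totals(report: str) -> dict:
    """Map (stat, scope) to (old, new, percent change or None)."""
    totals = {}
    for line in report.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        stat, scope = match.group(1), match.group(2)
        old, new = int(match.group(3)), int(match.group(4))
        # A percentage is undefined when the baseline is 0.
        pct = (new - old) / old * 100.0 if old else None
        totals[(stat, scope)] = (old, new, pct)
    return totals


# A few lines copied from the report above.
sample = """\
total temps in shared programs: 1280 -> 1368 (6.88%)
temps in affected programs: 909 -> 997 (9.68%)
total lits in shared programs: 70 -> 108 (54.29%)
total omod in shared programs: 38 -> 0
tex in affected programs: 0 -> 0
"""

totals = parse_totals(sample)
for (stat, scope), (old, new, pct) in totals.items():
    shown = "n/a" if pct is None else f"{pct:+.2f}%"
    print(f"{stat:6s} {scope:8s} {old:5d} -> {new:5d}  {shown}")
```

Lines that do not match the pattern (the helped/HURT statistics and the confidence intervals) are simply skipped.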
The strange thing is that we have fewer instructions but also lower FPS. I suspected the increased temps, but the temps increase was actually much worse before 558a6006 (almost 20%, as opposed to the 9.68% above), yet the performance before and after 558a6006 is pretty much the same (there could be a 0.1 FPS difference, but I am not sure). CC @anholt
This is in windowed 1024x768 mode with shader quality set to low, because the high-quality shaders have some small rendering issues with RADEON_DEBUG=use_tgsi and I wanted to be sure this is an apples-to-apples comparison.
System information
- OS: Debian GNU/Linux 11 (bullseye)
- GPU: RV530
- Kernel version: 5.10.0-10-686 #1 SMP Debian 5.10.84-1 (2021-12-08) i686 GNU/Linux
- Mesa version: 9cb91010
- Xserver version: X.Org X Server 1.20.11
- Desktop manager and compositor: LXDE