Skip to content

turnip+mesa: Lower mediump temps (and turnip CS shared mem) to 16-bit

Emma Anholt requested to merge anholt/mesa:mediump-vars into main

GLSL summmary:

mesa: Lower mediump temps and CS shared when the driver supports FP16+INT16.

Typically GLSL mediump lowering will have lowered all the ALU ops
generating the values to 16-bit, and once vars_to_ssa happens the mediump
temps disappear.  However, if they don't disappear (for example, the var
gets indirected and eventually gets lowered to scratch or indirect
lowering), then you don't want the storage upconverted to 32-bit.

Also, if a CS shared var is declared mediump, then storing it as 16 bit
prevents conversions around the load store assuming the ALU ops related to
them are 16 bit.  For gfxbench aztec ruins, the CS shared var sizes are
cut in half, improving overall perf by 0.805549% +/- 0.0953482% (n=6) on
gl-5-normal.

freedreno shader-db:
total instructions in shared programs: 2917577 -> 2917743 (<.01%)
instructions in affected programs: 46141 -> 46307 (0.36%)
total last-baryf in shared programs: 109712 -> 109492 (-0.20%)
last-baryf in affected programs: 638 -> 418 (-34.48%)
total full in shared programs: 190275 -> 190218 (-0.03%)
full in affected programs: 156 -> 99 (-36.54%)
total constlen in shared programs: 492596 -> 492600 (<.01%)
constlen in affected programs: 8 -> 12 (50.00%)

total cat6 in shared programs: 33019 -> 33107 (0.27%)
cat6 in affected programs: 3604 -> 3692 (2.44%)
total stp in shared programs: 3626 -> 3670 (1.21%)
stp in affected programs: 3336 -> 3380 (1.32%)
total ldp in shared programs: 1718 -> 1762 (2.56%)
ldp in affected programs: 1680 -> 1724 (2.62%)
(this is all in aztec ruins)

total sstall in shared programs: 195656 -> 195182 (-0.24%)
sstall in affected programs: 3249 -> 2775 (-14.59%)
total (ss) in shared programs: 52823 -> 52966 (0.27%)
(ss) in affected programs: 1733 -> 1876 (8.25%)
total systall in shared programs: 507928 -> 508687 (0.15%)
systall in affected programs: 103010 -> 103769 (0.74%)
total (sy) in shared programs: 23185 -> 23196 (0.05%)
(sy) in affected programs: 1276 -> 1287 (0.86%)
total waves in shared programs: 435290 -> 435302 (<.01%)
waves in affected programs: 12 -> 24 (100.00%)
total loops in shared programs: 407 -> 405 (-0.49%)
loops in affected programs: 9 -> 7 (-22.22%)

turnip summary:

In Aztec Ruins, we end up storing some big shared-mem arrays as 16-bit,
cutting shared mem size in half across many shaders while also reducing
conversions.  gfxbench vk-5-normal perf +0.364983% +/- 0.189764% (n=4).

fossil-db:
Totals from 448 (2.99% of 14988) affected shaders:
MaxWaves: 6154 -> 6390 (+3.83%); split: +3.96%, -0.13%
Instrs: 174554 -> 165045 (-5.45%); split: -6.45%, +1.01%
CodeSize: 364224 -> 345558 (-5.12%); split: -6.03%, +0.90%
NOPs: 48224 -> 48024 (-0.41%); split: -3.33%, +2.91%
MOVs: 6985 -> 6104 (-12.61%); split: -19.11%, +6.50%
Full: 4577 -> 4101 (-10.40%); split: -11.08%, +0.68%
(ss): 3428 -> 3335 (-2.71%); split: -4.17%, +1.46%
(sy): 1250 -> 1205 (-3.60%); split: -4.72%, +1.12%
(ss)-stall: 14695 -> 14528 (-1.14%); split: -2.25%, +1.12%
(sy)-stall: 19565 -> 17998 (-8.01%); split: -9.55%, +1.54%
STPs: 1086 -> 870 (-19.89%)
LDPs: 162 -> 108 (-33.33%)
Cat0: 51400 -> 51120 (-0.54%); split: -3.31%, +2.76%
Cat1: 16861 -> 14688 (-12.89%); split: -18.18%, +5.30%
Cat2: 71161 -> 68454 (-3.80%); split: -4.52%, +0.72%
Cat3: 29572 -> 25306 (-14.43%); split: -14.49%, +0.06%
Cat4: 3128 -> 3131 (+0.10%)
Cat5: 1502 -> 1506 (+0.27%)
Cat6: 840 -> 750 (-10.71%)

aztec ruins is a big winner with the ldp/stp reductions.  summoners_war
racks up an astounding 41% reduction in instructions and +15% max_waves.
Most affected apps show a minor win in instrs, with
fallout_shelter_online, and aztec ruins on ANGLE taking minor hits.
Edited by Emma Anholt

Merge request reports