nir: Reorder f2f16(bcsel(x, .., ..))

Alyssa Rosenzweig requested to merge alyssa/mesa:nir/reorder-bcsel-conv into main

This removes all conversions from f2f16(bcsel(x, 0.0, f2f32(y))), a pattern I hit in practice. It's a nice win on Android shaders on AGX, at least:

   total instructions in shared programs: 1776725 -> 1773295 (-0.19%)
   instructions in affected programs: 166080 -> 162650 (-2.07%)
   helped: 590
   HURT: 6
   Instructions are helped.

   total bytes in shared programs: 11716524 -> 11695360 (-0.18%)
   bytes in affected programs: 1176948 -> 1155784 (-1.80%)
   helped: 612
   HURT: 9
   Bytes are helped.

   total halfregs in shared programs: 531032 -> 530844 (-0.04%)
   halfregs in affected programs: 1686 -> 1498 (-11.15%)
   helped: 43
   HURT: 3
   Halfregs are helped.
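
As a rough illustration of why the conversions disappear, here is a toy sketch of the reordering on nested tuples, loosely modelled on how search/replace expressions are written in nir_opt_algebraic.py. The `rewrite` helper and the `const16` tag are hypothetical names for this sketch only and are not part of Mesa; it also assumes `y` is already a 16-bit float, so f2f16(f2f32(y)) can fold back to `y`.

```python
# Toy model only: expressions are nested tuples (op, operand, ...).
# `rewrite` and 'const16' are hypothetical and do not exist in Mesa.

def rewrite(expr):
    """Push f2f16 through bcsel, then fold conversions that become trivial."""
    if not isinstance(expr, tuple):
        return expr                      # leaf: variable name or constant
    op, *args = expr
    args = [rewrite(a) for a in args]    # rewrite operands bottom-up

    if op == 'f2f16':
        (src,) = args
        # f2f16(bcsel(c, a, b)) -> bcsel(c, f2f16(a), f2f16(b))
        if isinstance(src, tuple) and src[0] == 'bcsel':
            _, cond, a, b = src
            return ('bcsel', cond,
                    rewrite(('f2f16', a)),
                    rewrite(('f2f16', b)))
        # f2f16(f2f32(y)) -> y, ASSUMING y is already a 16-bit float
        if isinstance(src, tuple) and src[0] == 'f2f32':
            return src[1]
        # f2f16(constant) folds to a 16-bit constant
        if isinstance(src, float):
            return ('const16', src)
    return (op, *args)

before = ('f2f16', ('bcsel', 'x', 0.0, ('f2f32', 'y')))
after = rewrite(before)
print(after)  # ('bcsel', 'x', ('const16', 0.0), 'y')
```

Once the f2f16 sits on each bcsel source, one source is a constant and the other is a round trip through f2f32, so in this toy model both conversions fold away and only the select remains.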

Like AGX, ir3 knows how to fold conversions, although the results there are all over the place. I think this looks like a small win overall, but a lot of the results here are inconclusive.

   total instructions in shared programs: 3117278 -> 3113284 (-0.13%)
   instructions in affected programs: 181511 -> 177517 (-2.20%)
   helped: 419
   HURT: 115
   Instructions are helped.

   total nops in shared programs: 674281 -> 672865 (-0.21%)
   nops in affected programs: 43853 -> 42437 (-3.23%)
   helped: 326
   HURT: 194
   Nops are helped.

   total non-nops in shared programs: 2442997 -> 2440419 (-0.11%)
   non-nops in affected programs: 120787 -> 118209 (-2.13%)
   helped: 384
   HURT: 65
   Non-nops are helped.

   total mov in shared programs: 78630 -> 78617 (-0.02%)
   mov in affected programs: 2459 -> 2446 (-0.53%)
   helped: 69
   HURT: 70
   Inconclusive result (value mean confidence interval includes 0).

   total cov in shared programs: 75347 -> 72678 (-3.54%)
   cov in affected programs: 11944 -> 9275 (-22.35%)
   helped: 385
   HURT: 33
   Cov are helped.

   total dwords in shared programs: 6734360 -> 6734314 (<.01%)
   dwords in affected programs: 224888 -> 224842 (-0.02%)
   helped: 173
   HURT: 109
   Inconclusive result (value mean confidence interval includes 0).

   total last-baryf in shared programs: 118389 -> 118386 (<.01%)
   last-baryf in affected programs: 678 -> 675 (-0.44%)
   helped: 12
   HURT: 9
   Inconclusive result (value mean confidence interval includes 0).

   total full in shared programs: 193607 -> 193562 (-0.02%)
   full in affected programs: 473 -> 428 (-9.51%)
   helped: 41
   HURT: 20
   Full are helped.

   total constlen in shared programs: 494256 -> 494452 (0.04%)
   constlen in affected programs: 3520 -> 3716 (5.57%)
   helped: 0
   HURT: 49
   Constlen are HURT.

   total cat0 in shared programs: 743772 -> 742357 (-0.19%)
   cat0 in affected programs: 47059 -> 45644 (-3.01%)
   helped: 326
   HURT: 194
   Cat0 are helped.

   total cat1 in shared programs: 154137 -> 151471 (-1.73%)
   cat1 in affected programs: 16492 -> 13826 (-16.17%)
   helped: 383
   HURT: 61
   Cat1 are helped.

   total cat2 in shared programs: 1147730 -> 1147821 (<.01%)
   cat2 in affected programs: 19780 -> 19871 (0.46%)
   helped: 46
   HURT: 62
   Inconclusive result (%-change mean confidence interval includes 0).

   total cat3 in shared programs: 944660 -> 944656 (<.01%)
   cat3 in affected programs: 69 -> 65 (-5.80%)
   helped: 3
   HURT: 0

   total sstall in shared programs: 237980 -> 237807 (-0.07%)
   sstall in affected programs: 10800 -> 10627 (-1.60%)
   helped: 138
   HURT: 149
   Inconclusive result (value mean confidence interval includes 0).

   total (ss) in shared programs: 58074 -> 58043 (-0.05%)
   (ss) in affected programs: 1886 -> 1855 (-1.64%)
   helped: 92
   HURT: 76
   Inconclusive result (value mean confidence interval includes 0).

   total systall in shared programs: 504493 -> 504998 (0.10%)
   systall in affected programs: 19326 -> 19831 (2.61%)
   helped: 121
   HURT: 161
   Inconclusive result (value mean confidence interval includes 0).

   total (sy) in shared programs: 27389 -> 27393 (0.01%)
   (sy) in affected programs: 131 -> 135 (3.05%)
   helped: 17
   HURT: 17
   Inconclusive result (value mean confidence interval includes 0).

   total waves in shared programs: 440360 -> 440362 (<.01%)
   waves in affected programs: 440 -> 442 (0.45%)
   helped: 22
   HURT: 19
   Inconclusive result (value mean confidence interval includes 0).

Results on Mali-G57 are not as good (since panfrost doesn't know how to fold f2f16 conversions into destinations, although IIRC the hardware can do it). Still looks like a win overall thanks to register pressure decreasing:

   total instructions in shared programs: 2695860 -> 2691035 (-0.18%)
   instructions in affected programs: 517574 -> 512749 (-0.93%)
   helped: 810
   HURT: 287
   Instructions are helped.

   total cycles in shared programs: 141283.03 -> 141286.30 (<.01%)
   cycles in affected programs: 576.19 -> 579.45 (0.57%)
   helped: 52
   HURT: 124
   Inconclusive result (value mean confidence interval includes 0).

   total fma in shared programs: 22136.34 -> 22139.73 (0.02%)
   fma in affected programs: 426.44 -> 429.83 (0.80%)
   helped: 34
   HURT: 106
   Inconclusive result (%-change mean confidence interval includes 0).

   total cvt in shared programs: 14694.14 -> 14615.30 (-0.54%)
   cvt in affected programs: 4193.95 -> 4115.11 (-1.88%)
   helped: 826
   HURT: 281
   Cvt are helped.

   total sfu in shared programs: 8292.94 -> 8293.19 (<.01%)
   sfu in affected programs: 5.50 -> 5.75 (4.55%)
   helped: 2
   HURT: 2
   Inconclusive result (value mean confidence interval includes 0).

   total quadwords in shared programs: 1459944 -> 1457440 (-0.17%)
   quadwords in affected programs: 123904 -> 121400 (-2.02%)
   helped: 382
   HURT: 77
   Quadwords are helped.

   total threads in shared programs: 53606 -> 53611 (<.01%)
   threads in affected programs: 8 -> 13 (62.50%)
   helped: 6
   HURT: 1
   Threads are helped.

Finally, Mali-T860 (a vector processor) is also a wash:

   total instructions in shared programs: 1512521 -> 1511913 (-0.04%)
   instructions in affected programs: 250360 -> 249752 (-0.24%)
   helped: 505
   HURT: 270
   Instructions are helped.

   total bundles in shared programs: 644827 -> 644610 (-0.03%)
   bundles in affected programs: 86000 -> 85783 (-0.25%)
   helped: 465
   HURT: 231
   Inconclusive result (%-change mean confidence interval includes 0).

   total quadwords in shared programs: 1128965 -> 1128389 (-0.05%)
   quadwords in affected programs: 180606 -> 180030 (-0.32%)
   helped: 541
   HURT: 224
   Quadwords are helped.

   total registers in shared programs: 90718 -> 90682 (-0.04%)
   registers in affected programs: 962 -> 926 (-3.74%)
   helped: 68
   HURT: 43
   Inconclusive result (%-change mean confidence interval includes 0).

   total threads in shared programs: 55657 -> 55679 (0.04%)
   threads in affected programs: 37 -> 59 (59.46%)
   helped: 21
   HURT: 5
   Threads are helped.

   total spills in shared programs: 1434 -> 1435 (0.07%)
   spills in affected programs: 1 -> 2 (100.00%)
   helped: 0
   HURT: 1

   total fills in shared programs: 5222 -> 5236 (0.27%)
   fills in affected programs: 4 -> 18 (350.00%)
   helped: 0
   HURT: 1
