ir3: improve handling of predicate registers
This series improves the handling of predicate registers in ir3 by doing a few things:
- Make use of all four predicate registers available since a6xx by adding a register allocator for them. Up to now, only a single register (p0.x) was used. The new RA also works for older gens with only one predicate register.
- Add codegen for the new braa/brao instructions available since a6xx by folding and/or into them.
- Fold negations into branches using the inv1/inv2 fields (all gens).
- Make bitwise operations directly write to predicate registers, which is possible since a6xx.
- Add CSE for cmps.
ir3 core changes
To make this all possible/easier to implement, some core changes to ir3 were made.
Instead of using brtype and condition fields in blocks, explicitly add terminator branches in their instruction lists. This makes it more uniform to deal with branches in passes. This is especially useful for passes that need to deal with predicate registers like the new register allocator.
We currently have a bit of a confusing situation where we have both opcodes for the different branches (OPC_BR, OPC_BRAA,...) and branch types which are supposed to be used with OPC_B (BRANCH_PLAIN, BRANCH_AND,...). However, not every kind of branch has a corresponding type. For example, getone is represented by OPC_GETONE instead of a branch type.
This series proposes to get rid of the branch types and use opcodes everywhere. I think this makes the representation of branches more consistent. It also removes the need for the encoder to translate branch types into opcodes.
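To illustrate why explicit terminators help, here is a minimal Python model (not the actual ir3 C structures). The Instr and Block classes, the opcode strings and the predicate_uses helper are all hypothetical names chosen for this sketch. The point is only that once the branch is an ordinary instruction, a pass can find every predicate use with one loop and no per-block special case.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative subset of branch opcodes; a real opcode list is larger.
TERMINATOR_OPCS = {"br", "braa", "brao", "getone", "jump"}


@dataclass
class Instr:
    opc: str
    srcs: List[str] = field(default_factory=list)
    dst: Optional[str] = None


@dataclass
class Block:
    instrs: List[Instr] = field(default_factory=list)

    def terminator(self) -> Optional[Instr]:
        """Return the trailing branch instruction, if the block has one."""
        if self.instrs and self.instrs[-1].opc in TERMINATOR_OPCS:
            return self.instrs[-1]
        return None


def predicate_uses(block, is_predicate):
    """Collect every (instruction, source) pair that reads a predicate.

    The branch condition is just another source, so no separate
    per-block condition field has to be inspected.
    """
    return [(i, s) for i in block.instrs for s in i.srcs if is_predicate(s)]


block = Block([
    Instr("cmps", ["r0.x", "r0.y"], dst="p0.x"),
    Instr("br", ["p0.x"]),
])
uses = predicate_uses(block, lambda reg: reg.startswith("p0."))
```

Here uses contains only the br reading p0.x, found by the same loop that would find any other predicate consumer.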
Predicate register allocation
Because predicate registers can be "spilled" (see below) to GPRs, the predicate RA is implemented as a pass before regular RA.
The RA uses the standard liveness analysis available in ir3. Using this, registers are allocated in a single pass over all blocks. For each block we keep track of currently live defs in the registers. Predicate destinations allocate a new register and sources take the register from their def.
The live defs of a block are initialized with the intersection of the live-out defs of their predecessors: if all predecessors have the same live-out def in the same register, it is used as live-in. However, we only do this for defs that are actually live-in according to the liveness analysis.
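The live-in computation described above can be sketched roughly as follows. This is a simplified Python model, not the code from this series: register state is a plain dict from register name to def, and live_in_regs is a hypothetical helper name.

```python
def live_in_regs(pred_live_outs, live_in_defs):
    """Compute the initial register state of a block.

    pred_live_outs: one {register: def} dict per predecessor.
    live_in_defs:   defs that liveness analysis reports as live-in.
    A register is kept only if every predecessor holds the same def in
    it and that def is actually live-in.
    """
    if not pred_live_outs:
        return {}
    first, *rest = pred_live_outs
    return {
        reg: d
        for reg, d in first.items()
        if d in live_in_defs and all(p.get(reg) == d for p in rest)
    }


preds = [
    {"p0.x": "a", "p0.y": "b"},
    {"p0.x": "a", "p0.y": "c"},  # disagrees on p0.y
]
state = live_in_regs(preds, live_in_defs={"a", "b"})
```

In this example only p0.x survives: both predecessors agree on it and its def is live-in, while p0.y holds different defs in the two predecessors.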
This doesn't work for loops: since predecessors from back edges are processed after their successors, we don't know their live-out state yet. We solve this by ignoring such predecessors while calculating the live-in state. When such a predecessor is later processed, we fix up its live-out state to match what its successor expects by reloading defs if necessary.
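A rough Python sketch of the back-edge handling, under the same simplified model of register state as {register: def} dicts. The function names and the block names "entry" and "latch" are made up for the example; the real pass works on ir3 blocks.

```python
def live_in_from_processed(preds, live_out, live_in_defs):
    """Intersect the live-out state of already-processed predecessors.

    live_out maps processed block names to their {register: def} state.
    Back-edge predecessors have not been processed yet, so they are
    absent from live_out and are simply skipped.
    """
    states = [live_out[p] for p in preds if p in live_out]
    if not states:
        return {}
    first, *rest = states
    return {
        reg: d
        for reg, d in first.items()
        if d in live_in_defs and all(s.get(reg) == d for s in rest)
    }


def reloads_for_back_edge(pred_live_out, succ_live_in):
    """List the (register, def) pairs a back-edge predecessor must reload
    at its end so that its state matches what the successor expects."""
    return [
        (reg, d)
        for reg, d in sorted(succ_live_in.items())
        if pred_live_out.get(reg) != d
    ]


# Loop header with a forward predecessor "entry" and a back edge from "latch".
live_out = {"entry": {"p0.x": "a"}}
header_in = live_in_from_processed(["entry", "latch"], live_out, {"a"})

# Later, "latch" is processed and ends with a different def in p0.x.
live_out["latch"] = {"p0.x": "b"}
fixups = reloads_for_back_edge(live_out["latch"], header_in)
```

The header assumes a lives in p0.x; since the latch ended up with b there, a has to be reloaded into p0.x at the end of the latch.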
Spilling is implemented by reloading, or rematerializing, the instruction that produced the def. Whenever we need a new register while none are available, we simply free one. If the freed def is later needed again, we clone the original instruction in front of the new use. We keep track of the original def the reload is cloned from so that subsequent uses can reuse the reload. Note that this essentially spills the predicate register to the GPR sources of the cloned instruction.
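The following Python sketch models spilling by rematerialization on straight-line code. It is deliberately simplified and is not the allocator from this series: there is no liveness-based freeing, the eviction choice is arbitrary (the lowest-numbered eligible register), and it assumes no instruction has more predicate sources than there are registers. All names are hypothetical.

```python
def allocate_predicates(instrs, num_regs):
    """Allocate predicate registers, reloading evicted defs by cloning.

    instrs: list of (opcode, dst_def, src_defs); dst_def is None when the
    instruction has no predicate destination. Returns the emitted program
    as (opcode, dst_reg, src_regs) tuples, including inserted reloads.
    """
    producer = {}  # def -> opcode that produced it (what a reload clones)
    reg_of = {}    # original def -> register holding it or its reload
    holder = {}    # register -> original def it currently holds
    out = []

    def take_register(d, keep):
        free = [r for r in range(num_regs) if r not in holder]
        if free:
            reg = free[0]
        else:
            # Nothing free: evict a def the current instruction does not need.
            reg = next(r for r in range(num_regs) if holder[r] not in keep)
            del reg_of[holder[reg]]
        holder[reg] = d
        reg_of[d] = reg
        return reg

    for opc, dst, srcs in instrs:
        for s in srcs:
            if s not in reg_of:
                # The def was evicted: clone its producer in front of this
                # use. Keying reg_of by the original def lets later uses
                # find and reuse the reload.
                reg = take_register(s, keep=set(srcs))
                out.append((producer[s] + " (reload)", reg, []))
        src_regs = [reg_of[s] for s in srcs]
        dst_reg = None
        if dst is not None:
            producer[dst] = opc
            # Sources are read before the write, so any register may be reused.
            dst_reg = take_register(dst, keep=set())
        out.append((opc, dst_reg, src_regs))
    return out


program = [
    ("cmps.a", "a", []),
    ("cmps.b", "b", []),
    ("br", None, ["a"]),
]
one_reg = allocate_predicates(program, num_regs=1)
four_regs = allocate_predicates(program, num_regs=4)
```

With a single register, writing b evicts a, so the branch triggers a clone of the instruction that produced a. With four registers both defs fit and no reload is emitted.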
Optimize bitwise operations
When generating instructions that need a predicate source, ir3 will insert a cmps.s.ne 0 instruction to guarantee a predicate can be produced. We add a pass that removes those cmps instructions whenever their source is a bitwise operation that can directly write to a predicate register.
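As a rough illustration of this optimization, here is a toy Python pass over instructions modelled as dicts. It assumes, as the description implies, that a bitwise op writing a predicate yields the same truth value as comparing its result against zero. The dict layout, the BITWISE set and the single-pass dead-code cleanup are assumptions of this sketch, not how the ir3 pass is written.

```python
BITWISE = {"and.b", "or.b", "xor.b", "not.b"}  # illustrative opcode set


def fold_cmps_into_bitwise(instrs, gen):
    """Replace `cmps.s.ne x, 0` by a predicate-writing clone of the
    bitwise op that produced x (only modelled for gen >= 6)."""
    if gen < 6:
        return list(instrs)
    defs = {i["dst"]: i for i in instrs if i["dst"] is not None}
    out = []
    for i in instrs:
        src = defs.get(i["srcs"][0]) if i["srcs"] else None
        if (i["opc"] == "cmps.s.ne" and i["srcs"][1] == 0
                and src is not None and src["opc"] in BITWISE):
            # Clone the bitwise op and let it write the predicate directly.
            out.append({"opc": src["opc"], "dst": i["dst"],
                        "srcs": list(src["srcs"])})
        else:
            out.append(i)
    # Simple one-pass cleanup: drop defs that no longer have any user.
    used = {s for i in out for s in i["srcs"]}
    return [i for i in out if i["dst"] is None or i["dst"] in used]


shader = [
    {"opc": "and.b", "dst": "t", "srcs": ["r0.x", "r0.y"]},
    {"opc": "cmps.s.ne", "dst": "p", "srcs": ["t", 0]},
    {"opc": "br", "dst": None, "srcs": ["p"]},
]
a6xx = fold_cmps_into_bitwise(shader, gen=6)
a5xx = fold_cmps_into_bitwise(shader, gen=5)
```

On the modelled a6xx path the and.b writes the predicate itself and the cmps disappears; on older gens the sequence is left alone.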
Folding and/or/not into branches
Fold and/or into braa/brao when profitable. Only do this when the and/or is not used by any non-branch instructions, as folding would otherwise increase the total instruction count.
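The profitability check can be sketched as follows, again on a toy dict-based instruction list in Python. The opcode strings and the function name are chosen for the example only; the actual folding happens during ir3 codegen.

```python
FOLD = {"and.b": "braa", "or.b": "brao"}  # and/or -> combined branch


def fold_logic_into_branches(instrs):
    """Fold an and/or feeding `br` into braa/brao, but only when every
    user of the and/or is a branch."""
    defs = {i["dst"]: i for i in instrs if i["dst"] is not None}
    users = {}
    for i in instrs:
        for s in i["srcs"]:
            users.setdefault(s, []).append(i)

    def foldable(d):
        prod = defs.get(d)
        return (prod is not None and prod["opc"] in FOLD
                and all(u["opc"] == "br" for u in users.get(d, [])))

    out = []
    for i in instrs:
        if i["opc"] == "br" and foldable(i["srcs"][0]):
            prod = defs[i["srcs"][0]]
            out.append({"opc": FOLD[prod["opc"]], "dst": None,
                        "srcs": list(prod["srcs"])})
        elif i["opc"] in FOLD and i["dst"] in users and foldable(i["dst"]):
            continue  # all users were branches, so the and/or is now dead
        else:
            out.append(i)
    return out


base = [
    {"opc": "cmps", "dst": "a", "srcs": ["r0.x"]},
    {"opc": "cmps", "dst": "b", "srcs": ["r0.y"]},
    {"opc": "and.b", "dst": "c", "srcs": ["a", "b"]},
]
branch = {"opc": "br", "dst": None, "srcs": ["c"]}
other_use = {"opc": "mov", "dst": "r1.x", "srcs": ["c"]}

folded = fold_logic_into_branches(base + [branch])
kept = fold_logic_into_branches(base + [other_use, branch])
```

In the first case the and.b and the br collapse into a single braa reading a and b. In the second case the and.b has a non-branch user, so nothing is folded.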
Add an algebraic nir pass that applies De Morgan's laws in reverse to try to bring and/or in front of branches. Again, only do this when the original inot is only used for branches. This should always decrease instruction count since the extra inots can be folded into the branch.
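To make the rewrite concrete, here is a small Python sketch on expression tuples. It is not the nir algebraic pattern itself: the tuple encoding and the function names are assumptions for illustration, and the "only used by branches" check is left out. It pushes the outer inot inward and checks equivalence over the full truth table.

```python
from itertools import product


def push_inot_inward(expr):
    """inot(iand(a, b)) -> ior(inot a, inot b), and the dual for ior."""
    if (isinstance(expr, tuple) and expr[0] == "inot"
            and isinstance(expr[1], tuple) and expr[1][0] in ("iand", "ior")):
        op, a, b = expr[1]
        flipped = "ior" if op == "iand" else "iand"
        return (flipped, ("inot", a), ("inot", b))
    return expr


def evaluate(expr, env):
    """Evaluate an expression tuple with boolean variables from env."""
    if isinstance(expr, str):
        return env[expr]
    if expr[0] == "inot":
        return not evaluate(expr[1], env)
    lhs, rhs = evaluate(expr[1], env), evaluate(expr[2], env)
    return (lhs and rhs) if expr[0] == "iand" else (lhs or rhs)


cond = ("inot", ("iand", "a", "b"))
rewritten = push_inot_inward(cond)
equivalent = all(
    evaluate(cond, {"a": a, "b": b}) == evaluate(rewritten, {"a": a, "b": b})
    for a, b in product([False, True], repeat=2)
)
```

After the rewrite the branch condition is an ior whose operands are negated, which is the shape that can then be folded into brao with the inv1/inv2 fields set.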
Results
a540 shader-db including Rob's shaders
total instructions in shared programs: 3427644 -> 3419435 (-0.24%) instructions in affected programs: 467758 -> 459549 (-1.75%) helped: 958 HURT: 46 helped stats (abs) min: 1 max: 512 x̄: 8.74 x̃: 7 helped stats (rel) min: 0.15% max: 20.97% x̄: 2.83% x̃: 2.39% HURT stats (abs) min: 1 max: 18 x̄: 3.65 x̃: 1 HURT stats (rel) min: 0.11% max: 13.33% x̄: 1.22% x̃: 0.53% 95% mean confidence interval for instructions value: -9.24 -7.12 95% mean confidence interval for instructions %-change: -2.80% -2.50% Instructions are helped.
total nops in shared programs: 601486 -> 595384 (-1.01%) nops in affected programs: 113378 -> 107276 (-5.38%) helped: 903 HURT: 129 helped stats (abs) min: 1 max: 384 x̄: 7.15 x̃: 6 helped stats (rel) min: 0.67% max: 30.00% x̄: 8.10% x̃: 8.57% HURT stats (abs) min: 1 max: 21 x̄: 2.72 x̃: 1 HURT stats (rel) min: 0.26% max: 81.82% x̄: 4.93% x̃: 3.25% 95% mean confidence interval for nops value: -6.70 -5.13 95% mean confidence interval for nops %-change: -6.91% -6.04% Nops are helped.
total non-nops in shared programs: 2826158 -> 2824051 (-0.07%) non-nops in affected programs: 355752 -> 353645 (-0.59%) helped: 1023 HURT: 0 helped stats (abs) min: 1 max: 128 x̄: 2.06 x̃: 1 helped stats (rel) min: 0.14% max: 19.15% x̄: 1.08% x̃: 0.60% 95% mean confidence interval for non-nops value: -2.32 -1.79 95% mean confidence interval for non-nops %-change: -1.15% -1.00% Non-nops are helped.
total mov in shared programs: 248303 -> 248225 (-0.03%) mov in affected programs: 532 -> 454 (-14.66%) helped: 19 HURT: 1 helped stats (abs) min: 1 max: 7 x̄: 4.16 x̃: 4 helped stats (rel) min: 1.89% max: 66.67% x̄: 21.38% x̃: 16.67% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 25.00% max: 25.00% x̄: 25.00% x̃: 25.00% 95% mean confidence interval for mov value: -5.01 -2.79 95% mean confidence interval for mov %-change: -28.19% -9.94% Mov are helped.
total cov in shared programs: 101693 -> 101692 (<.01%) cov in affected programs: 26 -> 25 (-3.85%) helped: 1 HURT: 0
total dwords in shared programs: 6877604 -> 6871258 (-0.09%) dwords in affected programs: 383250 -> 376904 (-1.66%) helped: 414 HURT: 10 helped stats (abs) min: 2 max: 512 x̄: 15.59 x̃: 6 helped stats (rel) min: 0.13% max: 25.00% x̄: 3.31% x̃: 1.46% HURT stats (abs) min: 2 max: 32 x̄: 11.00 x̃: 4 HURT stats (rel) min: 0.16% max: 4.92% x̄: 1.62% x̃: 0.94% 95% mean confidence interval for dwords value: -17.59 -12.35 95% mean confidence interval for dwords %-change: -3.63% -2.76% Dwords are helped.
total last-baryf in shared programs: 97405 -> 97398 (<.01%) last-baryf in affected programs: 384 -> 377 (-1.82%) helped: 4 HURT: 4 helped stats (abs) min: 1 max: 22 x̄: 7.25 x̃: 3 helped stats (rel) min: 1.67% max: 15.71% x̄: 11.16% x̃: 13.64% HURT stats (abs) min: 1 max: 10 x̄: 5.50 x̃: 5 HURT stats (rel) min: 3.03% max: 27.03% x̄: 15.03% x̃: 15.03% 95% mean confidence interval for last-baryf value: -9.23 7.48 95% mean confidence interval for last-baryf %-change: -12.45% 16.31% Inconclusive result (value mean confidence interval includes 0).
total last-helper in shared programs: 1372136 -> 1368491 (-0.27%) last-helper in affected programs: 159566 -> 155921 (-2.28%) helped: 305 HURT: 39 helped stats (abs) min: 1 max: 512 x̄: 12.38 x̃: 8 helped stats (rel) min: 0.15% max: 21.31% x̄: 3.53% x̃: 2.79% HURT stats (abs) min: 1 max: 18 x̄: 3.36 x̃: 1 HURT stats (rel) min: 0.11% max: 13.46% x̄: 1.26% x̃: 0.40% 95% mean confidence interval for last-helper value: -13.67 -7.52 95% mean confidence interval for last-helper %-change: -3.31% -2.67% Last-helper are helped.
total half in shared programs: 62509 -> 62147 (-0.58%) half in affected programs: 826 -> 464 (-43.83%) helped: 345 HURT: 0 helped stats (abs) min: 1 max: 16 x̄: 1.05 x̃: 1 helped stats (rel) min: 4.35% max: 100.00% x̄: 75.17% x̃: 100.00% 95% mean confidence interval for half value: -1.14 -0.96 95% mean confidence interval for half %-change: -78.80% -71.55% Half are helped.
total full in shared programs: 173294 -> 173271 (-0.01%) full in affected programs: 89 -> 66 (-25.84%) helped: 6 HURT: 2 helped stats (abs) min: 3 max: 9 x̄: 4.50 x̃: 4 helped stats (rel) min: 25.00% max: 42.86% x̄: 32.14% x̃: 33.33% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 50.00% max: 50.00% x̄: 50.00% x̃: 50.00% 95% mean confidence interval for full value: -5.85 0.10 95% mean confidence interval for full %-change: -43.74% 20.53% Inconclusive result (value mean confidence interval includes 0).
total constlen in shared programs: 879612 -> 879612 (0.00%) constlen in affected programs: 0 -> 0 helped: 0 HURT: 0
total cat0 in shared programs: 660130 -> 654028 (-0.92%) cat0 in affected programs: 120305 -> 114203 (-5.07%) helped: 903 HURT: 129 helped stats (abs) min: 1 max: 384 x̄: 7.15 x̃: 6 helped stats (rel) min: 0.65% max: 26.09% x̄: 7.44% x̃: 7.89% HURT stats (abs) min: 1 max: 21 x̄: 2.72 x̃: 1 HURT stats (rel) min: 0.25% max: 54.55% x̄: 4.03% x̃: 3.03% 95% mean confidence interval for cat0 value: -6.70 -5.13 95% mean confidence interval for cat0 %-change: -6.37% -5.65% Cat0 are helped.
total cat1 in shared programs: 355780 -> 355697 (-0.02%) cat1 in affected programs: 1188 -> 1105 (-6.99%) helped: 20 HURT: 1 helped stats (abs) min: 1 max: 7 x̄: 4.20 x̃: 4 helped stats (rel) min: 1.32% max: 66.67% x̄: 14.67% x̃: 7.14% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 16.67% max: 16.67% x̄: 16.67% x̃: 16.67% 95% mean confidence interval for cat1 value: -5.09 -2.81 95% mean confidence interval for cat1 %-change: -22.03% -4.32% Cat1 are helped.
total cat2 in shared programs: 1233203 -> 1231179 (-0.16%) cat2 in affected programs: 181622 -> 179598 (-1.11%) helped: 1010 HURT: 0 helped stats (abs) min: 1 max: 128 x̄: 2.00 x̃: 1 helped stats (rel) min: 0.25% max: 18.18% x̄: 2.37% x̃: 1.14% 95% mean confidence interval for cat2 value: -2.27 -1.74 95% mean confidence interval for cat2 %-change: -2.55% -2.19% Cat2 are helped.
total cat3 in shared programs: 1031422 -> 1031422 (0.00%) cat3 in affected programs: 0 -> 0 helped: 0 HURT: 0
total cat4 in shared programs: 75629 -> 75629 (0.00%) cat4 in affected programs: 0 -> 0 helped: 0 HURT: 0
total cat5 in shared programs: 47992 -> 47992 (0.00%) cat5 in affected programs: 0 -> 0 helped: 0 HURT: 0
total cat6 in shared programs: 22498 -> 22498 (0.00%) cat6 in affected programs: 0 -> 0 helped: 0 HURT: 0
total cat7 in shared programs: 990 -> 990 (0.00%) cat7 in affected programs: 0 -> 0 helped: 0 HURT: 0
total stp in shared programs: 2 -> 2 (0.00%) stp in affected programs: 0 -> 0 helped: 0 HURT: 0
total ldp in shared programs: 2 -> 2 (0.00%) ldp in affected programs: 0 -> 0 helped: 0 HURT: 0
total sstall in shared programs: 247837 -> 248330 (0.20%) sstall in affected programs: 6810 -> 7303 (7.24%) helped: 31 HURT: 122 helped stats (abs) min: 1 max: 10 x̄: 3.71 x̃: 1 helped stats (rel) min: 1.59% max: 17.86% x̄: 6.37% x̃: 5.26% HURT stats (abs) min: 1 max: 75 x̄: 4.98 x̃: 3 HURT stats (rel) min: 0.80% max: 115.38% x̄: 12.96% x̃: 11.94% 95% mean confidence interval for sstall value: 1.83 4.62 95% mean confidence interval for sstall %-change: 6.60% 11.48% Sstall are HURT.
total (ss) in shared programs: 67114 -> 67408 (0.44%) (ss) in affected programs: 2292 -> 2586 (12.83%) helped: 19 HURT: 57 helped stats (abs) min: 1 max: 2 x̄: 1.05 x̃: 1 helped stats (rel) min: 2.08% max: 33.33% x̄: 8.81% x̃: 7.14% HURT stats (abs) min: 1 max: 63 x̄: 5.51 x̃: 3 HURT stats (rel) min: 0.00% max: 55.00% x̄: 18.54% x̃: 18.18% 95% mean confidence interval for (ss) value: 1.99 5.75 95% mean confidence interval for (ss) %-change: 8.02% 15.38% (ss) are HURT.
total systall in shared programs: 718495 -> 718907 (0.06%) systall in affected programs: 14011 -> 14423 (2.94%) helped: 43 HURT: 114 helped stats (abs) min: 1 max: 29 x̄: 4.79 x̃: 2 helped stats (rel) min: 0.30% max: 100.00% x̄: 7.55% x̃: 2.78% HURT stats (abs) min: 1 max: 56 x̄: 5.42 x̃: 3 HURT stats (rel) min: 0.00% max: 700.00% x̄: 17.86% x̃: 2.90% 95% mean confidence interval for systall value: 1.27 3.97 95% mean confidence interval for systall %-change: 1.05% 20.75% Systall are HURT.
total (sy) in shared programs: 22961 -> 23226 (1.15%) (sy) in affected programs: 738 -> 1003 (35.91%) helped: 4 HURT: 42 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 14.29% max: 50.00% x̄: 32.14% x̃: 32.14% HURT stats (abs) min: 1 max: 63 x̄: 6.40 x̃: 4 HURT stats (rel) min: 6.67% max: 100.00% x̄: 35.84% x̃: 36.67% 95% mean confidence interval for (sy) value: 2.83 8.69 95% mean confidence interval for (sy) %-change: 21.76% 38.10% (sy) are HURT.
total waves in shared programs: 556366 -> 556370 (<.01%) waves in affected programs: 54 -> 58 (7.41%) helped: 6 HURT: 2 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 25.00% max: 100.00% x̄: 50.00% x̃: 50.00% HURT stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 HURT stats (rel) min: 33.33% max: 33.33% x̄: 33.33% x̃: 33.33% 95% mean confidence interval for waves value: -1.82 2.82 95% mean confidence interval for waves %-change: -8.44% 66.78% Inconclusive result (value mean confidence interval includes 0).
total loops in shared programs: 456 -> 456 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0
a660 shader-db including Rob's shaders
total instructions in shared programs: 4188898 -> 4145010 (-1.05%) instructions in affected programs: 1379336 -> 1335448 (-3.18%) helped: 1776 HURT: 82 helped stats (abs) min: 1 max: 652 x̄: 24.96 x̃: 8 helped stats (rel) min: 0.04% max: 29.15% x̄: 4.29% x̃: 2.94% HURT stats (abs) min: 1 max: 46 x̄: 5.45 x̃: 2 HURT stats (rel) min: 0.06% max: 8.22% x̄: 1.30% x̃: 0.64% 95% mean confidence interval for instructions value: -25.81 -21.43 95% mean confidence interval for instructions %-change: -4.26% -3.83% Instructions are helped.
total nops in shared programs: 947986 -> 914913 (-3.49%) nops in affected programs: 483546 -> 450473 (-6.84%) helped: 1712 HURT: 131 helped stats (abs) min: 1 max: 488 x̄: 19.73 x̃: 6 helped stats (rel) min: 0.18% max: 66.67% x̄: 9.84% x̃: 8.92% HURT stats (abs) min: 1 max: 58 x̄: 5.40 x̃: 2 HURT stats (rel) min: 0.27% max: 41.23% x̄: 6.99% x̃: 3.60% 95% mean confidence interval for nops value: -19.62 -16.27 95% mean confidence interval for nops %-change: -9.06% -8.22% Nops are helped.
total non-nops in shared programs: 3240912 -> 3230097 (-0.33%) non-nops in affected programs: 882671 -> 871856 (-1.23%) helped: 1783 HURT: 19 helped stats (abs) min: 1 max: 164 x̄: 6.13 x̃: 2 helped stats (rel) min: 0.09% max: 16.23% x̄: 1.87% x̃: 0.83% HURT stats (abs) min: 1 max: 54 x̄: 6.42 x̃: 2 HURT stats (rel) min: 0.17% max: 5.65% x̄: 0.88% x̃: 0.42% 95% mean confidence interval for non-nops value: -6.57 -5.43 95% mean confidence interval for non-nops %-change: -1.97% -1.72% Non-nops are helped.
total mov in shared programs: 157785 -> 157648 (-0.09%) mov in affected programs: 22512 -> 22375 (-0.61%) helped: 146 HURT: 107 helped stats (abs) min: 1 max: 21 x̄: 4.23 x̃: 3 helped stats (rel) min: 0.35% max: 100.00% x̄: 13.82% x̃: 8.70% HURT stats (abs) min: 1 max: 73 x̄: 4.49 x̃: 2 HURT stats (rel) min: 0.37% max: 150.00% x̄: 13.05% x̃: 8.00% 95% mean confidence interval for mov value: -1.43 0.35 95% mean confidence interval for mov %-change: -5.19% 0.29% Inconclusive result (value mean confidence interval includes 0).
total cov in shared programs: 87321 -> 87300 (-0.02%) cov in affected programs: 62 -> 41 (-33.87%) helped: 6 HURT: 0 helped stats (abs) min: 1 max: 16 x̄: 3.50 x̃: 1 helped stats (rel) min: 3.85% max: 94.12% x̄: 32.60% x̃: 25.00% 95% mean confidence interval for cov value: -9.93 2.93 95% mean confidence interval for cov %-change: -66.45% 1.25% Inconclusive result (value mean confidence interval includes 0).
total dwords in shared programs: 8822092 -> 8800964 (-0.24%) dwords in affected programs: 1494868 -> 1473740 (-1.41%) helped: 813 HURT: 28 helped stats (abs) min: 2 max: 352 x̄: 26.39 x̃: 24 helped stats (rel) min: 0.04% max: 25.00% x̄: 2.88% x̃: 1.52% HURT stats (abs) min: 2 max: 96 x̄: 11.64 x̃: 4 HURT stats (rel) min: 0.07% max: 3.42% x̄: 0.79% x̃: 0.40% 95% mean confidence interval for dwords value: -27.43 -22.82 95% mean confidence interval for dwords %-change: -3.03% -2.49% Dwords are helped.
total last-baryf in shared programs: 138874 -> 138990 (0.08%) last-baryf in affected programs: 1278 -> 1394 (9.08%) helped: 18 HURT: 19 helped stats (abs) min: 1 max: 13 x̄: 4.39 x̃: 3 helped stats (rel) min: 2.78% max: 72.22% x̄: 15.83% x̃: 9.64% HURT stats (abs) min: 2 max: 31 x̄: 10.26 x̃: 11 HURT stats (rel) min: 4.11% max: 260.00% x̄: 95.25% x̃: 22.86% 95% mean confidence interval for last-baryf value: 0.00 6.27 95% mean confidence interval for last-baryf %-change: 8.78% 73.64% Last-baryf are HURT.
total last-helper in shared programs: 1143827 -> 1126798 (-1.49%) last-helper in affected programs: 578828 -> 561799 (-2.94%) helped: 678 HURT: 108 helped stats (abs) min: 1 max: 465 x̄: 27.74 x̃: 15 helped stats (rel) min: 0.15% max: 67.48% x̄: 6.75% x̃: 3.60% HURT stats (abs) min: 1 max: 139 x̄: 16.45 x̃: 6 HURT stats (rel) min: 0.07% max: 104.76% x̄: 8.98% x̃: 1.97% 95% mean confidence interval for last-helper value: -24.58 -18.75 95% mean confidence interval for last-helper %-change: -5.41% -3.76% Last-helper are helped.
total half in shared programs: 0 -> 0 half in affected programs: 0 -> 0 helped: 0 HURT: 0
total full in shared programs: 224530 -> 224480 (-0.02%) full in affected programs: 390 -> 340 (-12.82%) helped: 36 HURT: 10 helped stats (abs) min: 1 max: 16 x̄: 2.22 x̃: 2 helped stats (rel) min: 12.50% max: 37.50% x̄: 23.66% x̃: 25.00% HURT stats (abs) min: 1 max: 6 x̄: 3.00 x̃: 2 HURT stats (rel) min: 25.00% max: 60.00% x̄: 35.50% x̃: 25.00% 95% mean confidence interval for full value: -2.07 -0.10 95% mean confidence interval for full %-change: -18.62% -2.98% Full are helped.
total constlen in shared programs: 609804 -> 609804 (0.00%) constlen in affected programs: 0 -> 0 helped: 0 HURT: 0
total cat0 in shared programs: 1041898 -> 1008830 (-3.17%) cat0 in affected programs: 523961 -> 490893 (-6.31%) helped: 1712 HURT: 131 helped stats (abs) min: 1 max: 488 x̄: 19.73 x̃: 6 helped stats (rel) min: 0.17% max: 50.00% x̄: 8.99% x̃: 8.22% HURT stats (abs) min: 1 max: 58 x̄: 5.39 x̃: 2 HURT stats (rel) min: 0.25% max: 37.90% x̄: 6.20% x̃: 3.26% 95% mean confidence interval for cat0 value: -19.61 -16.27 95% mean confidence interval for cat0 %-change: -8.28% -7.54% Cat0 are helped.
total cat1 in shared programs: 246616 -> 246468 (-0.06%) cat1 in affected programs: 35242 -> 35094 (-0.42%) helped: 150 HURT: 106 helped stats (abs) min: 1 max: 20 x̄: 4.17 x̃: 3 helped stats (rel) min: 0.25% max: 100.00% x̄: 8.70% x̃: 5.21% HURT stats (abs) min: 1 max: 73 x̄: 4.50 x̃: 2 HURT stats (rel) min: 0.26% max: 150.00% x̄: 8.33% x̃: 3.12% 95% mean confidence interval for cat1 value: -1.47 0.32 95% mean confidence interval for cat1 %-change: -3.88% 0.57% Inconclusive result (value mean confidence interval includes 0).
total cat2 in shared programs: 1516785 -> 1506113 (-0.70%) cat2 in affected programs: 437420 -> 426748 (-2.44%) helped: 1798 HURT: 0 helped stats (abs) min: 1 max: 160 x̄: 5.94 x̃: 2 helped stats (rel) min: 0.28% max: 32.05% x̄: 3.71% x̃: 1.58% 95% mean confidence interval for cat2 value: -6.49 -5.38 95% mean confidence interval for cat2 %-change: -3.96% -3.46% Cat2 are helped.
total cat3 in shared programs: 1195536 -> 1195536 (0.00%) cat3 in affected programs: 0 -> 0 helped: 0 HURT: 0
total cat4 in shared programs: 84057 -> 84057 (0.00%) cat4 in affected programs: 0 -> 0 helped: 0 HURT: 0
total cat5 in shared programs: 47537 -> 47537 (0.00%) cat5 in affected programs: 0 -> 0 helped: 0 HURT: 0
total cat6 in shared programs: 53323 -> 53323 (0.00%) cat6 in affected programs: 0 -> 0 helped: 0 HURT: 0
total cat7 in shared programs: 3146 -> 3146 (0.00%) cat7 in affected programs: 0 -> 0 helped: 0 HURT: 0
total stp in shared programs: 2452 -> 2452 (0.00%) stp in affected programs: 0 -> 0 helped: 0 HURT: 0
total ldp in shared programs: 572 -> 572 (0.00%) ldp in affected programs: 0 -> 0 helped: 0 HURT: 0
total sstall in shared programs: 363810 -> 364867 (0.29%) sstall in affected programs: 74696 -> 75753 (1.42%) helped: 190 HURT: 381 helped stats (abs) min: 1 max: 67 x̄: 7.26 x̃: 5 helped stats (rel) min: 0.16% max: 100.00% x̄: 15.27% x̃: 7.75% HURT stats (abs) min: 1 max: 118 x̄: 6.40 x̃: 3 HURT stats (rel) min: 0.00% max: 700.00% x̄: 22.22% x̃: 7.84% 95% mean confidence interval for sstall value: 0.99 2.71 95% mean confidence interval for sstall %-change: 5.37% 14.11% Sstall are HURT.
total (ss) in shared programs: 90388 -> 91478 (1.21%) (ss) in affected programs: 21370 -> 22460 (5.10%) helped: 168 HURT: 282 helped stats (abs) min: 1 max: 11 x̄: 2.55 x̃: 1 helped stats (rel) min: 1.01% max: 100.00% x̄: 12.30% x̃: 7.69% HURT stats (abs) min: 1 max: 81 x̄: 5.38 x̃: 2 HURT stats (rel) min: 1.14% max: 200.00% x̄: 17.84% x̃: 9.84% 95% mean confidence interval for (ss) value: 1.49 3.36 95% mean confidence interval for (ss) %-change: 3.85% 9.32% (ss) are HURT.
total systall in shared programs: 783846 -> 784779 (0.12%) systall in affected programs: 206410 -> 207343 (0.45%) helped: 267 HURT: 211 helped stats (abs) min: 1 max: 157 x̄: 16.01 x̃: 5 helped stats (rel) min: 0.06% max: 92.00% x̄: 9.63% x̃: 4.17% HURT stats (abs) min: 1 max: 311 x̄: 24.68 x̃: 8 HURT stats (rel) min: 0.00% max: 200.00% x̄: 12.71% x̃: 5.88% 95% mean confidence interval for systall value: -1.84 5.75 95% mean confidence interval for systall %-change: -1.66% 2.12% Inconclusive result (value mean confidence interval includes 0).
total (sy) in shared programs: 38045 -> 39059 (2.67%) (sy) in affected programs: 6354 -> 7368 (15.96%) helped: 78 HURT: 147 helped stats (abs) min: 1 max: 6 x̄: 1.40 x̃: 1 helped stats (rel) min: 0.55% max: 50.00% x̄: 14.88% x̃: 13.39% HURT stats (abs) min: 1 max: 81 x̄: 7.64 x̃: 3 HURT stats (rel) min: 0.28% max: 100.00% x̄: 26.32% x̃: 25.00% 95% mean confidence interval for (sy) value: 2.80 6.21 95% mean confidence interval for (sy) %-change: 8.75% 15.32% (sy) are HURT.
total waves in shared programs: 606238 -> 606276 (<.01%) waves in affected programs: 316 -> 354 (12.03%) helped: 27 HURT: 10 helped stats (abs) min: 2 max: 4 x̄: 2.30 x̃: 2 helped stats (rel) min: 20.00% max: 50.00% x̄: 28.02% x̃: 25.00% HURT stats (abs) min: 2 max: 4 x̄: 2.40 x̃: 2 HURT stats (rel) min: 25.00% max: 33.33% x̄: 27.50% x̃: 25.00% 95% mean confidence interval for waves value: 0.28 1.77 95% mean confidence interval for waves %-change: 4.40% 21.63% Waves are helped.
total loops in shared programs: 1144 -> 1144 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0
a660 fossils including some private fossils
Totals:
MaxWaves: 1525588 -> 1525922 (+0.02%); split: +0.03%, -0.01%
Instrs: 48395451 -> 47821017 (-1.19%); split: -1.28%, +0.09%
CodeSize: 95156426 -> 94827776 (-0.35%); split: -0.41%, +0.06%
NOPs: 9922033 -> 9497906 (-4.27%); split: -4.50%, +0.23%
MOVs: 3005535 -> 3004942 (-0.02%); split: -1.05%, +1.03%
Full: 1844552 -> 1843635 (-0.05%); split: -0.07%, +0.02%
(ss): 1253293 -> 1252074 (-0.10%); split: -0.95%, +0.85%
(sy): 529048 -> 532171 (+0.59%); split: -0.70%, +1.29%
(ss)-stall: 3579990 -> 3595255 (+0.43%); split: -0.92%, +1.35%
(sy)-stall: 14221917 -> 14240030 (+0.13%); split: -0.80%, +0.93%
STPs: 82093 -> 81899 (-0.24%)
LDPs: 121029 -> 120603 (-0.35%)
Subgroup size: 12046400 -> 12046464 (+0.00%)
Cat0: 10562989 -> 10139632 (-4.01%); split: -4.22%, +0.21%
Cat1: 4064817 -> 4053403 (-0.28%); split: -1.48%, +1.20%
Cat2: 17826021 -> 17686886 (-0.78%)
Cat6: 749632 -> 749104 (-0.07%)

Totals from 30438 (21.36% of 142514) affected shaders:
MaxWaves: 187396 -> 187730 (+0.18%); split: +0.22%, -0.04%
Instrs: 30736273 -> 30161839 (-1.87%); split: -2.01%, +0.14%
CodeSize: 54656008 -> 54327358 (-0.60%); split: -0.71%, +0.11%
NOPs: 7586476 -> 7162349 (-5.59%); split: -5.89%, +0.30%
MOVs: 1901025 -> 1900432 (-0.03%); split: -1.66%, +1.62%
Full: 655596 -> 654679 (-0.14%); split: -0.19%, +0.05%
(ss): 808975 -> 807756 (-0.15%); split: -1.47%, +1.32%
(sy): 340308 -> 343431 (+0.92%); split: -1.09%, +2.01%
(ss)-stall: 2658569 -> 2673834 (+0.57%); split: -1.24%, +1.81%
(sy)-stall: 8076078 -> 8094191 (+0.22%); split: -1.41%, +1.63%
STPs: 47870 -> 47676 (-0.41%)
LDPs: 84950 -> 84524 (-0.50%)
Subgroup size: 3182272 -> 3182336 (+0.00%)
Cat0: 8063124 -> 7639767 (-5.25%); split: -5.53%, +0.28%
Cat1: 2481615 -> 2470201 (-0.46%); split: -2.42%, +1.96%
Cat2: 11216283 -> 11077148 (-1.24%)
Cat6: 379784 -> 379256 (-0.14%)
Note that the (ss)-stall and (sy)-stall stats are hurt in general. While I'd like to investigate this a bit further, I have two observations:
- Since the overall instruction count goes down, there are probably just fewer instructions available to fill stall cycles.
- I noticed some shaders where this series changes essentially nothing but some stats are still hurt. This seems to be caused by slightly different scheduling decisions: before this series, once an instruction with a predicate destination was scheduled in a block, subsequent ones would not get scheduled until there was no other choice left, because of a predicate conflict. This restriction has now been removed, which often causes subsequent predicate writes to be scheduled earlier. This seems to have a ripple effect that sometimes hurts certain stats.