ir3: add support for repeated instructions
ir3 is a scalar architecture and, as such, most instructions cannot be vectorized. However, many instructions support the (rptN) modifier, which allows us to mimic vector instructions. Whenever an instruction has the (rptN) modifier set, it will execute N more times, incrementing its destination register for each repetition. Additionally, source registers with the (r) flag set will also be incremented.
For example:

    (rpt1)add.f r0.x, (r)r1.x, r2.x

is the same as:

    add.f r0.x, r1.x, r2.x
    add.f r0.y, r1.y, r2.x
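To make the (rptN)/(r) semantics concrete, here is a small C model that expands a repeated instruction into the equivalent scalar sequence. It is a sketch under stated assumptions: registers are numbered as 4 * n + component (so r0.x = 0, r0.y = 1, r1.x = 4, r2.x = 8), and the types and the expand_rpt() helper are hypothetical illustrations, not part of ir3.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed register numbering: 4 * n + component (r0.x = 0, r0.y = 1, r1.x = 4).
 * Hypothetical model types, not ir3's data structures. */
typedef struct {
   unsigned num; /* register number */
   bool r;       /* (r) flag: incremented on every repetition */
} rpt_src;

typedef struct {
   unsigned repeat; /* N in (rptN) */
   unsigned dst;    /* destination register, always incremented */
   unsigned nsrcs;
   rpt_src srcs[3];
} rpt_instr;

typedef struct {
   unsigned dst;
   unsigned srcs[3];
} scalar_instr;

/* Expand a (rptN) instruction into the N + 1 scalar instructions it is
 * equivalent to. Returns the number of instructions written to out. */
static unsigned expand_rpt(const rpt_instr *in, scalar_instr *out)
{
   assert(in->nsrcs <= 3);
   for (unsigned i = 0; i <= in->repeat; i++) {
      out[i].dst = in->dst + i;
      for (unsigned s = 0; s < in->nsrcs; s++)
         /* Only sources with the (r) flag advance with the repetition. */
         out[i].srcs[s] = in->srcs[s].num + (in->srcs[s].r ? i : 0);
   }
   return in->repeat + 1;
}
```

For (rpt1)add.f r0.x, (r)r1.x, r2.x (dst 0, sources 4 with (r) and 8 without), this model produces r0.x, r1.x, r2.x followed by r0.y, r1.y, r2.x, matching the expansion above.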
The main benefit of using repeated instructions is a reduction in code size. Since every iteration is still executed as a scalar instruction, there's no direct benefit in terms of runtime. The only exception seems to be 3-source instructions pre-a7xx: if one of the sources is constant (i.e., without the (r) flag), a repeated instruction executes faster than the equivalent expanded sequence. Presumably, this is because the ALU only has 2 register read ports. I have not been able to measure this difference on a7xx, though.
Support for repeated instructions consists of two parts. First, we need to make sure NIR is (mostly) vectorized when translating to ir3. I have not been able to find a way to keep NIR vectorized all the way and still generate decent code. Therefore, I have taken the approach of vectorizing the (scalarized) NIR right before translating it to ir3.
Secondly, ir3 needs to be adapted to ingest vectorized NIR and translate it to repeated instructions. To this end, I have introduced the concept of "repeat groups" to ir3. A repeat group is a group of instructions that were produced from a vectorized NIR operation and linked together. They are, however, still separate scalar instructions until quite late.
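A repeat group can be pictured as the scalar instructions of one vectorized NIR operation chained together. The sketch below models a group as a circular linked list; struct instr, emit_rpt(), repeat_group_size() and remove_from_group() are hypothetical illustrations of the idea, not ir3's actual data structures or its ir3_X_rpt builders. Removing an instruction from a group (for example, once it can no longer be merged) then amounts to unlinking it so that it forms a group of one.

```c
#include <assert.h>

/* Hypothetical, heavily simplified instruction. Every instruction belongs to
 * exactly one repeat group; a lone instruction points to itself. */
struct instr {
   const char *opc;
   unsigned comp;          /* vector component this instruction computes */
   struct instr *rpt_next; /* next instruction in the circular repeat group */
};

/* Emit one scalar instruction per component of a vector operation and link
 * them into a single repeat group (in the spirit of an ir3_X_rpt builder). */
static void emit_rpt(struct instr *out, const char *opc, unsigned ncomp)
{
   assert(ncomp > 0);
   for (unsigned c = 0; c < ncomp; c++)
      out[c] = (struct instr){ .opc = opc, .comp = c,
                               .rpt_next = &out[(c + 1) % ncomp] };
}

/* Count the instructions in the group that contains first. */
static unsigned repeat_group_size(const struct instr *first)
{
   unsigned n = 1;
   for (const struct instr *it = first->rpt_next; it != first; it = it->rpt_next)
      n++;
   return n;
}

/* Detach an instruction from its group, leaving it as a group of one. */
static void remove_from_group(struct instr *instr)
{
   struct instr *prev = instr;
   while (prev->rpt_next != instr)
      prev = prev->rpt_next;
   prev->rpt_next = instr->rpt_next;
   instr->rpt_next = instr;
}
```

The instructions stay fully independent scalar instructions; the group link is only a hint that later passes may use or drop.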
More concretely:
- Instruction emission: for every vectorized NIR operation, emit separate scalar instructions for its components and link them together in a repeat group. For every instruction builder ir3_X, a new repeat builder ir3_X_rpt has been added to facilitate this.
- Optimization passes: for now, repeat groups are completely ignored by optimizations.
- Pre-RA: clean up repeat groups that can never be merged into an actual rptN instruction (e.g., because their instructions are not consecutive anymore). This ensures no useless merge sets will be created in the next step.
- RA: create merge sets for the sources and defs of the instructions in repeat groups. This way, RA will try to allocate consecutive registers for them. This will not be forced, though, because we prefer to split up repeat groups over creating movs to reorder registers.
- Post-RA: create actual rptN instructions for repeat groups where the allocated registers allow it.
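The post-RA step boils down to checking whether the registers RA picked line up with what a single (rptN) instruction can express. A minimal sketch, assuming registers are numbered as 4 * n + component and using a hypothetical struct alloc_instr and can_merge_rpt(): the instructions must be back to back, the destinations consecutive, and each source either consecutive (it gets the (r) flag) or identical across the group (no (r) flag). The real implementation likely has further constraints, such as an upper bound on N, which are not modeled here.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SRCS 3

/* Hypothetical post-RA view of one scalar instruction in a repeat group. */
struct alloc_instr {
   unsigned ip;             /* position in the block */
   unsigned dst;            /* allocated register, 4 * n + component */
   unsigned nsrcs;
   unsigned srcs[MAX_SRCS]; /* allocated source registers */
};

/* Returns true if group[0..n-1] can become one (rpt{n-1}) instruction.
 * On success, r_flags[s] tells whether source s needs the (r) flag. */
static bool can_merge_rpt(const struct alloc_instr *group, unsigned n,
                          bool r_flags[MAX_SRCS])
{
   assert(n > 0 && group[0].nsrcs <= MAX_SRCS);

   /* Derive each source's (r) flag from the first two instructions. */
   for (unsigned s = 0; s < group[0].nsrcs; s++)
      r_flags[s] = n > 1 && group[1].srcs[s] == group[0].srcs[s] + 1;

   for (unsigned i = 1; i < n; i++) {
      /* The instructions must still be consecutive. */
      if (group[i].ip != group[0].ip + i)
         return false;
      /* The destination is always incremented. */
      if (group[i].dst != group[0].dst + i)
         return false;
      /* (r) sources advance by one per repetition, others stay fixed. */
      for (unsigned s = 0; s < group[0].nsrcs; s++) {
         unsigned expected = group[0].srcs[s] + (r_flags[s] ? i : 0);
         if (group[i].srcs[s] != expected)
            return false;
      }
   }
   return true;
}
```

Keeping the check this strict matches the trade-off described for RA: when the registers do not line up, the group is simply left as separate scalar instructions instead of inserting movs.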
The idea for step 2 (optimization passes) is that we prefer that any potential optimizations take precedence over creating rptN instructions, as the latter will only yield a code size benefit. However, it might be interesting to investigate whether we could make some optimizations repeat-aware. For example, the scheduler could try to schedule the instructions of a repeat group together.
Results
The total code size (dwords) reduction on shader-db is 10.14%.
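The headline figure is the relative change of the dwords total listed in the details, (after - before) / before. As a quick sanity check, a hypothetical helper reproduces both dwords percentages; note that the shared and affected totals both drop by the same 915974 dwords, since unaffected programs do not change.

```c
#include <assert.h>

/* Relative change in percent between two shader-db totals. */
static double pct_change(double before, double after)
{
   assert(before != 0.0);
   return (after - before) / before * 100.0;
}
```

With the dwords numbers, pct_change(9029932, 8113958) is about -10.14 and pct_change(7871742, 6955768) is about -11.64.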
Details
total instructions in shared programs: 4179917 -> 4151747 (-0.67%)
instructions in affected programs: 3509729 -> 3481559 (-0.80%)
helped: 12025
HURT: 8185
helped stats (abs) min: 1 max: 1051 x̄: 6.16 x̃: 2
helped stats (rel) min: 0.05% max: 36.99% x̄: 4.08% x̃: 2.78%
HURT stats (abs) min: 1 max: 681 x̄: 5.62 x̃: 3
HURT stats (rel) min: 0.06% max: 50.45% x̄: 4.26% x̃: 3.06%
95% mean confidence interval for instructions value: -1.72 -1.07
95% mean confidence interval for instructions %-change: -0.78% -0.62%
Instructions are helped.

total nops in shared programs: 933098 -> 901164 (-3.42%)
nops in affected programs: 806898 -> 774964 (-3.96%)
helped: 11827
HURT: 7808
helped stats (abs) min: 1 max: 423 x̄: 6.14 x̃: 2
helped stats (rel) min: 0.23% max: 100.00% x̄: 38.09% x̃: 33.33%
HURT stats (abs) min: 1 max: 588 x̄: 5.21 x̃: 3
HURT stats (rel) min: 0.00% max: 2300.00% x̄: 47.45% x̃: 16.05%
95% mean confidence interval for nops value: -1.93 -1.32
95% mean confidence interval for nops %-change: -5.24% -2.91%
Nops are helped.

total non-nops in shared programs: 3246819 -> 3250583 (0.12%)
non-nops in affected programs: 1309486 -> 1313250 (0.29%)
helped: 2240
HURT: 2425
helped stats (abs) min: 1 max: 802 x̄: 4.18 x̃: 3
helped stats (rel) min: 0.05% max: 29.55% x̄: 2.45% x̃: 1.65%
HURT stats (abs) min: 1 max: 155 x̄: 5.42 x̃: 3
HURT stats (rel) min: 0.03% max: 50.45% x̄: 3.16% x̃: 1.44%
95% mean confidence interval for non-nops value: 0.26 1.35
95% mean confidence interval for non-nops %-change: 0.34% 0.60%
Non-nops are HURT.

total mov in shared programs: 162669 -> 166980 (2.65%)
mov in affected programs: 81329 -> 85640 (5.30%)
helped: 2184
HURT: 2394
helped stats (abs) min: 1 max: 77 x̄: 3.35 x̃: 2
helped stats (rel) min: 0.69% max: 100.00% x̄: 44.05% x̃: 37.50%
HURT stats (abs) min: 1 max: 155 x̄: 4.86 x̃: 3
HURT stats (rel) min: 0.00% max: 2500.00% x̄: 64.78% x̃: 25.00%
95% mean confidence interval for mov value: 0.71 1.17
95% mean confidence interval for mov %-change: 9.65% 16.06%
Mov are HURT.

total cov in shared programs: 89791 -> 89810 (0.02%)
cov in affected programs: 898 -> 917 (2.12%)
helped: 1
HURT: 7
helped stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3
helped stats (rel) min: 1.22% max: 1.22% x̄: 1.22% x̃: 1.22%
HURT stats (abs) min: 1 max: 16 x̄: 3.14 x̃: 1
HURT stats (rel) min: 0.66% max: 1600.00% x̄: 229.49% x̃: 0.88%
95% mean confidence interval for cov value: -2.37 7.12
95% mean confidence interval for cov %-change: -272.05% 673.36%
Inconclusive result (value mean confidence interval includes 0).

total dwords in shared programs: 9029932 -> 8113958 (-10.14%)
dwords in affected programs: 7871742 -> 6955768 (-11.64%)
helped: 25697
HURT: 153
helped stats (abs) min: 2 max: 3226 x̄: 35.83 x̃: 32
helped stats (rel) min: 0.09% max: 62.50% x̄: 20.26% x̃: 16.67%
HURT stats (abs) min: 2 max: 192 x̄: 30.44 x̃: 30
HURT stats (rel) min: 0.35% max: 26.09% x̄: 5.70% x̃: 5.21%
95% mean confidence interval for dwords value: -36.18 -34.69
95% mean confidence interval for dwords %-change: -20.29% -19.93%
Dwords are helped.

total last-baryf in shared programs: 138838 -> 145081 (4.50%)
last-baryf in affected programs: 76911 -> 83154 (8.12%)
helped: 471
HURT: 811
helped stats (abs) min: 1 max: 118 x̄: 8.75 x̃: 4
helped stats (rel) min: 0.41% max: 100.00% x̄: 18.64% x̃: 11.11%
HURT stats (abs) min: 1 max: 181 x̄: 12.78 x̃: 7
HURT stats (rel) min: 0.42% max: 2400.00% x̄: 99.13% x̃: 18.97%
95% mean confidence interval for last-baryf value: 3.81 5.93
95% mean confidence interval for last-baryf %-change: 45.91% 65.81%
Last-baryf are HURT.

total last-helper in shared programs: 1208139 -> 1191785 (-1.35%)
last-helper in affected programs: 1080159 -> 1063805 (-1.51%)
helped: 2818
HURT: 2235
helped stats (abs) min: 1 max: 370 x̄: 22.34 x̃: 8
helped stats (rel) min: 0.08% max: 100.00% x̄: 18.27% x̃: 8.89%
HURT stats (abs) min: 1 max: 229 x̄: 20.85 x̃: 8
HURT stats (rel) min: 0.00% max: 2170.00% x̄: 25.60% x̃: 5.56%
95% mean confidence interval for last-helper value: -4.41 -2.06
95% mean confidence interval for last-helper %-change: -0.75% 3.01%
Inconclusive result (%-change mean confidence interval includes 0).

total half in shared programs: 0 -> 0
half in affected programs: 0 -> 0
helped: 0
HURT: 0

total full in shared programs: 217263 -> 228709 (5.27%)
full in affected programs: 44215 -> 55661 (25.89%)
helped: 23
HURT: 9815
helped stats (abs) min: 1 max: 16 x̄: 2.74 x̃: 2
helped stats (rel) min: 14.29% max: 50.00% x̄: 23.01% x̃: 20.00%
HURT stats (abs) min: 1 max: 16 x̄: 1.17 x̃: 1
HURT stats (rel) min: 3.45% max: 100.00% x̄: 28.31% x̃: 25.00%
95% mean confidence interval for full value: 1.15 1.18
95% mean confidence interval for full %-change: 27.94% 28.43%
Full are HURT.

total constlen in shared programs: 622684 -> 622684 (0.00%)
constlen in affected programs: 0 -> 0
helped: 0
HURT: 0

total cat0 in shared programs: 1031899 -> 1000258 (-3.07%)
cat0 in affected programs: 880633 -> 848992 (-3.59%)
helped: 11828
HURT: 7808
helped stats (abs) min: 1 max: 424 x̄: 6.12 x̃: 2
helped stats (rel) min: 0.21% max: 90.00% x̄: 27.20% x̃: 25.00%
HURT stats (abs) min: 1 max: 588 x̄: 5.22 x̃: 3
HURT stats (rel) min: 0.15% max: 3600.00% x̄: 60.50% x̃: 20.00%
95% mean confidence interval for cat0 value: -1.91 -1.31
95% mean confidence interval for cat0 %-change: 6.38% 8.96%
Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).

total cat1 in shared programs: 256026 -> 259430 (1.33%)
cat1 in affected programs: 122730 -> 126134 (2.77%)
helped: 2196
HURT: 2408
helped stats (abs) min: 1 max: 794 x̄: 4.18 x̃: 2
helped stats (rel) min: 0.54% max: 100.00% x̄: 30.91% x̃: 20.00%
HURT stats (abs) min: 1 max: 155 x̄: 5.22 x̃: 3
HURT stats (rel) min: 0.00% max: 2500.00% x̄: 55.93% x̃: 16.67%
95% mean confidence interval for cat1 value: 0.20 1.28
95% mean confidence interval for cat1 %-change: 11.49% 17.54%
Cat1 are HURT.

total cat2 in shared programs: 1512198 -> 1512327 (<.01%)
cat2 in affected programs: 36177 -> 36306 (0.36%)
helped: 51
HURT: 38
helped stats (abs) min: 1 max: 2 x̄: 1.84 x̃: 2
helped stats (rel) min: 0.28% max: 28.57% x̄: 10.21% x̃: 9.52%
HURT stats (abs) min: 1 max: 70 x̄: 5.87 x̃: 2
HURT stats (rel) min: 0.05% max: 13.19% x̄: 3.18% x̃: 1.53%
95% mean confidence interval for cat2 value: -0.29 3.19
95% mean confidence interval for cat2 %-change: -6.27% -2.71%
Inconclusive result (value mean confidence interval includes 0).

total cat3 in shared programs: 1194302 -> 1194302 (0.00%)
cat3 in affected programs: 0 -> 0
helped: 0
HURT: 0

total cat4 in shared programs: 84081 -> 84081 (0.00%)
cat4 in affected programs: 0 -> 0
helped: 0
HURT: 0

total cat5 in shared programs: 48109 -> 48058 (-0.11%)
cat5 in affected programs: 105 -> 54 (-48.57%)
helped: 51
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 14.29% max: 100.00% x̄: 67.78% x̃: 50.00%
95% mean confidence interval for cat5 value: -1.00 -1.00
95% mean confidence interval for cat5 %-change: -76.79% -58.78%
Cat5 are helped.

total cat6 in shared programs: 50156 -> 50145 (-0.02%)
cat6 in affected programs: 1002 -> 991 (-1.10%)
helped: 2
HURT: 1
helped stats (abs) min: 8 max: 8 x̄: 8.00 x̃: 8
helped stats (rel) min: 2.75% max: 2.75% x̄: 2.75% x̃: 2.75%
HURT stats (abs) min: 5 max: 5 x̄: 5.00 x̃: 5
HURT stats (rel) min: 1.19% max: 1.19% x̄: 1.19% x̃: 1.19%

total cat7 in shared programs: 3146 -> 3146 (0.00%)
cat7 in affected programs: 0 -> 0
helped: 0
HURT: 0

total stp in shared programs: 2448 -> 2432 (-0.65%)
stp in affected programs: 1200 -> 1184 (-1.33%)
helped: 2
HURT: 0

total ldp in shared programs: 568 -> 557 (-1.94%)
ldp in affected programs: 496 -> 485 (-2.22%)
helped: 2
HURT: 1
helped stats (abs) min: 8 max: 8 x̄: 8.00 x̃: 8
helped stats (rel) min: 19.05% max: 19.05% x̄: 19.05% x̃: 19.05%
HURT stats (abs) min: 5 max: 5 x̄: 5.00 x̃: 5
HURT stats (rel) min: 1.21% max: 1.21% x̄: 1.21% x̃: 1.21%

total sstall in shared programs: 415472 -> 417708 (0.54%)
sstall in affected programs: 320154 -> 322390 (0.70%)
helped: 2898
HURT: 3584
helped stats (abs) min: 1 max: 219 x̄: 6.60 x̃: 5
helped stats (rel) min: 0.09% max: 100.00% x̄: 28.52% x̃: 19.15%
HURT stats (abs) min: 1 max: 93 x̄: 5.96 x̃: 4
HURT stats (rel) min: 0.00% max: 1800.00% x̄: 44.40% x̃: 14.29%
95% mean confidence interval for sstall value: 0.10 0.59
95% mean confidence interval for sstall %-change: 9.69% 13.91%
Sstall are HURT.

total (ss) in shared programs: 102307 -> 102046 (-0.26%)
(ss) in affected programs: 60114 -> 59853 (-0.43%)
helped: 2591
HURT: 2605
helped stats (abs) min: 1 max: 40 x̄: 1.49 x̃: 1
helped stats (rel) min: 0.51% max: 100.00% x̄: 21.04% x̃: 16.67%
HURT stats (abs) min: 1 max: 11 x̄: 1.38 x̃: 1
HURT stats (rel) min: 0.00% max: 400.00% x̄: 35.29% x̃: 25.00%
95% mean confidence interval for (ss) value: -0.10 0.00
95% mean confidence interval for (ss) %-change: 6.16% 8.24%
Inconclusive result (value mean confidence interval includes 0).

total systall in shared programs: 765325 -> 764564 (-0.10%)
systall in affected programs: 503928 -> 503167 (-0.15%)
helped: 2717
HURT: 2446
helped stats (abs) min: 1 max: 806 x̄: 13.11 x̃: 6
helped stats (rel) min: 0.04% max: 100.00% x̄: 21.19% x̃: 11.90%
HURT stats (abs) min: 1 max: 200 x̄: 14.25 x̃: 8
HURT stats (rel) min: 0.00% max: 3600.00% x̄: 35.88% x̃: 12.31%
95% mean confidence interval for systall value: -0.82 0.53
95% mean confidence interval for systall %-change: 3.50% 8.19%
Inconclusive result (value mean confidence interval includes 0).

total (sy) in shared programs: 38451 -> 38584 (0.35%)
(sy) in affected programs: 9152 -> 9285 (1.45%)
helped: 714
HURT: 764
helped stats (abs) min: 1 max: 6 x̄: 1.18 x̃: 1
helped stats (rel) min: 1.75% max: 75.00% x̄: 30.29% x̃: 33.33%
HURT stats (abs) min: 1 max: 5 x̄: 1.28 x̃: 1
HURT stats (rel) min: 0.41% max: 400.00% x̄: 47.79% x̃: 50.00%
95% mean confidence interval for (sy) value: 0.02 0.16
95% mean confidence interval for (sy) %-change: 7.64% 12.50%
(sy) are HURT.

total waves in shared programs: 608612 -> 608298 (-0.05%)
waves in affected programs: 1662 -> 1348 (-18.89%)
helped: 17
HURT: 122
helped stats (abs) min: 2 max: 4 x̄: 2.35 x̃: 2
helped stats (rel) min: 20.00% max: 100.00% x̄: 38.82% x̃: 33.33%
HURT stats (abs) min: 2 max: 6 x̄: 2.90 x̃: 2
HURT stats (rel) min: 12.50% max: 50.00% x̄: 24.01% x̃: 25.00%
95% mean confidence interval for waves value: -2.61 -1.91
95% mean confidence interval for waves %-change: -20.30% -12.35%
Waves are HURT.

total loops in shared programs: 1088 -> 1088 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0