
intel/brw: Treat convergent values as SIMD8

Ian Romanick requested to merge idr/mesa:review/fs-scalar into main

This is the first part of "treat convergent values as SIMD8." This series lays the groundwork to treat convergent ALU operations and some convergent "leaf" operations as SIMD8 in all dispatch modes. In SIMD16 this saves register space. In SIMD32 it saves register space and instructions. Leaf operations are operations that are always the leaves of NIR ALU expression trees. Load constant, load UBO, etc. are all leaf operations.
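To make the register and instruction savings concrete, here is a rough footprint calculation. It is a sketch under stated assumptions, not something taken from the series: 32-byte GRFs (as on pre-Xe2 hardware such as Meteor Lake), 32-bit values, and at most two GRFs per instruction operand, so wider operations get split. The helper names are hypothetical.

```python
import math

# Assumptions (not from the MR): 32-byte GRFs as on pre-Xe2 parts such as
# Meteor Lake, 32-bit values, and at most two GRFs per instruction operand,
# so wider operations are split into several instructions.
GRF_BYTES = 32
MAX_GRFS_PER_OPERAND = 2


def grfs_per_value(simd_width: int, type_size: int = 4) -> int:
    """GRFs needed to hold one value across simd_width channels."""
    return math.ceil(simd_width * type_size / GRF_BYTES)


def instrs_per_op(simd_width: int, type_size: int = 4) -> int:
    """Instructions needed for one ALU op at simd_width (hypothetical model)."""
    return math.ceil(grfs_per_value(simd_width, type_size) / MAX_GRFS_PER_OPERAND)


for width in (8, 16, 32):
    full_regs, full_instrs = grfs_per_value(width), instrs_per_op(width)
    # A convergent value computed as SIMD8 NoMask costs the same at any dispatch width.
    scalar_regs, scalar_instrs = grfs_per_value(8), instrs_per_op(8)
    print(f"SIMD{width}: {full_regs} GRF / {full_instrs} instr per op; "
          f"as SIMD8: {scalar_regs} GRF / {scalar_instrs} instr")
```

Under these assumptions a dword value takes 2 GRFs in SIMD16 and 4 in SIMD32 but only 1 as SIMD8, and a SIMD32 operation needs two instructions where the SIMD8 form needs one, which lines up with the savings described above.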

The basic idea is that values not marked as divergent in NIR get an is_scalar flag set. These values are generated using SIMD8 dispatch with NoMask set. When used as sources to SIMD16 instructions, the values are accessed using <0,1,0>. When used as sources to SIMD8 instructions, the values can be accessed as either <0,1,0> or <8,8,1>. This is helpful for instructions that cannot use scalar sources. The is_scalar flag is used during code generation to convert illegal <0,1,0> regions to <8,8,1>.
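The two regions above can be illustrated with the usual <vstride, width, hstride> addressing rule: channel ch reads element (ch / width) * vstride + (ch % width) * hstride. The sketch below applies that rule and adds a hypothetical version of the is_scalar fixup. The function names, the allows_scalar_region flag, and returning None to mean "expand first" are illustrative assumptions, not Mesa's code.

```python
from typing import Optional, Tuple

Region = Tuple[int, int, int]  # <vstride, width, hstride>, in elements


def region_elements(exec_size: int, region: Region) -> list:
    """Element index each channel reads for a <vstride, width, hstride> region."""
    vstride, width, hstride = region
    return [(ch // width) * vstride + (ch % width) * hstride
            for ch in range(exec_size)]


SCALAR: Region = (0, 1, 0)     # every channel reads element 0
SIMD8_VEC: Region = (8, 8, 1)  # channel i reads element i


def legalize_scalar_source(is_scalar: bool, exec_size: int, region: Region,
                           allows_scalar_region: bool) -> Optional[Region]:
    """Hypothetical fixup; returns the region to use, or None when the value
    would have to be expanded to full width first."""
    if not is_scalar or region != SCALAR or allows_scalar_region:
        return region
    if exec_size == 8:
        # Assumes all 8 channels of the SIMD8 NoMask result of a convergent
        # computation hold the same value, so reading element i in channel i
        # gives the same result as reading element 0.
        return SIMD8_VEC
    return None


print(region_elements(8, SCALAR))      # [0, 0, 0, 0, 0, 0, 0, 0]
print(region_elements(8, SIMD8_VEC))   # [0, 1, 2, 3, 4, 5, 6, 7]
print(region_elements(16, SIMD8_VEC))  # reaches element 15, past a SIMD8 value
```

The last call shows why <8,8,1> is only an option for SIMD8 instructions: a SIMD16 instruction using that region would read eight elements beyond the SIMD8 value.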

There is a lot of work that remains to be done, but there are two initial areas of interest.

First, many more leaf operations need to be supported. The branch https://gitlab.freedesktop.org/idr/mesa/-/commits/wip/fs-scalar has some work in this area. There is a lot of overlap between this work and !29663 (merged). We should figure out how to align the two approaches before getting much farther down this path.

Second, some heuristics need to be implemented to detect cases where generating scalar instructions should be avoided. For example, scalarizing all ALU instructions can prevent cmod propagation from making progress. Some messages require sources at the full SIMD size, so a scalar source may need to be expanded before the SEND, which adds instructions and temporaries. The previously mentioned branch contains some quick hacks for these scenarios, but more work needs to be done.
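A toy cost model shows why scalarizing everything can lose when a value feeds a SEND that needs a full-size source. Everything here is a hypothetical illustration under assumed parameters (32-byte GRFs, 32-bit values, at most two GRFs per operand, one expansion MOV into a fresh temporary per SEND use with no reuse); it is not the heuristic from the series or from the branch.

```python
import math

# Hypothetical cost model; all parameters are assumptions for illustration.
GRF_BYTES = 32
MAX_GRFS_PER_OPERAND = 2


def grfs(width: int, type_size: int = 4) -> int:
    """GRFs needed for one value at the given SIMD width."""
    return math.ceil(width * type_size / GRF_BYTES)


def instrs(width: int, type_size: int = 4) -> int:
    """Instructions needed for one operation at the given SIMD width."""
    return math.ceil(grfs(width, type_size) / MAX_GRFS_PER_OPERAND)


def cost_scalar(dispatch_width: int, send_uses: int) -> tuple:
    """(instructions, GRFs) when the value is computed as SIMD8 and then
    expanded once per SEND that needs a full-width source (no reuse assumed)."""
    n_instrs = instrs(8) + send_uses * instrs(dispatch_width)
    n_grfs = grfs(8) + send_uses * grfs(dispatch_width)
    return n_instrs, n_grfs


def cost_full_width(dispatch_width: int) -> tuple:
    """(instructions, GRFs) when the value is computed at full dispatch width."""
    return instrs(dispatch_width), grfs(dispatch_width)


for uses in (0, 1, 2):
    print(f"SIMD16, {uses} SEND use(s): scalar {cost_scalar(16, uses)} "
          f"vs full width {cost_full_width(16)}")
```

In this model, with no SEND uses the SIMD8 form is cheaper in SIMD16 (1 instruction and 1 GRF versus 1 and 2), but with one SEND use it costs 2 instructions and 3 GRFs versus 1 and 2, which is the kind of case such a heuristic would need to catch.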

Meteor Lake fossil-db results across whole series:

Totals:
Instrs: 153086362 -> 152797341 (-0.19%); split: -0.19%, +0.00%
Subgroup size: 7711848 -> 7711856 (+0.00%)
Cycle count: 17289437317 -> 17245038543 (-0.26%); split: -0.42%, +0.16%
Spill count: 85118 -> 85086 (-0.04%)
Fill count: 151390 -> 151298 (-0.06%)
Max live registers: 32443998 -> 32439804 (-0.01%); split: -0.01%, +0.00%
Max dispatch width: 5550888 -> 5551032 (+0.00%); split: +0.00%, -0.00%

Totals from 15367 (2.43% of 632607) affected shaders:
Instrs: 6858310 -> 6569289 (-4.21%); split: -4.25%, +0.03%
Subgroup size: 167432 -> 167440 (+0.00%)
Cycle count: 12773749065 -> 12729350291 (-0.35%); split: -0.57%, +0.22%
Spill count: 40294 -> 40262 (-0.08%)
Fill count: 75656 -> 75564 (-0.12%)
Max live registers: 807068 -> 802874 (-0.52%); split: -0.53%, +0.01%
Max dispatch width: 151776 -> 151920 (+0.09%); split: +0.12%, -0.02%
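The percentages in these totals are relative changes, (after - before) / before. As a quick sanity check, this small sketch recomputes a few of them, plus the share of affected shaders, directly from the numbers listed above.

```python
def pct_change(before: int, after: int) -> float:
    """Relative change in percent, as printed in the summary lines."""
    return (after - before) / before * 100


checks = {
    "Instrs (all)": (153086362, 152797341),
    "Instrs (affected)": (6858310, 6569289),
    "Cycle count (affected)": (12773749065, 12729350291),
    "Max live registers (affected)": (807068, 802874),
}
for name, (before, after) in checks.items():
    print(f"{name}: {pct_change(before, after):+.2f}%")

print(f"Affected share: {15367 / 632607 * 100:.2f}%")
```

Both instruction lines drop by the same absolute amount (289021 instructions); the relative change is much larger for the affected shaders because the denominator is smaller.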

Meteor Lake shader-db results across whole series:

total instructions in shared programs: 19863339 -> 19863249 (<.01%)
instructions in affected programs: 51666 -> 51576 (-0.17%)
helped: 24
HURT: 55
helped stats (abs) min: 1 max: 272 x̄: 14.62 x̃: 1
helped stats (rel) min: 0.06% max: 36.27% x̄: 2.04% x̃: 0.14%
HURT stats (abs)   min: 1 max: 17 x̄: 4.75 x̃: 2
HURT stats (rel)   min: 0.13% max: 20.00% x̄: 1.74% x̃: 0.82%
95% mean confidence interval for instructions value: -8.28 6.00
95% mean confidence interval for instructions %-change: -0.54% 1.72%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 901662931 -> 901651588 (<.01%)
cycles in affected programs: 77577482 -> 77566139 (-0.01%)
helped: 346
HURT: 230
helped stats (abs) min: 2 max: 60610 x̄: 380.42 x̃: 16
helped stats (rel) min: <.01% max: 24.18% x̄: 0.82% x̃: 0.15%
HURT stats (abs)   min: 2 max: 22490 x̄: 522.97 x̃: 8
HURT stats (rel)   min: <.01% max: 19.05% x̄: 0.98% x̃: 0.30%
95% mean confidence interval for cycles value: -260.34 220.95
95% mean confidence interval for cycles %-change: -0.29% 0.08%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 5267 -> 5257 (-0.19%)
spills in affected programs: 25 -> 15 (-40.00%)
helped: 2
HURT: 0

total fills in shared programs: 5793 -> 5760 (-0.57%)
fills in affected programs: 176 -> 143 (-18.75%)
helped: 3
HURT: 0
