
brw: better surface state rematerialization & fewer Wa_1407528679

What does this MR do and why?

Wa_1407528679 is needed because of EU fusion, which can run shader code on subgroups whose lanes are all disabled. We must make sure not to run SEND instructions with NoMask when all lanes are disabled (to do so, we build a mask by reading an ARF register and set it on the SEND instruction).
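A minimal Python model of the problem and of the workaround, assuming the ARF-derived mask acts as an "at least one lane live" predicate on the SEND. `Send`, `lanes_executed`, `apply_wa_1407528679` and `SIMD_WIDTH` are hypothetical names for this sketch, not brw code:

```python
from dataclasses import dataclass

SIMD_WIDTH = 8                      # hypothetical subgroup width for this model
ALL_LANES = (1 << SIMD_WIDTH) - 1

@dataclass(frozen=True)
class Send:
    """Toy SEND instruction (hypothetical, not the brw IR)."""
    nomask: bool                       # NoMask: ignore the execution mask
    any_live_predicate: bool = False   # added by the workaround

def lanes_executed(send: Send, exec_mask: int) -> int:
    """Bitmask of lanes that actually issue the message.

    `exec_mask` stands in for the channel-enable ARF register. With EU fusion
    a subgroup can reach the SEND with exec_mask == 0.
    """
    if send.any_live_predicate and exec_mask == 0:
        return 0                       # predicate is false: message dropped
    return ALL_LANES if send.nomask else exec_mask

def apply_wa_1407528679(send: Send) -> Send:
    """Predicate NoMask SENDs on "at least one lane live"; others are untouched."""
    if send.nomask:
        return Send(nomask=True, any_live_predicate=True)
    return send
```

Without the workaround, `lanes_executed(Send(nomask=True), 0)` returns 255: a fully disabled subgroup still issues the message. With `apply_wa_1407528679` applied it returns 0, while live subgroups keep the full NoMask behavior.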

This interacts badly with our uniform load instructions, which typically use NoMask to reduce the number of registers holding the values, because the SEND could then be given an invalid surface state parameter, which can lead to hangs.

Fortunately, if we can ensure that the surface state was also built with NoMask, then we know its value will be valid, and we can skip the workaround.
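A sketch of that check on a hypothetical toy IR (`Inst`, `built_with_nomask` and `needs_wa_1407528679` are illustrative, not the brw API). It assumes registers in `uniform_inputs` (e.g. push constants) are always valid and, conservatively, requires every input of a NoMask producer to satisfy the same property:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

@dataclass
class Inst:
    """Toy IR instruction (hypothetical, not the brw IR)."""
    dst: Optional[str]                              # register written, if any
    srcs: List[str] = field(default_factory=list)   # registers read; immediates omitted
    nomask: bool = False                            # NoMask execution control
    surface: Optional[str] = None                   # SEND only: surface state register

def built_with_nomask(reg: str, defs: Dict[str, Inst], uniform_inputs: Set[str]) -> bool:
    """True if `reg` is valid even in a subgroup with no live lanes.

    Assumes SSA-like straight-line code (one definition per register, no cycles).
    """
    if reg in uniform_inputs:
        return True
    inst = defs.get(reg)
    if inst is None or not inst.nomask:
        # Unknown or masked producer: in a fully disabled subgroup the write
        # never happened, so the register may hold garbage.
        return False
    return all(built_with_nomask(s, defs, uniform_inputs) for s in inst.srcs)

def needs_wa_1407528679(send: Inst, defs: Dict[str, Inst], uniform_inputs: Set[str]) -> bool:
    """A NoMask SEND needs the predicate unless its surface state is known valid."""
    if not send.nomask:
        return False   # masked SENDs are already shot down when no lane is live
    if send.surface is None:
        return False   # toy model: immediate surface index, always valid
    return not built_with_nomask(send.surface, defs, uniform_inputs)
```

For example, a NoMask SEND whose surface comes from `push0` through a chain of NoMask instructions needs no predicate, whereas the same SEND fed by a masked ADD still does.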

This removes a significant number of instructions around SEND messages, reduces register pressure, and allows better instruction scheduling (since we no longer read ARF registers, which tend to act as scheduling barriers).

Wa_1407528679 only applies to Gfx12 products, so expect smaller improvements on Xe2 and Gfx11/Gfx9 (there are still some gains from the better rematerialization).
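Rematerialization is what lets the NoMask check succeed more often: the surface state is recomputed with NoMask instructions right before the SEND, starting from uniform inputs. A rough sketch on a hypothetical toy IR (`Inst`, `rematerialize_nomask` and the `_nm` suffix are made up for illustration; the actual pass may work differently):

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

@dataclass(frozen=True)
class Inst:
    """Toy IR instruction (hypothetical, not the brw IR)."""
    op: str
    dst: str
    srcs: Tuple[str, ...]   # registers read; immediates omitted
    nomask: bool

def _from_uniforms(reg: str, defs: Dict[str, Inst], uniform_inputs: Set[str]) -> bool:
    """True if every leaf feeding `reg` is a uniform input (or an immediate)."""
    if reg in uniform_inputs:
        return True
    inst = defs.get(reg)
    return inst is not None and all(
        _from_uniforms(s, defs, uniform_inputs) for s in inst.srcs)

def rematerialize_nomask(
    reg: str, defs: Dict[str, Inst], uniform_inputs: Set[str]
) -> Optional[Tuple[List[Inst], str]]:
    """Re-emit the computation of `reg` as NoMask instructions.

    Returns (instructions to insert before the SEND, new register name), or
    None if the value depends on a non-uniform input and cannot be recomputed.
    """
    if not _from_uniforms(reg, defs, uniform_inputs):
        return None
    emitted: List[Inst] = []
    renamed: Dict[str, str] = {}

    def emit(r: str) -> str:
        if r in uniform_inputs:
            return r                  # uniform inputs are used as-is
        if r in renamed:
            return renamed[r]         # shared subexpression already re-emitted
        inst = defs[r]
        srcs = tuple(emit(s) for s in inst.srcs)   # operands are emitted first
        new = r + "_nm"
        emitted.append(Inst(inst.op, new, srcs, nomask=True))
        renamed[r] = new
        return new

    new_reg = emit(reg)
    return emitted, new_reg
```

The SEND then reads the returned register, whose whole def chain is NoMask, so a check like the one above can drop the workaround for it.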

results

Should fix #11236

Todo:

  • Rematerialize A64 uniform loads
  • Is there a better mechanism to flag/recognize instructions that can drop Wa_1407528679? (They are currently flagged manually; maybe this could be checked when emitting the WA?)

Some shader stats on DG2:

Assassin's Creed Odyssey:
Totals from 217 (10.29% of 2108) affected shaders:
Instrs: 432580 -> 430788 (-0.41%)
Cycle count: 376319054 -> 376423106 (+0.03%); split: -0.13%, +0.15%

Cyberpunk 2077:
Totals from 511 (91.41% of 559) affected shaders:
Instrs: 909961 -> 908709 (-0.14%); split: -0.33%, +0.19%
Cycle count: 1155512727 -> 1153593541 (-0.17%); split: -1.20%, +1.03%
Spill count: 15911 -> 15934 (+0.14%); split: -0.40%, +0.54%
Fill count: 35749 -> 35794 (+0.13%); split: -0.22%, +0.35%
Scratch Memory Size: 1273856 -> 1274880 (+0.08%)
Max live registers: 33227 -> 33378 (+0.45%); split: -0.05%, +0.50%

Dark Souls 3:
Totals from 5 (0.37% of 1364) affected shaders:
Instrs: 3229 -> 3196 (-1.02%)
Cycle count: 227568 -> 218974 (-3.78%); split: -4.08%, +0.30%

Dota 2:
Totals from 1 (0.07% of 1505) affected shaders:
Instrs: 112 -> 103 (-8.04%)
Cycle count: 1834 -> 1714 (-6.54%)

Hitman 3:
Totals from 1 (1.79% of 56) affected shaders:
Instrs: 114 -> 105 (-7.89%)
Cycle count: 1850 -> 1726 (-6.70%)

Q2RTX:
Totals from 55 (56.12% of 98) affected shaders:
Instrs: 37359 -> 37273 (-0.23%); split: -0.43%, +0.20%
Cycle count: 1047035 -> 1041736 (-0.51%); split: -0.63%, +0.12%
Max live registers: 2760 -> 2753 (-0.25%)

Red Dead Redemption 2:
Totals from 2165 (37.09% of 5837) affected shaders:
Instrs: 2243895 -> 2157984 (-3.83%); split: -3.83%, +0.00%
Cycle count: 3629540197 -> 3557786793 (-1.98%); split: -2.20%, +0.23%
Spill count: 7460 -> 7430 (-0.40%); split: -0.59%, +0.19%
Fill count: 16492 -> 16560 (+0.41%); split: -0.61%, +1.02%
Max live registers: 177832 -> 177828 (-0.00%)
Max dispatch width: 20160 -> 20128 (-0.16%)

Rise of the Tomb Raider:
Totals from 14 (7.87% of 178) affected shaders:
Instrs: 13805 -> 13415 (-2.83%)
Cycle count: 12375733 -> 11807571 (-4.59%)

Strange Brigade:
Totals from 3567 (86.79% of 4110) affected shaders:
Instrs: 1709046 -> 1353343 (-20.81%)
Subgroup size: 36776 -> 36784 (+0.02%)
Cycle count: 27760832 -> 24764372 (-10.79%); split: -10.90%, +0.10%
Max live registers: 197510 -> 195229 (-1.15%)
Max dispatch width: 28624 -> 28752 (+0.45%)

Gfx9 stats (mostly unchanged):

Cyberpunk 2077:
Totals from 2 (3.92% of 51) affected shaders:
Instrs: 620 -> 613 (-1.13%)
Cycle count: 28050 -> 28084 (+0.12%); split: -0.04%, +0.16%

Strange Brigade:
Totals from 66 (36.67% of 180) affected shaders:
Instrs: 36001 -> 35772 (-0.64%)
Cycle count: 1142280 -> 1137160 (-0.45%); split: -0.45%, +0.00%
Max live registers: 3006 -> 3022 (+0.53%)
Edited by Lionel Landwerlin
