
aco: lower gfx6-7 sub dword early

GFX6-7 can only write full registers, so we get no benefit and a lot of pain from trying to handle sub-dword temporaries in the register allocator and copy lowering.

The new pass replaces all sub-dword temps with full dwords and lowers p_create_vector, p_extract_vector and p_split_vector pre-RA but after isel. This allows us to avoid having gfx6/7 special cases spread all around isel.
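To illustrate the idea, here is a minimal sketch of what the lowered operations compute once every temporary is a full dword. It is only an arithmetic model on `uint32_t`, not the pass itself: the helper names (`extract_component`, `create_vector_2x16`, `split_vector_2x16`) are hypothetical, and the actual instructions ACO emits for these pseudo-ops may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model, not ACO code: each temporary occupies one full
// 32-bit register and sub-dword values are packed into its bits.

// Models p_extract_vector for an 8- or 16-bit component: shift right, then mask.
// Assumes index * bits + bits <= 32.
inline uint32_t extract_component(uint32_t dword, unsigned index, unsigned bits)
{
   const uint32_t mask = bits >= 32 ? 0xffffffffu : ((1u << bits) - 1u);
   return (dword >> (index * bits)) & mask;
}

// Models p_create_vector of two 16-bit halves: mask the low half,
// shift the high half into place, then OR them together.
inline uint32_t create_vector_2x16(uint32_t lo, uint32_t hi)
{
   return (lo & 0xffffu) | (hi << 16);
}

// Models p_split_vector into two 16-bit halves as two extracts,
// each result living in its own full dword.
struct Split2x16 {
   uint32_t lo;
   uint32_t hi;
};

inline Split2x16 split_vector_2x16(uint32_t dword)
{
   return {extract_component(dword, 0, 16), extract_component(dword, 1, 16)};
}

// Splitting a freshly created vector gives back the original halves.
inline void check_roundtrip()
{
   const Split2x16 s = split_vector_2x16(create_vector_2x16(0x1234u, 0xabcdu));
   assert(s.lo == 0x1234u);
   assert(s.hi == 0xabcdu);
}
```

Because each result is an ordinary full-dword value, nothing downstream of such a lowering needs to know that sub-dword temporaries ever existed.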

Codegen is also better:

Foz-DB GFX6:
Totals from 1880 (3.04% of 61893) affected shaders:
MaxWaves: 11385 -> 11388 (+0.03%)
Instrs: 2090712 -> 2087738 (-0.14%); split: -0.15%, +0.00%
CodeSize: 10394804 -> 10382948 (-0.11%); split: -0.12%, +0.01%
SGPRs: 117520 -> 117512 (-0.01%)
VGPRs: 97860 -> 97392 (-0.48%); split: -0.50%, +0.02%
SpillSGPRs: 56 -> 55 (-1.79%)
Latency: 29955105 -> 29944943 (-0.03%); split: -0.04%, +0.01%
InvThroughput: 13688233 -> 13677698 (-0.08%); split: -0.09%, +0.01%
VClause: 50932 -> 50934 (+0.00%)
SClause: 80929 -> 80792 (-0.17%); split: -0.17%, +0.00%
Copies: 245912 -> 231885 (-5.70%); split: -5.70%, +0.00%
Branches: 56782 -> 56781 (-0.00%); split: -0.00%, +0.00%
PreVGPRs: 81815 -> 81787 (-0.03%)
VALU: 1362620 -> 1359639 (-0.22%); split: -0.22%, +0.01%
SALU: 322177 -> 322176 (-0.00%); split: -0.00%, +0.00%
Edited by Georg Lehmann
