ac/nir,radv: split pack_half_2x16 if aco can use v_fma_mix instead

Georg Lehmann requested to merge DadSchoorse/mesa:radv-split-pack_half into main

It's a bit annoying that we have to change the global floating-point execution mode, because v_fma_mix has no per-instruction fp16 rounding mode.
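For readers unfamiliar with the transformation, here is a rough Python model of what "splitting" a pack means at the value level. It is only a sketch, not the pass itself: all function names are hypothetical, the arithmetic uses Python doubles rather than f32, and the conversion uses `struct`'s half-float rounding rather than the hardware rounding mode discussed above. The inputs are chosen to be exactly representable in fp16 so that rounding does not affect the result.

```python
import struct


def f2f16_bits(x: float) -> int:
    """Convert one float to its 16-bit half-precision bit pattern."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def pack_half_2x16(x: float, y: float) -> int:
    """Unsplit form: convert both components and pack them in one step."""
    return struct.unpack("<I", struct.pack("<2e", x, y))[0]


def pack_half_2x16_split(x: float, y: float) -> int:
    """Split form: two independent conversions, then a 16-bit merge.

    Once the halves are independent, each conversion can in principle be
    folded into the instruction that produced its operand.
    """
    lo = f2f16_bits(x)
    hi = f2f16_bits(y)
    return lo | (hi << 16)


def fmul_to_f16_bits(a: float, b: float) -> int:
    """Hypothetical stand-in for a multiply whose result is written as fp16.

    Assumption: this only illustrates the fusion idea; it does not model
    f32 intermediate precision or any real instruction encoding.
    """
    return f2f16_bits(a * b)


# Both forms give the same 32-bit word for exactly representable inputs.
packed = pack_half_2x16(1.0, -2.0)
split = pack_half_2x16_split(1.0, -2.0)

# Fused variant: each half comes straight from its producing multiply.
fused = fmul_to_f16_bits(1.5, 2.0) | (fmul_to_f16_bits(0.5, 0.5) << 16)
```

In this model `packed` and `split` are identical, and `fused` matches `pack_half_2x16(1.5 * 2.0, 0.5 * 0.5)`, which is the property that lets the backend replace a separate convert-and-pack with conversions merged into the producing ALU operations.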

Foz-DB Navi21:
Totals from 20425 (25.73% of 79395) affected shaders:
MaxWaves: 525608 -> 525690 (+0.02%); split: +0.02%, -0.00%
Instrs: 13570442 -> 13531690 (-0.29%); split: -0.29%, +0.00%
CodeSize: 74246928 -> 74312000 (+0.09%); split: -0.03%, +0.12%
VGPRs: 814736 -> 813384 (-0.17%); split: -0.18%, +0.02%
Latency: 103675193 -> 103574328 (-0.10%); split: -0.12%, +0.02%
InvThroughput: 25189130 -> 25114931 (-0.29%); split: -0.30%, +0.00%
VClause: 314579 -> 314573 (-0.00%); split: -0.00%, +0.00%
SClause: 526551 -> 526548 (-0.00%); split: -0.00%, +0.00%
Copies: 772147 -> 772999 (+0.11%); split: -0.01%, +0.12%
PreVGPRs: 661914 -> 661923 (+0.00%)
VALU: 9612901 -> 9574094 (-0.40%); split: -0.40%, +0.00%
SALU: 1244130 -> 1244176 (+0.00%)

Potential future work: we could do something similar with load_interpolated_input on gfx11+ and for tex/image instructions with D16. (Update: done.)

The pass is in ac/nir because it might be useful for radeonsi too.

Edited by Georg Lehmann