
aco: insert less s_delay_alu

Georg Lehmann requested to merge DadSchoorse/mesa:aco-less-delay into main

If the SIMD frontend already waits, we don't need to insert a delay to avoid stalling the ALUs. One common case where this helps is VALU -> SALU dependencies. The s_delay_alu is just unnecessary code size in this case.

The hardware also has a fast path for comparisons followed by v_cndmask, with effectively zero latency. Inserting an s_delay_alu breaks this fast path by forcing the v_cndmask to wait until the comparison completes.
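For illustration, the two cases above can be written as a small decision function. This is only a sketch under stated assumptions, not the code in this MR: `Unit`, `Instr`, `frontend_already_waits`, `uses_cndmask_fast_path` and `wants_delay_alu` are hypothetical names, and only the VALU -> SALU case and the compare -> v_cndmask case named in the description are modelled. Every other dependency conservatively keeps its delay.

```cpp
// Hypothetical model of the rule described above; these are not ACO's types.
enum class Unit { VALU, SALU };

struct Instr {
    Unit unit;
    bool is_valu_compare; // assumed flag: a VALU comparison such as v_cmp_*
    bool is_v_cndmask;    // assumed flag: a v_cndmask consuming the comparison
};

// The SIMD frontend is assumed to wait on its own for VALU -> SALU
// dependencies, so an explicit delay would only add code size.
constexpr bool frontend_already_waits(Instr producer, Instr consumer)
{
    return producer.unit == Unit::VALU && consumer.unit == Unit::SALU;
}

// A comparison followed by v_cndmask takes the zero-latency fast path,
// which an explicit delay would break.
constexpr bool uses_cndmask_fast_path(Instr producer, Instr consumer)
{
    return producer.is_valu_compare && consumer.is_v_cndmask;
}

// Returns true if an s_delay_alu should still be emitted before `consumer`,
// given that it depends on the result of `producer`.
constexpr bool wants_delay_alu(Instr producer, Instr consumer)
{
    return !frontend_already_waits(producer, consumer) &&
           !uses_cndmask_fast_path(producer, consumer);
}
```

In this model, a VALU add feeding an SALU instruction and a v_cmp feeding a v_cndmask both skip the delay, while a VALU add feeding a v_cndmask still gets one.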

I validated most of this information with synthetic microbenchmarks (https://gitlab.freedesktop.org/DadSchoorse/bvhre/-/tree/forwarding).

RGP also shows these effects:

SALU<->VALU latency is (partially) hidden with and without s_delay_alu

v_cndmask_b32 has lower latency without s_delay_alu

Foz-DB Navi31
Totals from 47215 (59.61% of 79206) affected shaders:
Instrs: 35363360 -> 35062463 (-0.85%); split: -0.85%, +0.00%
CodeSize: 186342228 -> 185073248 (-0.68%); split: -0.68%, +0.00%
Latency: 261725582 -> 261692233 (-0.01%); split: -0.02%, +0.00%
InvThroughput: 42382641 -> 42377295 (-0.01%); split: -0.01%, +0.00%
Edited by Georg Lehmann
