Skip to content

aco/ra: Fix register allocation for subdword operands

Tony Wasserka requested to merge neobrain/mesa:ra_subdword_fix into master

ACO attempts to store the output of an instruction in the same register occupied by its operands when possible. Importantly this only works if the total operand are large enough to store the result register size. The code failed to consider subdword operands when checking for this, causing entire register slots to be freed up even though subdword parts were still used.

In Mafia 3, this affected the following code:

v2b: %363:v[2][0:16],  v2b: %362:v[2][16:32] = p_split_vector %360:v[2]
v1: %116:v[2] = v_cvt_f32_f16 %362:v[2][16:32]
v1: %117:v[2] = v_cvt_f32_f16 %363:v[2][0:16]

where v[2] is allocated to %116 even though its original lower 16 bits are still used in the instruction after.

Closes: #3717 (closed) Fixes: 031edbc4


This MR adds tests that currently fail on main. Since the bug only affects RA's fallback code path (which wouldn't usually be taken in simple test shaders), I added a policy parameter to aco::register_allocation that allows testing both the fallback and non-fallback paths.

I encountered some bugs/limitations of the test framework and made adaptions where necessary:

  • long test lists would prevent check_output.py from being invoked due to a deadlock
  • setup_cs now supports customizing the test subvariant. The subvariant was previously only used to distinguish between chip classes (gfx8/gfx9/...), but now it's possible to add further variants (e.g. to iterate through the new test policies)
  • GFX10_3 now is printed as gfx10_3 instead of gfx11 in test names

TODO:

  • Fix remaining test failures (not bugs but nonideal register allocation) removed those tests since they didn't really improve coverage and I overestimated the potential for further optimization
  • Refactor the actual fix
  • Add tests for !7427 (merged) while we're at it decided this is out of scope, since the setup required to trigger that issue isn't easily replicable in a test
  • Run CTS/fossils no changes
Edited by Tony Wasserka

Merge request reports