aco/ra: Fix register allocation for subdword operands (!7461) · Merge requests · Mesa / mesa

Tony Wasserka requested to merge neobrain/mesa:ra_subdword_fix into master Nov 05, 2020

ACO attempts to store the output of an instruction in the same register occupied by its operands when possible. Importantly this only works if the total operand are large enough to store the result register size. The code failed to consider subdword operands when checking for this, causing entire register slots to be freed up even though subdword parts were still used.

In Mafia 3, this affected the following code:

v2b: %363:v[2][0:16],  v2b: %362:v[2][16:32] = p_split_vector %360:v[2]
v1: %116:v[2] = v_cvt_f32_f16 %362:v[2][16:32]
v1: %117:v[2] = v_cvt_f32_f16 %363:v[2][0:16]

where v[2] is allocated to %116 even though its original lower 16 bits are still used in the instruction after.

Closes: #3717 (closed) Fixes: 031edbc4

This MR adds tests that currently fail on main. Since the bug only affects RA's fallback code path (which wouldn't usually be taken in simple test shaders), I added a policy parameter to aco::register_allocation that allows testing both the fallback and non-fallback paths.

I encountered some bugs/limitations of the test framework and made adaptions where necessary:

long test lists would prevent check_output.py from being invoked due to a deadlock
setup_cs now supports customizing the test subvariant. The subvariant was previously only used to distinguish between chip classes (gfx8/gfx9/...), but now it's possible to add further variants (e.g. to iterate through the new test policies)
GFX10_3 now is printed as gfx10_3 instead of gfx11 in test names

TODO:

~~Fix remaining test failures (not bugs but nonideal register allocation)~~ removed those tests since they didn't really improve coverage and I overestimated the potential for further optimization
Refactor the actual fix
~~Add tests for !7427 (merged) while we're at it~~ decided this is out of scope, since the setup required to trigger that issue isn't easily replicable in a test
~~Run CTS/fossils~~ no changes

Edited Dec 02, 2020 by Tony Wasserka

Admin message

aco/ra: Fix register allocation for subdword operands

Merge request reports