aco/ra: Fix register allocation for subdword operands
ACO attempts to store the output of an instruction in the same register occupied by its operands when possible. Importantly this only works if the total operand are large enough to store the result register size. The code failed to consider subdword operands when checking for this, causing entire register slots to be freed up even though subdword parts were still used.
In Mafia 3, this affected the following code:
v2b: %363:v[2][0:16], v2b: %362:v[2][16:32] = p_split_vector %360:v[2]
v1: %116:v[2] = v_cvt_f32_f16 %362:v[2][16:32]
v1: %117:v[2] = v_cvt_f32_f16 %363:v[2][0:16]
where v[2] is allocated to %116 even though its original lower 16 bits are still used in the instruction after.
Closes: #3717 (closed) Fixes: 031edbc4
This MR adds tests that currently fail on main. Since the bug only affects RA's fallback code path (which wouldn't usually be taken in simple test shaders), I added a policy parameter to aco::register_allocation
that allows testing both the fallback and non-fallback paths.
I encountered some bugs/limitations of the test framework and made adaptions where necessary:
- long test lists would prevent
check_output.py
from being invoked due to a deadlock -
setup_cs
now supports customizing the test subvariant. The subvariant was previously only used to distinguish between chip classes (gfx8/gfx9/...), but now it's possible to add further variants (e.g. to iterate through the new test policies) - GFX10_3 now is printed as
gfx10_3
instead ofgfx11
in test names
TODO:
-
Fix remaining test failures (not bugs but nonideal register allocation)removed those tests since they didn't really improve coverage and I overestimated the potential for further optimization -
Refactor the actual fix -
Add tests for !7427 (merged) while we're at itdecided this is out of scope, since the setup required to trigger that issue isn't easily replicable in a test -
Run CTS/fossilsno changes