ir3 scheduler can't handle large SSBO copies
Copies of SSBO arrays get turned into a series of loads and stores:
foo[0] = bar[0]
foo[1] = bar[1]
...
Now, the offset for ldib and stib has to be in a register, so in ir3 we get a move for each offset, something like:
offset0 = 0
val = ldib offset0
stib offset0, val
offset1 = 1
val = ldib offset1
stib offset1, val
...
In the pre-RA scheduler, the moves for offset0, offset1, etc. are all immediately ready to schedule, but they all have an identical live effect, so the scheduler has no idea which one to schedule first and just schedules them randomly until it happens to hit offset0. For some CTS tests this results in huge register pressure and even unnecessary spilling.
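To illustrate the effect on register pressure, here is a rough toy model in Python. It is not the ir3 scheduler: it assumes each offset and each loaded value occupies exactly one register, and the function names (peak_pressure, all_moves_first, one_copy_at_a_time) are made up for this sketch. It only compares the worst-case tie-break, where every offset move is scheduled up front, against scheduling one copy at a time.

```python
def peak_pressure(order):
    """Return the peak number of simultaneously live values for a schedule.

    `order` is a list of (op, i) pairs, where op is "mov", "ldib" or "stib"
    and i is the index of the copied element. Assumption: every value
    (offset_i, val_i) takes exactly one register.
    """
    live = 0
    peak = 0
    for op, _ in order:
        if op == "mov":
            live += 1        # defines offset_i
        elif op == "ldib":
            live += 1        # defines val_i; offset_i is still needed by stib
        elif op == "stib":
            live -= 2        # last use of both offset_i and val_i
        peak = max(peak, live)
    return peak


def all_moves_first(n):
    # Worst-case tie-break: all movs are ready, so they all get scheduled first.
    order = [("mov", i) for i in range(n)]
    for i in range(n):
        order += [("ldib", i), ("stib", i)]
    return order


def one_copy_at_a_time(n):
    # Each offset is materialized right before the load/store that uses it.
    order = []
    for i in range(n):
        order += [("mov", i), ("ldib", i), ("stib", i)]
    return order


if __name__ == "__main__":
    n = 64
    print("all moves first:   ", peak_pressure(all_moves_first(n)))
    print("one copy at a time:", peak_pressure(one_copy_at_a_time(n)))
```

In this model the peak grows linearly with the number of copied elements when the moves are hoisted (n + 1 live values), while it stays constant at 2 when each move is scheduled next to its user, which is roughly the gap between the spilling and non-spilling cases described above.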