aco: Combine SMEM loads in the optimizer
This MR teaches ACO's optimizer to combine SMEM loads that have adjacent offsets, which lets us handle adjacent loads that the NIR load/store vectorizer can't combine.
In the backwards pass of the optimizer (after the other optimizations are completed), it collects information about all loads in each block and then combines those that are adjacent, when it makes sense. It currently has the following limits:
- It doesn't handle non-reorderable loads yet.
- It doesn't combine loads which have "gaps" between them, e.g. 3 x `s_load_dwordx2` does NOT get combined into an `s_load_dwordx8`, but just two of them are combined into an `s_load_dwordx4` and one `s_load_dwordx2` remains.
- It is able to handle arbitrary sized adjacent loads, but can't break a bigger load into a smaller one. So it is able to combine `s_load_dword` + `s_load_dwordx2` + `s_load_dword` into a single `s_load_dwordx4`, but it can't combine `s_load_dwordx8` + `s_load_dwordx16` + `s_load_dwordx8` into 2x `s_load_dwordx16`. This seems to be rare enough so that we needn't care about it.
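
The combining behaviour described in the list can be illustrated with a small standalone sketch. This is not the code from the MR: `Load`, `is_valid_size` and `combine_adjacent` are hypothetical names, offsets and sizes are modelled in dwords, and alignment, reorderability and register allocation are ignored entirely. It only assumes that a combined load must have one of the sizes 1, 2, 4, 8 or 16 dwords and that existing loads are never split.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of an SMEM load; offset and size are in dwords.
struct Load {
   unsigned offset;
   unsigned size;
};

// Sizes that correspond to s_load_dword, x2, x4, x8 and x16.
static bool is_valid_size(unsigned dwords)
{
   return dwords == 1 || dwords == 2 || dwords == 4 || dwords == 8 || dwords == 16;
}

// Greedily merges runs of exactly adjacent loads into the longest prefix
// whose total size is a valid load size not exceeding max_size.
// Existing loads are never split, and gaps are never padded.
std::vector<Load> combine_adjacent(std::vector<Load> loads, unsigned max_size = 16)
{
   assert(is_valid_size(max_size));
   std::sort(loads.begin(), loads.end(),
             [](const Load& a, const Load& b) { return a.offset < b.offset; });

   std::vector<Load> result;
   std::size_t i = 0;
   while (i < loads.size()) {
      unsigned total = loads[i].size;
      unsigned best_total = total;
      std::size_t best_end = i + 1; // one past the last load in the merged group

      for (std::size_t j = i + 1; j < loads.size(); j++) {
         // Stop at the first gap or overlap: only exact adjacency is handled.
         if (loads[j].offset != loads[i].offset + total)
            break;
         total += loads[j].size;
         if (total > max_size)
            break;
         if (is_valid_size(total)) {
            best_total = total;
            best_end = j + 1;
         }
      }

      result.push_back({loads[i].offset, best_total});
      i = best_end;
   }
   return result;
}
```

With this sketch, three adjacent 2-dword loads become one 4-dword load plus a leftover 2-dword load, 1 + 2 + 1 dwords become a single 4-dword load, and 8 + 16 + 8 dwords are left untouched because no load is ever split.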
To avoid increasing SGPR pressure too much (which can lead to spilling), the MR also introduces heuristics that limit the maximum SMEM load size and the maximum number of 16-dword (`s_load_dwordx16`) SMEM loads. This prevents the optimization from doing too much harm.
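
As a rough illustration of what such a heuristic could look like, the sketch below gates each candidate combination on a size cap and on a budget of 16-dword loads. Everything here is an assumption: `CombineLimits`, `CombineState` and `may_combine` are hypothetical names, the default thresholds are placeholders rather than the values used in the MR, and the scope over which the MR counts 16-dword loads is not modelled.

```cpp
#include <cassert>

// Hypothetical limits; the defaults are placeholders, not the MR's thresholds.
struct CombineLimits {
   unsigned max_load_size = 16; // largest combined load allowed, in dwords
   unsigned max_x16_loads = 2;  // how many 16-dword loads may be created
};

// Hypothetical running state for the scope the limits apply to.
struct CombineState {
   unsigned x16_loads = 0; // 16-dword loads created so far
};

// Returns true if a combined load of combined_size dwords is allowed,
// and charges it against the 16-dword budget when applicable.
bool may_combine(unsigned combined_size, const CombineLimits& limits, CombineState& state)
{
   if (combined_size > limits.max_load_size)
      return false;
   if (combined_size == 16) {
      if (state.x16_loads >= limits.max_x16_loads)
         return false;
      state.x16_loads++;
   }
   return true;
}
```

The point of keeping a separate budget for the largest loads is that each of them keeps 16 SGPRs live at once, so a few of them can raise register pressure far more than many small combinations.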