
aco: Combine SMEM loads in the optimizer

Timur Kristóf requested to merge Venemo/mesa:aco-load-vectorize into main

This MR teaches ACO's optimizer to combine SMEM loads that have adjacent offsets, so that we can deal with adjacent loads that the NIR load/store vectorizer can't combine.

During the backward pass of the optimizer (after the other optimizations have completed), it collects information about all loads in each block and then combines the adjacent ones when it makes sense to do so. It currently has the following limitations:

  • It doesn't handle non-reorderable loads yet.
  • It doesn't create combined loads that would contain "gaps" (dwords that none of the original loads need). For example, 3 x s_load_dwordx2 (6 dwords in total) does NOT get combined into an s_load_dwordx8; instead, two of them are combined into an s_load_dwordx4 and one s_load_dwordx2 remains.
  • It can handle arbitrarily sized adjacent loads, but it can't split a bigger load into smaller ones. So it can combine s_load_dword + s_load_dwordx2 + s_load_dword into a single s_load_dwordx4, but it can't turn s_load_dwordx8 + s_load_dwordx16 + s_load_dwordx8 into 2x s_load_dwordx16 (that would require splitting the s_load_dwordx16). This seems rare enough that we needn't care about it.
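To make the limitations above concrete, here is a minimal C++ sketch of how a block's adjacent loads could be grouped. It is not ACO's actual code: `SmemLoad`, `is_valid_size`, `combine_adjacent` and the `max_dwords` parameter are hypothetical, each load is reduced to a base identifier, a byte offset and a dword count, and the sketch assumes the legal load sizes are 1, 2, 4, 8 and 16 dwords (as in the examples above). It ignores reordering and alignment concerns entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of an SMEM load (not ACO's real data structures).
struct SmemLoad {
   unsigned base;   // identifies the base address operand
   uint32_t offset; // constant offset in bytes
   unsigned dwords; // load size in dwords
};

// Assumption: these are the only legal s_load_dword* sizes.
static bool is_valid_size(unsigned dwords)
{
   return dwords == 1 || dwords == 2 || dwords == 4 || dwords == 8 || dwords == 16;
}

// Starting from the lowest offset, take the longest run of back-to-back loads
// whose total size is a legal size (never padding a gap, never splitting a load),
// emit it as one load, and continue after it.
std::vector<SmemLoad> combine_adjacent(std::vector<SmemLoad> loads, unsigned max_dwords = 16)
{
   for (const SmemLoad &l : loads)
      assert(is_valid_size(l.dwords));

   std::sort(loads.begin(), loads.end(), [](const SmemLoad &a, const SmemLoad &b) {
      return a.base != b.base ? a.base < b.base : a.offset < b.offset;
   });

   std::vector<SmemLoad> result;
   size_t i = 0;
   while (i < loads.size()) {
      unsigned total = loads[i].dwords;
      size_t best_end = i;           // index of the last load in the best run so far
      unsigned best_total = total;   // size of the best run so far
      for (size_t j = i + 1; j < loads.size(); ++j) {
         const SmemLoad &prev = loads[j - 1];
         const SmemLoad &next = loads[j];
         // Stop at a different base or when the next load isn't exactly adjacent.
         if (next.base != prev.base || next.offset != prev.offset + prev.dwords * 4)
            break;
         total += next.dwords;
         if (total > max_dwords)
            break;
         if (is_valid_size(total)) {
            best_end = j;
            best_total = total;
         }
      }
      result.push_back({loads[i].base, loads[i].offset, best_total});
      i = best_end + 1;
   }
   return result;
}
```

With these rules, 3 x s_load_dwordx2 at byte offsets 0, 8 and 16 yields one 4-dword load plus one 2-dword load, s_load_dword + s_load_dwordx2 + s_load_dword becomes a single 4-dword load (the intermediate 3-dword total is skipped, not rejected), and 8 + 16 + 8 dwords stays as three separate loads because the sketch never splits the middle load.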

To avoid increasing SGPR pressure too much (which can lead to spilling), the MR also introduces heuristics that limit the maximum SMEM load size and the maximum number of 16-dword (s_load_dwordx16) SMEM loads. This prevents the optimization from doing too much harm.
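A budget check of this kind could look roughly like the following C++ sketch. The class name `SmemCombineBudget`, its parameters and the idea of counting per budget object are all hypothetical: the MR text does not give the actual limits or say over which scope the 16-dword loads are counted. The optimizer would consult the budget before forming a combined load and keep the smaller loads if the request is refused.

```cpp
#include <cassert>

// Hypothetical budget for combined SMEM loads; the real limits are not stated here.
class SmemCombineBudget {
public:
   SmemCombineBudget(unsigned max_load_dwords, unsigned max_x16_loads)
      : max_load_dwords_(max_load_dwords), max_x16_loads_(max_x16_loads)
   {
      assert(max_load_dwords >= 1 && max_load_dwords <= 16);
   }

   // Returns true (and records the load) if a combined load of `dwords` dwords
   // may be created; returns false if it would exceed either limit.
   bool try_reserve(unsigned dwords)
   {
      if (dwords > max_load_dwords_)
         return false;
      if (dwords == 16) {
         if (num_x16_loads_ >= max_x16_loads_)
            return false;
         ++num_x16_loads_;
      }
      return true;
   }

   unsigned num_x16_loads() const { return num_x16_loads_; }

private:
   unsigned max_load_dwords_;    // largest combined load allowed
   unsigned max_x16_loads_;      // how many 16-dword combined loads are allowed
   unsigned num_x16_loads_ = 0;  // 16-dword combined loads created so far
};
```

Capping 16-dword loads separately from the overall size limit reflects the concern above: each s_load_dwordx16 needs 16 consecutive SGPRs for its result, so those are the loads most likely to push register pressure toward spilling.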

Edited by Timur Kristóf
