
WIP: aco: Combine SMEM loads in the optimizer

Timur Kristóf requested to merge Venemo/mesa:aco-load-vectorize into main

This MR teaches ACO's optimizer to combine SMEM loads that have adjacent offsets. This lets us handle adjacent loads that the NIR load/store vectorizer can't combine.

In the backward pass of the optimizer (after the other optimizations have completed), it collects information about all loads in each block and then combines those that are adjacent, when that makes sense. It currently has the following limits:

  • It doesn't combine loads into a result that would have "gaps" in it, e.g. 3 x s_load_dwordx2 does NOT get combined into an s_load_dwordx8; instead, two of them are combined into an s_load_dwordx4 and one s_load_dwordx2 remains. Rationale: instruction selection can already emit loads with gaps, so it doesn't make sense to do that here.
  • It can handle arbitrarily sized adjacent loads, but it can't split a bigger load into smaller ones. So it can combine s_load_dword + s_load_dwordx2 + s_load_dword into a single s_load_dwordx4, but it can't combine s_load_dwordx8 + s_load_dwordx16 + s_load_dwordx8 into two s_load_dwordx16. This seems rare enough that we needn't care about it.
  • Creating many big loads gives the scheduler license to schedule them even more aggressively, which can cause problems for register allocation (RA). To deal with this, the pass is only allowed to create a certain number of x16 loads per block.
  • It doesn't handle non-reorderable loads.
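
The limits above can be illustrated with a small sketch. This is not the code from this MR; it is a simplified Python model under several assumptions: each load is a hypothetical `(offset, size)` pair measured in dwords, all loads share the same base and are reorderable, alignment is ignored, and the per-block x16 budget (`max_new_x16`) is a made-up parameter since the actual limit isn't stated here.

```python
from typing import List, Tuple

# Sizes (in dwords) that exist as SMEM loads: s_load_dword .. s_load_dwordx16.
VALID_SIZES = (1, 2, 4, 8, 16)

# Hypothetical representation of a load: (offset in dwords, size in dwords).
Load = Tuple[int, int]


def combine_adjacent_loads(loads: List[Load], max_new_x16: int = 1) -> List[Load]:
    """Greedily merge runs of adjacent loads without gaps or splitting.

    max_new_x16 is an assumed per-block budget for newly created x16 loads.
    """
    loads = sorted(loads)
    result: List[Load] = []
    new_x16 = 0
    i = 0
    while i < len(loads):
        start, size = loads[i]
        end = start + size
        best_j, best_size = i, size  # fall back to leaving the load alone
        j = i + 1
        # Extend the run only while the next load starts exactly where the
        # previous one ends (no gaps) and the total still fits in an x16.
        while j < len(loads) and loads[j][0] == end and end - start + loads[j][1] <= 16:
            end += loads[j][1]
            total = end - start
            # Remember the longest run whose total is a real load size,
            # respecting the budget for new x16 loads.
            if total in VALID_SIZES and (total < 16 or new_x16 < max_new_x16):
                best_j, best_size = j, total
            j += 1
        if best_j > i and best_size == 16:
            new_x16 += 1
        result.append((start, best_size))
        i = best_j + 1  # loads after the chosen run are considered separately
    return result


# 3 x dwordx2 -> one dwordx4 plus one leftover dwordx2.
print(combine_adjacent_loads([(0, 2), (2, 2), (4, 2)]))
# dword + dwordx2 + dword -> one dwordx4.
print(combine_adjacent_loads([(0, 1), (1, 2), (3, 1)]))
# dwordx8 + dwordx16 + dwordx8 -> unchanged, because no load is ever split.
print(combine_adjacent_loads([(0, 8), (8, 16), (24, 8)]))
```

The sketch picks the longest gap-free run that adds up to a valid load size, which is one simple way to reproduce the three examples from the list; the real pass may choose differently.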

As can be expected, all this comes at the price of significantly increased SGPR use; however, it causes only a minimal regression in max waves. Some heuristics prevent the optimization from doing too much harm.

UPDATE: Dropped the 3 problematic commits that caused the scheduler to go crazy.

Edited by Daniel Schürmann
