
agx: Optimize loops

Alyssa Rosenzweig requested to merge alyssa/mesa:agx/breakopt into main

This MR implements a smorgasbord of optimizations for loops. On top of main, this improves Dolphin ubershader fps by ~10%, which is incredible.

The first major optimization detects patterns of the form:

if cond {
   break
} else {
}

and optimizes them specially, either to a while_icmp instruction (if continue is not used in the loop) or to icmpsel + update_exec (if it is). Either way, 1 or 2 instructions beats the 4-instruction sequence we get from literally translating the NIR (if_icmp + mov + pop_exec + pop_exec). These are similar to the idioms used by the LLVM blob.
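As a rough illustration of the rewrite described above, here is a minimal sketch in C. It works on a flat array of made-up opcodes rather than on the real backend IR and CFG, and it ignores operands entirely (including any inversion of the comparison). The `toy_*` names are hypothetical and are not the identifiers used in the AGX compiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcodes for illustration; not the real AGX opcode enum. */
enum toy_op {
   TOY_OTHER,
   TOY_IF_ICMP,
   TOY_MOV,
   TOY_POP_EXEC,
   TOY_WHILE_ICMP,
   TOY_ICMPSEL,
   TOY_UPDATE_EXEC,
};

/* True if ins[i..i+3] is the literal 4-instruction translation of
 * "if cond { break } else { }": if_icmp + mov + pop_exec + pop_exec. */
static bool
toy_is_break_if(const enum toy_op *ins, size_t n, size_t i)
{
   return i + 4 <= n &&
          ins[i + 0] == TOY_IF_ICMP &&
          ins[i + 1] == TOY_MOV &&
          ins[i + 2] == TOY_POP_EXEC &&
          ins[i + 3] == TOY_POP_EXEC;
}

/* Rewrite every matched sequence in place and return the new length.
 * The write index never passes the read index, so in-place is safe. */
static size_t
toy_opt_break_if(enum toy_op *ins, size_t n, bool loop_uses_continue)
{
   assert(ins != NULL || n == 0);

   size_t out = 0;
   for (size_t i = 0; i < n;) {
      if (toy_is_break_if(ins, n, i)) {
         if (loop_uses_continue) {
            /* Loop uses continue: 2 instructions. */
            ins[out++] = TOY_ICMPSEL;
            ins[out++] = TOY_UPDATE_EXEC;
         } else {
            /* No continue in the loop: 1 instruction. */
            ins[out++] = TOY_WHILE_ICMP;
         }
         i += 4;
      } else {
         ins[out++] = ins[i++];
      }
   }
   return out;
}
```

Under these assumptions, each matched sequence shrinks from 4 instructions to 1 or 2, depending on whether the loop uses continue.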

This optimization taints the control flow graph. In particular, it introduces critical edges that RA cannot cope with (same as the optimization that deletes else instructions for empty else blocks -- RA might need to insert moves into that block so you can't delete it). So, the optimization runs after RA (like the else deletion), at which point we don't really care about the structured CFG. The hardware doesn't mind the critical edges.
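For readers unfamiliar with the term: an edge is critical when its source block has more than one successor and its destination block has more than one predecessor, so there is no block on that edge alone where a move could be placed. A minimal sketch of that check, assuming a small adjacency-matrix CFG (the `toy_cfg` type and helpers are hypothetical, not the backend's data structures):

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_BLOCKS 16

/* Hypothetical CFG: edge[a][b] is true if block a can branch to block b. */
struct toy_cfg {
   unsigned num_blocks;
   bool edge[TOY_MAX_BLOCKS][TOY_MAX_BLOCKS];
};

static unsigned
toy_num_succs(const struct toy_cfg *g, unsigned b)
{
   unsigned n = 0;
   for (unsigned s = 0; s < g->num_blocks; ++s)
      n += g->edge[b][s];
   return n;
}

static unsigned
toy_num_preds(const struct toy_cfg *g, unsigned b)
{
   unsigned n = 0;
   for (unsigned p = 0; p < g->num_blocks; ++p)
      n += g->edge[p][b];
   return n;
}

/* An edge is critical if its source has multiple successors and its
 * destination has multiple predecessors. */
static bool
toy_edge_is_critical(const struct toy_cfg *g, unsigned from, unsigned to)
{
   assert(g->num_blocks <= TOY_MAX_BLOCKS);
   assert(from < g->num_blocks && to < g->num_blocks);

   return g->edge[from][to] &&
          toy_num_succs(g, from) > 1 &&
          toy_num_preds(g, to) > 1;
}
```

As a hypothetical example, take a loop whose body (block 1) is entered from a preheader (block 0) and, after fusing, ends in a single branch that either loops back to block 1 or exits to block 2. The back edge 1 -> 1 is then critical: block 1 has two successors and two predecessors. With the empty else block still in place between the branch and the loop header, that same path would go through a block with a single successor and would not be critical.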

The next set of minor optimizations reduces the number of pointless control flow instructions we emit. These are just minor improvements to instruction selection.

The last patches extend our pre-RA peephole optimizer to fuse comparisons with other instructions: compare-and-select, if-statements, and, by way of the first optimization, do...while loops. I was dragging my feet on this due to fears about bloating register pressure, but this turned out to be a non-issue in practice.
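To make the compare-and-select case concrete, here is a minimal sketch of such a fusion on a toy SSA instruction list. The `toy_*` types and the operand order of the fused instruction are assumptions for illustration only, not the backend's real IR or the hardware's encoding. Note how the fused instruction reads the comparison's sources directly, which keeps them live until the select; that is where the register pressure worry comes from.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes; not the real AGX opcode enum. */
enum toy_alu_op {
   TOY_ALU_NOP,
   TOY_ALU_ICMP,    /* dest = (src0 cmp src1) */
   TOY_ALU_SELECT,  /* dest = src0 ? src1 : src2 */
   TOY_ALU_ICMPSEL, /* dest = (src0 cmp src1) ? src2 : src3 */
   TOY_ALU_OTHER,
};

#define TOY_NO_VALUE (-1)

struct toy_instr {
   enum toy_alu_op op;
   int dest;   /* SSA index, or TOY_NO_VALUE */
   int src[4]; /* SSA indices, TOY_NO_VALUE if unused */
};

static size_t
toy_count_uses(const struct toy_instr *ins, size_t n, int value)
{
   size_t uses = 0;
   for (size_t i = 0; i < n; ++i)
      for (size_t s = 0; s < 4; ++s)
         uses += (ins[i].src[s] == value);
   return uses;
}

/* Fuse "c = icmp a, b; d = select c, x, y" into "d = icmpsel a, b, x, y".
 * Returns the number of selects that were fused. */
static size_t
toy_fuse_icmp_select(struct toy_instr *ins, size_t n)
{
   size_t fused = 0;

   for (size_t i = 0; i < n; ++i) {
      if (ins[i].op != TOY_ALU_SELECT)
         continue;

      /* Find the earlier instruction defining the select's condition. */
      struct toy_instr *cmp = NULL;
      for (size_t j = 0; j < i; ++j)
         if (ins[j].dest == ins[i].src[0])
            cmp = &ins[j];

      if (cmp == NULL || cmp->op != TOY_ALU_ICMP)
         continue;

      int x = ins[i].src[1], y = ins[i].src[2];
      ins[i].op = TOY_ALU_ICMPSEL;
      ins[i].src[0] = cmp->src[0];
      ins[i].src[1] = cmp->src[1];
      ins[i].src[2] = x;
      ins[i].src[3] = y;
      ++fused;

      /* Drop the comparison if nothing else reads its result. */
      if (toy_count_uses(ins, n, cmp->dest) == 0) {
         assert(cmp->op == TOY_ALU_ICMP);
         cmp->op = TOY_ALU_NOP;
         cmp->dest = TOY_NO_VALUE;
         for (size_t s = 0; s < 4; ++s)
            cmp->src[s] = TOY_NO_VALUE;
      }
   }

   return fused;
}
```

In this sketch the comparison is only removed when the select was its last user; if another instruction still reads the comparison result, both the comparison and the fused instruction remain.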

This MR is spun off from my $SECRET_PROJECT branch. In that branch, I have some kernels with hot loops. These optimizations should reduce the stupid.

This also gets us close to being able to take advantage of "inverted" loops, where the break is at the end:

loop {
   ...
   if cond {
      break
   } else {
   }
}

We generate the right instructions for this, but we aren't inserting jmp_exec_none properly. That will come in due time, with some more backend changes plus NIR growing a loop inversion (rotation) pass like LLVM.

Edited by Alyssa Rosenzweig
