
ir3: Rewrite physical edge handling and add a new shared RA pass to fix bugs with shared registers

I have a series that significantly expands our use of shared registers via the "scalar ALU", but it exposes a number of intertwined bugs in our shared register handling:

  1. We were not modelling physical edges correctly when a break or continue is deeply nested. A block can have arbitrarily many extra physical edges, but we only allowed one.
  2. Once (1) is fixed, our workaround in RA for physical edges with shared registers no longer works. I've tried many different ways to make it work and have come up empty so far.
  3. Before a6xx gen3 (a650), ALU instructions can't use shared registers at all, and swz sometimes appears not to work, so it's impossible to swap two shared registers.
  4. We didn't support spilling shared registers to normal registers.
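Issue (1) is at heart a representation limit: a block's physical successors are not bounded the way its logical successors are, so a single slot for the extra physical edge cannot describe a break or continue nested inside several levels of control flow. A minimal sketch of a growable, deduplicated physical successor list, assuming a simplified hypothetical `struct block` with at most two logical successors and a hypothetical `block_add_physical_succ()` helper (neither is ir3's actual data structure or API):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified CFG block; not ir3's struct ir3_block. */
struct block {
   int index;
   struct block *successors[2];   /* logical successors: assumed at most two */
   struct block **physical_succs; /* physical successors: any number */
   unsigned physical_succs_count;
   unsigned physical_succs_cap;
};

/* Hypothetical helper: append a physical successor, skipping duplicates.
 * Returns 0 on success, -1 if the array could not be grown. */
static int
block_add_physical_succ(struct block *b, struct block *succ)
{
   assert(b && succ);

   for (unsigned i = 0; i < b->physical_succs_count; i++) {
      if (b->physical_succs[i] == succ)
         return 0; /* edge already recorded */
   }

   if (b->physical_succs_count == b->physical_succs_cap) {
      unsigned new_cap = b->physical_succs_cap ? 2 * b->physical_succs_cap : 2;
      struct block **grown =
         realloc(b->physical_succs, new_cap * sizeof(*grown));
      if (!grown)
         return -1;
      b->physical_succs = grown;
      b->physical_succs_cap = new_cap;
   }

   b->physical_succs[b->physical_succs_count++] = succ;
   return 0;
}
```

The point of the sketch is only that the edge count is dynamic; how many edges a given nested break or continue produces depends on the control flow and is not modelled here.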

We fix (1) by rewriting physical edge handling as a new pass which, in addition to modelling the HW more closely, will be easy to integrate with divergence analysis to get more accurate physical edges; this will be necessary for handling phi nodes of shared registers. Before we can do that, though, we need to fix (2) by adding a specialized pass that combines spilling and allocation of shared registers, which also fixes the remaining issues. Spilling shared registers is sufficiently different from, and simpler than, spilling normal registers that it can be integrated with RA, significantly reducing the complexity of both. Rather than splitting live ranges, we spill on the fly when we run out of registers. However, we may still insert shuffle code for phi nodes.
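To make "spill on the fly" concrete, here is a minimal sketch of an allocator that, instead of splitting live ranges ahead of time, evicts a value to a normal register only at the moment the shared register file is full. Everything here is hypothetical: the tiny register file size, the `struct shared_ra` state, and the function names. The victim heuristic (evict the value whose next use is furthest away, Belady-style) is an assumption of this sketch; the MR does not say which heuristic the pass uses, and reloads before uses are omitted.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_SHARED_REGS 4 /* hypothetical, deliberately tiny register file */
#define MAX_VALUES 16

/* Hypothetical state for a combined spill + allocate pass. */
struct shared_ra {
   int reg_owner[NUM_SHARED_REGS]; /* value id held in each reg, -1 if free */
   int next_use[MAX_VALUES];       /* position of each value's next use */
   bool spilled[MAX_VALUES];       /* value currently lives in a normal reg */
   unsigned spill_count;
};

static void
shared_ra_init(struct shared_ra *ra)
{
   for (int r = 0; r < NUM_SHARED_REGS; r++)
      ra->reg_owner[r] = -1;
   for (int v = 0; v < MAX_VALUES; v++) {
      ra->next_use[v] = 0;
      ra->spilled[v] = false;
   }
   ra->spill_count = 0;
}

/* Only called when every register is occupied: pick the one whose value
 * is used furthest in the future. */
static int
pick_victim(const struct shared_ra *ra)
{
   int victim = 0;
   for (int r = 1; r < NUM_SHARED_REGS; r++) {
      if (ra->next_use[ra->reg_owner[r]] > ra->next_use[ra->reg_owner[victim]])
         victim = r;
   }
   return victim;
}

/* Allocate a shared register for a newly defined value, spilling the
 * victim to a normal register if no shared register is free. */
static int
shared_ra_alloc(struct shared_ra *ra, int value, int next_use)
{
   assert(value >= 0 && value < MAX_VALUES);
   ra->next_use[value] = next_use;
   ra->spilled[value] = false;

   for (int r = 0; r < NUM_SHARED_REGS; r++) {
      if (ra->reg_owner[r] < 0) {
         ra->reg_owner[r] = value;
         return r;
      }
   }

   int r = pick_victim(ra);
   ra->spilled[ra->reg_owner[r]] = true; /* a real pass would emit a copy */
   ra->spill_count++;
   ra->reg_owner[r] = value;
   return r;
}

/* Release a register whose value's live range has ended. */
static void
shared_ra_free(struct shared_ra *ra, int reg)
{
   assert(reg >= 0 && reg < NUM_SHARED_REGS);
   ra->reg_owner[reg] = -1;
}
```

Because eviction happens during allocation, no separate spilling pass has to predict pressure; the trade-off is that the decision is local and greedy.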

The new RA pass doesn't need to run every time: usually we can guarantee ahead of time that shared live ranges are never split, and since the extra pass is slower, we skip it when we can.
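One way to picture gating an expensive pass behind a cheap check is below. This is a hypothetical sketch, not the condition ir3 actually uses: it assumes shared values have plain half-open live intervals and treats "peak pressure fits in the shared register file" as the sufficient condition, which holds for interval graphs under linear-scan allocation but ignores complications such as physical edges. The names `struct interval`, `peak_pressure()`, and `needs_shared_ra_pass()` are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical live interval of a shared value: positions [start, end). */
struct interval {
   int start;
   int end;
};

/* Peak number of simultaneously live intervals. The maximum overlap is
 * always reached at some interval's start point, so checking only start
 * points is sufficient (O(n^2), fine for a sketch). */
static unsigned
peak_pressure(const struct interval *ivs, unsigned count)
{
   unsigned peak = 0;
   for (unsigned i = 0; i < count; i++) {
      unsigned live = 0;
      for (unsigned j = 0; j < count; j++) {
         if (ivs[j].start <= ivs[i].start && ivs[i].start < ivs[j].end)
            live++;
      }
      if (live > peak)
         peak = live;
   }
   return peak;
}

/* Hypothetical gate: only run the slower combined spill + RA pass when
 * the cheap check cannot rule out splitting a shared live range. */
static bool
needs_shared_ra_pass(const struct interval *ivs, unsigned count,
                     unsigned num_shared_regs)
{
   assert(ivs || count == 0);
   return peak_pressure(ivs, count) > num_shared_regs;
}
```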

There's a commit ordering issue here: a number of these features won't be exercised until my scalar ALU series lands. For example, none of the subgroup tests hit the spilling path, even though in theory they could. Furthermore, to avoid generating terrible code when spilling in the presence of scalar ALU instructions, we need to be able to demote those instructions rather than spill around them, and source handling has to be structured to allow this by delaying the reloading of sources. We also need to handle phi nodes of shared registers, which don't exist yet. I've chosen to keep this currently unused code rather than structure the rest of the pass oddly for no apparent reason.

This depends on !22071 (merged) because the reconvergence pass uses interval trees.

Edited by Connor Abbott
