Skip to content

intel/brw: Stop discarding the interference graph

Kenneth Graunke requested to merge kwg/mesa:intel-ra-fixes into main

This MR reworks the spilling setup for pre-LSC hardware to dynamically populate a scratch header on the fly, rather than pre-reserving one up front—similar to how LSC hardware already worked. This removes the need to reconstruct the interference graph.

It then stops discarding and recreating the interference graph the first time we need to spill. This removes a small amount of compilation time for shaders which have to spill (-0.307287% +/- 0.25484%, n=25, for Witcher 3 fossils on Alchemist).

More importantly, this fixes a nasty and obscure bug. Our backend does a somewhat unusual sequence:

  1. Set up the interference graph
  2. Try to register allocate
  3. Fail and realize we have to spill
  4. Recreate(!) the interference graph with different node counts, because unfortunately spills and fills may need temporary registers set aside for that purpose, which can no longer be used generally.
  5. Ask for the best spill node because we know we must spill

On step 4, ra_realloc_interference_graph() reallocs the in_stack bitset for the new nodes. However, it leaves the new bitset words uninitialized, because it's supposed to be set up by ra_select(). On step 5, however, the Intel backend calls ra_get_best_spill_node() without first calling ra_select() (or ra_allocate()). So at that point, the in_stack bitset is not properly initialized, and we'll end up reading uninitialized garbage in ra_get_best_spill_node(), and non-deterministically end up skipping candidates for spilling.

While debugging this, I observed ra_get_best_spill_node() seeing non-zero in_stack bits set while g->tmp.stack_count was 0. So no nodes could possibly be in the stack.

We could simply initialize the memory, but there's a deeper problem: in Chaitin-Briggs allocators, the list of spill candidates is built in the "Select" step. In our implementation, we technically don't make a list of candidates, but rather flag registers that aren't candidates. By never running ra_allocate() on our new graph, we never produce that info. So when we ask for a spill node, we consider all registers as spill candidates, which is far from ideal.

To fix this, we ............... (in brw) stop discarding the interference graph. (in elk, we would need to re-run the select step)

This escaped our notice for a long time because our allocation loop tries to spill a single register, tries to allocate, and repeats if it fails. Because retrying calls ra_select(), which initializes the spill candidate info, this non-determinism only happened for the first register selected. However, recently the backend gained support for spilling multiple registers in each loop step, which highlighted this problem, as different per-step-spill-sizes produced different results due to this non-determinism.

Edited by Kenneth Graunke

Merge request reports