
intel: Make register allocation faster

This MR significantly improves the performance of register allocation (RA) on Intel. It doesn't substantially change how much we spill, but it does make RA noticeably faster. The approach comes in a few steps:

  1. Improve the performance of the core RA algorithm shared by the Intel drivers and a few others in Mesa.
  2. Get rid of extra RA calls when we spill or bail due to failing allocation for high SIMD width programs.
  3. Improve the RA API so we can call ra_allocate() multiple times and mutate the graph in between.
  4. Rework the internals of fs_visitor::assign_regs() and break up interference graph building into more re-usable pieces.
  5. Modify fs_visitor::assign_regs() to modify the interference graph as part of spilling rather than throwing the whole thing away, liveness and all.
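
To illustrate the idea behind steps 3 and 5, here is a minimal sketch of an allocator that can be re-run after the graph is mutated by a spill. It is a toy, not Mesa's code: `ToyGraph`, `add_node()`, `add_interference()`, `spill()`, and the greedy `allocate()` are all hypothetical names made up for this example, and the real RA algorithm is more involved than a first-fit coloring. The only point is that spilling edits the existing graph (drop the spilled node's edges, add short-lived temporaries) instead of rebuilding it from scratch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy interference graph; NOT Mesa's ra_graph or its API.
struct ToyGraph {
    unsigned num_regs;                    // number of physical registers
    std::vector<std::vector<bool>> adj;   // symmetric adjacency matrix
    std::vector<int> reg;                 // assigned register, or -1
    std::vector<bool> spilled;            // true if the node lives in memory

    explicit ToyGraph(unsigned regs) : num_regs(regs) {}

    // Grow the graph by one node without touching existing edges.
    unsigned add_node() {
        const std::size_t n = adj.size();
        for (auto &row : adj)
            row.push_back(false);
        adj.push_back(std::vector<bool>(n + 1, false));
        reg.push_back(-1);
        spilled.push_back(false);
        return static_cast<unsigned>(n);
    }

    void add_interference(unsigned a, unsigned b) {
        assert(a < adj.size() && b < adj.size());
        if (a != b) {
            adj[a][b] = true;
            adj[b][a] = true;
        }
    }

    // Mutate the graph in place: the spilled node no longer competes
    // for a register, so remove all of its edges.
    void spill(unsigned n) {
        assert(n < adj.size());
        for (std::size_t m = 0; m < adj.size(); m++) {
            adj[n][m] = false;
            adj[m][n] = false;
        }
        spilled[n] = true;
        reg[n] = -1;
    }

    // First-fit greedy coloring in node order. Safe to call repeatedly;
    // every call recomputes the assignment from the current edges.
    bool allocate() {
        bool ok = true;
        for (std::size_t n = 0; n < adj.size(); n++) {
            reg[n] = -1;
            if (spilled[n])
                continue;
            std::vector<bool> used(num_regs, false);
            for (std::size_t m = 0; m < n; m++)
                if (adj[n][m] && reg[m] >= 0)
                    used[static_cast<std::size_t>(reg[m])] = true;
            for (unsigned r = 0; r < num_regs; r++) {
                if (!used[r]) {
                    reg[n] = static_cast<int>(r);
                    break;
                }
            }
            if (reg[n] < 0)
                ok = false;
        }
        return ok;
    }
};
```

In this sketch, a caller would build the graph once, call `allocate()`, and on failure call `spill()` on some node, add nodes and edges for the resulting temporaries, and call `allocate()` again on the same object. How the node to spill is chosen is left out entirely.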

The end result of all this is an overall reduction of roughly 10% in shader-db runtime.

total instructions in shared programs: 15311100 -> 15311360 (<.01%)
instructions in affected programs: 88901 -> 89161 (0.29%)
helped: 11
HURT: 21

total cycles in shared programs: 355468050 -> 355830749 (0.10%)
cycles in affected programs: 205180904 -> 205543603 (0.18%)
helped: 246
HURT: 209

total loops in shared programs: 4360 -> 4360 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total spills in shared programs: 12036 -> 12042 (0.05%)
spills in affected programs: 2588 -> 2594 (0.23%)
helped: 9
HURT: 19

total fills in shared programs: 25088 -> 25165 (0.31%)
fills in affected programs: 7179 -> 7256 (1.07%)
helped: 11
HURT: 19

LOST:   0
GAINED: 0

Total CPU time (seconds): 2611.35 -> 2360.22 (-9.62%)
