intel: Make register allocation faster
This MR significantly improves the performance of register allocation (RA) on Intel. It doesn't substantially change how much we spill, but it does make RA considerably faster. The approach comes in a few steps:
- Improve the performance of the core RA algorithm shared by the Intel drivers and a few others in Mesa.
- Get rid of extra RA calls when we spill or bail due to failing allocation for high SIMD width programs.
- Improve the RA API so we can call `ra_allocate()` multiple times and mutate the graph in between.
- Rework the internals of `fs_visitor::assign_regs()` and break up interference graph building into more re-usable pieces.
- Modify `fs_visitor::assign_regs()` to update the interference graph as part of spilling rather than throwing the whole thing away, liveness and all.
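To illustrate the idea behind the last three steps, here is a minimal, self-contained sketch of an allocate/spill loop that keeps one interference graph alive across attempts. It is not the Mesa code: `ToyGraph`, `toy_allocate()`, and `toy_assign_regs()` are hypothetical names, the coloring is a plain greedy pass, and spilling is simplified to dropping the node's edges (a real spill would also add short-lived temporaries for the fills and spills).

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical toy interference graph; NOT Mesa's ra_graph.
struct ToyGraph {
    std::vector<std::set<int>> adj;  // adj[n] = nodes that interfere with n
    std::vector<int> reg;            // assigned register, or -1 if none
    std::vector<bool> spilled;       // true once a node has been spilled

    explicit ToyGraph(int n) : adj(n), reg(n, -1), spilled(n, false) {}

    void add_interference(int a, int b) {
        assert(a != b);
        adj[a].insert(b);
        adj[b].insert(a);
    }

    // Spilling mutates the graph in place: only the spilled node's edges
    // are removed; all other interference information is kept.
    void spill(int n) {
        for (int m : adj[n])
            adj[m].erase(n);
        adj[n].clear();
        spilled[n] = true;
    }
};

// Greedy coloring with num_regs registers. Returns -1 on success, or the
// index of the first node that could not be given a register.
int toy_allocate(ToyGraph &g, int num_regs) {
    g.reg.assign(g.reg.size(), -1);
    for (int n = 0; n < (int)g.adj.size(); n++) {
        if (g.spilled[n])
            continue;
        std::vector<bool> used(num_regs, false);
        for (int m : g.adj[n])
            if (g.reg[m] >= 0)
                used[g.reg[m]] = true;
        int r = 0;
        while (r < num_regs && used[r])
            r++;
        if (r == num_regs)
            return n;  // allocation failed at node n
        g.reg[n] = r;
    }
    return -1;
}

// Allocate repeatedly on the SAME graph, spilling the failing node between
// attempts instead of rebuilding the graph. Returns the number of spills.
int toy_assign_regs(ToyGraph &g, int num_regs) {
    int spills = 0;
    for (;;) {
        int failed = toy_allocate(g, num_regs);
        if (failed < 0)
            return spills;
        g.spill(failed);
        spills++;
    }
}
```

For example, four mutually interfering nodes with three registers need one spill; the second call to `toy_allocate()` reuses the edges among the remaining three nodes rather than recomputing them.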
The end result of all this is an overall reduction of roughly 10% in shader-db runtime:
total instructions in shared programs: 15311100 -> 15311360 (<.01%)
instructions in affected programs: 88901 -> 89161 (0.29%)
helped: 11
HURT: 21
total cycles in shared programs: 355468050 -> 355830749 (0.10%)
cycles in affected programs: 205180904 -> 205543603 (0.18%)
helped: 246
HURT: 209
total loops in shared programs: 4360 -> 4360 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total spills in shared programs: 12036 -> 12042 (0.05%)
spills in affected programs: 2588 -> 2594 (0.23%)
helped: 9
HURT: 19
total fills in shared programs: 25088 -> 25165 (0.31%)
fills in affected programs: 7179 -> 7256 (1.07%)
helped: 11
HURT: 19
LOST: 0
GAINED: 0
Total CPU time (seconds): 2611.35 -> 2360.22 (-9.62%)