lima/gpir: Rewrite register allocation for value registers

The usual linear-scan register allocation algorithm can't handle
pre-allocated registers, since we might be forced to choose a color for
a non-preallocated variable that overlaps with a pre-allocated variable.
But in such cases we can simply split the live range of the offending
variable when we reach the beginning of the pre-allocated variable's
live range. This is still optimal in the sense that it always finds a
coloring whenever one is possible, but we may not insert the smallest
possible number of moves. However, since it's actually the scheduler
which splits live ranges afterwards, we can simply fold in the move
while keeping its fake dependencies, and then everything still works! In
other words, inserting a live range split for a value register during
register allocation is pretty much free.
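
As a rough illustration of the splitting idea above, here is a minimal
sketch of a forward linear scan over one block. It is not the gpir code:
toy_value, toy_linear_scan, find_free and NUM_REGS are hypothetical names
made up for this example, and a split is only counted rather than turned
into a move or a fake dependency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_REGS 3   /* hypothetical register count, for illustration only */
#define NO_REG  (-1)

/* A value live over the half-open instruction range [start, end). */
struct toy_value {
   int start, end;
   int prealloc;   /* fixed register, or NO_REG if the allocator may choose */
   int reg;        /* out: register held at the end of the live range */
   int splits;     /* out: number of times the live range was split */
};

/* Return the lowest register with no current owner, or NO_REG. */
static int find_free(struct toy_value *owner[NUM_REGS])
{
   for (int r = 0; r < NUM_REGS; r++)
      if (!owner[r])
         return r;
   return NO_REG;
}

/* vals must be sorted by start. Returns false if this sketch finds no
 * assignment (too many live values, or two overlapping pre-allocated
 * values that want the same register).
 */
static bool toy_linear_scan(struct toy_value *vals, int n)
{
   struct toy_value *owner[NUM_REGS] = { NULL };

   for (int i = 0; i < n; i++) {
      struct toy_value *v = &vals[i];
      v->splits = 0;

      /* Expire values whose live range ended before v starts. */
      for (int r = 0; r < NUM_REGS; r++)
         if (owner[r] && owner[r]->end <= v->start)
            owner[r] = NULL;

      if (v->prealloc != NO_REG) {
         struct toy_value *old = owner[v->prealloc];
         if (old) {
            if (old->prealloc != NO_REG)
               return false;
            /* Split: the offending value continues in another register
             * from the point where the pre-allocated range begins.
             */
            int r = find_free(owner);
            if (r == NO_REG)
               return false;
            owner[r] = old;
            old->reg = r;
            old->splits++;
         }
         owner[v->prealloc] = v;
         v->reg = v->prealloc;
      } else {
         int r = find_free(owner);
         if (r == NO_REG)
            return false;
         owner[r] = v;
         v->reg = r;
      }
   }
   return true;
}
```

For example, with a free value live over [0, 4) and a value pre-allocated
to register 0 live over [2, 5), the first value starts in register 0 and
is split into register 1 at instruction 2.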

This means that we can split register allocation in two. First globally
allocate the cross-block registers accessed through load_reg and
store_reg instructions, which is still done via graph coloring, and then
run a linear scan algorithm over each block, treating the load_reg and
store_reg nodes as referring to pre-allocated registers. This makes the
existing RA more complicated, but it has two benefits. First, using
round-robin with the linear scan allocator results in far fewer fake
dependencies, which saves around 15 instructions in the glmark2
jellyfish shader and fixes a regression in instruction count since
branching support went in. Second, it will simplify handling spilling.
With just graph coloring for everything, every time we spill a node, we
have to create new value registers which become new nodes in the graph
and re-run RA. This is worsened by the fact that when writing a value to
a temporary, we need to have an extra register available to load the
write address with a load_const node. With the new scheme, we can ignore
this entirely in the first part and then in the second part we can just
reserve an extra register in sections where we know we have to spill. So
no re-running RA many times, and we can get a good result quickly.
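
To illustrate why round-robin selection helps with fake dependencies,
here is a small sketch contrasting it with first-fit selection. The
names (pick_first_fit, pick_round_robin, NUM_REGS) are hypothetical and
not taken from the gpir code. The assumption is that reusing a register
right after it was freed forces the new write to be ordered after the
old reads, while rotating through the registers spreads short-lived
values apart.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_REGS 4   /* hypothetical register count, for illustration only */

/* First-fit: always hand out the lowest free register, so a register
 * that was just freed tends to be reused immediately.
 */
static int pick_first_fit(const bool busy[NUM_REGS])
{
   for (int r = 0; r < NUM_REGS; r++)
      if (!busy[r])
         return r;
   return -1;
}

/* Round-robin: start searching just after the last register handed
 * out, so consecutive allocations land in different registers as long
 * as more than one is free.
 */
static int pick_round_robin(const bool busy[NUM_REGS], int *cursor)
{
   for (int i = 0; i < NUM_REGS; i++) {
      int r = (*cursor + i) % NUM_REGS;
      if (!busy[r]) {
         *cursor = (r + 1) % NUM_REGS;
         return r;
      }
   }
   return -1;
}
```

With all registers free, two back-to-back first-fit picks both return
register 0, whereas two round-robin picks return registers 0 and 1.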

The current implementation does linear scan backwards, so that we can
insert the fake dependencies while allocating and avoid creating any
move nodes at all when we have to split a live range. However, it turns
out that this makes handling schedule_first nodes a bit more
complicated, so it's not clear if that was worth it.