nir/schedule: use rb_tree to avoid linear searching.

It adds one rb_tree so when searching for a better node to schedule,
we could avoid a linear searching in some cases, and on those cases
that we rely on a linear search, improving that as the list is
ordered. At some point I experimented with having two rb trees
(ready/leader) but I didn't get a real improvement, and the code
became somewhat more complex.

Curiously using shader-db to compare with the linear searching, the
time got slightly worse on v3d:

Total CPU time (seconds): 8895.72 -> 8907.25 (0.13%)

That can be explained because the outcome are also slightly
different. Again focusing on spill/fills stats, and comparing the
linear searching with having the rb tree on v3d:

 total spills in shared programs: 1260 -> 1212 (-3.81%)
 spills in affected programs: 1136 -> 1088 (-4.23%)
 helped: 15
 HURT: 33

 total fills in shared programs: 2492 -> 2471 (-0.84%)
 fills in affected programs: 1892 -> 1871 (-1.11%)
 helped: 18
 HURT: 33

Having different outcome can be initially surprising. After a brief
checking, this is mostly due the cases where we have nodes with the
same maximum value of max_delay. For that case, the linear searching
return the first head with that repeated value, while the code using a
rb tree could chose a different one.

I was tempted to try to get the same outcome, or even investigate a
little further, but seemed somewhat unnecessary as the outcome is even
slightly better.

BTW, going back to the times, it is worth to note that even the linear
searching got the CPU time to go down when using the nir scheduler. So
comparing master and the nir scheduler with linear searching:

  Total CPU time (seconds): 9471.56 -> 8895.72 (-6.08%)

And comparing master with the nir scheduler with rb tree:

  Total CPU time (seconds): 9471.56 -> 8907.25 (-5.96%)
39 jobs for !1871 with review/nir-schedule-load-shared-support in 17 minutes and 41 seconds (queued for 3 seconds)
merge request