
aco/scheduler: Fix erroneous register demand tracking

Tony Wasserka requested to merge neobrain/mesa:issue_4533_scheduler into main

Scheduler bookkeeping has been computing slightly incorrect register demands, which could lead to less-than-optimal scheduling decisions or overstepping the available register space. This MR adds extensive verification of internal invariants and fixes all corner cases uncovered by it.

Initially, this MR addressed only one of these cases. Here's the original MR overview, which provides useful insight into how this caused issue #4533 (closed):

total_demand is the maximum RegisterDemand used for instructions up to the beginning of the next VMEM clause, but it wasn't updated correctly when moving a VMEM instruction downwards to the beginning of the next VMEM clause. In effect, the scheduler could produce a program requiring more registers than available (see #4533 (closed)).

For instance, starting from this:

Instr 637:   v3: %68587 = p_create_vector %68553, %68555, %66549
Instr 638:   v1: %68588 = image_load %67896,  s4: undef,  v1: undef, %68587 1d unrm da
Instr 639:   v3: %68611 = p_create_vector %68503, %68555, %66549
Instr 640:   v1: %68612 = image_load %67896,  s4: undef,  v1: undef, %68611 1d unrm da

Swapping instructions 638 and 639 would leave total_demand unmodified even though p_create_vector can clearly lead to an increase in demand. Prior to this change, total_demand.vgpr was 231 after swapping, but if you annotate each instruction with its local register demand (computed by live_var_analysis) you get this:

Instr 637:  v3: %68587 = p_create_vector %68553, %68555, %66549  | 99 sgprs, 230 vgprs
Instr 638:  v3: %68611 = p_create_vector %68503, %68555, %66549  | 99 sgprs, 233 vgprs
Instr 639:  v1: %68588 = image_load %67896,  s4: undef,  v1: undef, %68587 1d unrm da | 99 sgprs, 231 vgprs
Instr 640:  v1: %68612 = image_load %67896,  s4: undef,  v1: undef, %68611 1d unrm da | 99 sgprs, 229 vgprs

With this change, total_demand.vgpr is now correctly computed as 233.
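
To make the arithmetic concrete, here is a minimal standalone sketch of the invariant described above: total_demand should be the component-wise maximum of the per-instruction demands in the range being tracked. This is not the ACO code. The Demand struct, max_demand, total_demand and demands_after_swap below are hypothetical names chosen for illustration, and the numbers are copied from the annotated listing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ACO's RegisterDemand; only the two counters
// mentioned in the listing are modeled.
struct Demand {
    int sgpr;
    int vgpr;
};

// Component-wise maximum of two demands.
Demand max_demand(Demand a, Demand b) {
    return Demand{std::max(a.sgpr, b.sgpr), std::max(a.vgpr, b.vgpr)};
}

// Maximum demand over the half-open instruction range [begin, end).
Demand total_demand(const std::vector<Demand>& per_instr,
                    std::size_t begin, std::size_t end) {
    assert(begin <= end && end <= per_instr.size());
    Demand total{0, 0};
    for (std::size_t i = begin; i < end; ++i)
        total = max_demand(total, per_instr[i]);
    return total;
}

// Local demands of Instr 637..640 after the swap, taken from the listing.
std::vector<Demand> demands_after_swap() {
    return {{99, 230}, {99, 233}, {99, 231}, {99, 229}};
}
```

Under these assumptions, total_demand(demands_after_swap(), 0, 4) yields 99 SGPRs and 233 VGPRs, which matches the corrected value. Any bookkeeping that skips the moved p_create_vector (index 1 here) would underestimate the VGPR demand.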

Totals from 1909 (1.27% of 149974) affected shaders:
VGPRs: 119632 -> 119520 (-0.09%); split: -0.11%, +0.01%
CodeSize: 10832816 -> 10817420 (-0.14%); split: -0.23%, +0.09%
MaxWaves: 35174 -> 35250 (+0.22%); split: +0.24%, -0.02%
Instrs: 2041183 -> 2039069 (-0.10%); split: -0.20%, +0.09%
Latency: 28902824 -> 28575320 (-1.13%); split: -1.28%, +0.14%
InvThroughput: 8839996 -> 8686724 (-1.73%); split: -1.75%, +0.02%
VClause: 36852 -> 36380 (-1.28%); split: -1.64%, +0.36%
SClause: 95682 -> 94545 (-1.19%); split: -1.70%, +0.51%
Copies: 102810 -> 99247 (-3.47%); split: -3.98%, +0.51%
Branches: 21465 -> 21472 (+0.03%); split: -0.01%, +0.05%
Edited by Tony Wasserka