
WIP: aco: Add a post-RA scheduler to improve ALU scheduling

Timur Kristóf requested to merge Venemo/mesa:aco-prs into main

This MR adds a post-RA scheduler to ACO. Its main goal is to improve ALU scheduling on Navi; it is not useful on older GPUs. I built it on top of prior work by Daniel, which was removed shortly before ACO was merged. Back then, the post-RA scheduler didn't really do anything, so I took the old code and extended it until it actually works: it now builds a DAG and understands barriers and similar constraints.

It works in the following way:

  • The post-RA scheduler (PRS) is a list scheduler.
  • Each basic block is processed independently of the others.
  • Basic blocks are broken up into smaller units at scheduling barriers such as control flow, s_barrier, and similar instructions.
  • Within each unit, a DAG (directed acyclic graph) is built from the instructions, based on the registers they read and write and on their memory semantics.
  • Each instruction is assigned a priority, which is roughly based on its latency.
  • Candidate instructions are then selected from the DAG based on their priority and on which of the available instructions can start first.
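As a rough illustration of how a basic block can be cut into independent units, here is a minimal C++ sketch. The `Instr` struct, its `is_sched_barrier` flag and `split_into_regions` are hypothetical simplifications, not ACO's actual data structures. Barriers keep their position; only the instructions between two barriers are candidates for reordering.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified instruction: only what the splitter needs.
struct Instr {
    std::string name;
    bool is_sched_barrier; // e.g. control flow, s_barrier
};

// Half-open range [begin, end) of instruction indices that contains no barrier.
using Region = std::pair<std::size_t, std::size_t>;

// Split one basic block into regions separated by scheduling barriers.
// The barriers themselves belong to no region and are never moved.
std::vector<Region> split_into_regions(const std::vector<Instr>& block) {
    std::vector<Region> regions;
    std::size_t start = 0;
    for (std::size_t i = 0; i < block.size(); ++i) {
        if (block[i].is_sched_barrier) {
            if (i > start)
                regions.emplace_back(start, i); // skip empty regions between adjacent barriers
            start = i + 1;
        }
    }
    if (block.size() > start)
        regions.emplace_back(start, block.size()); // trailing region after the last barrier
    return regions;
}
```

For a block `v_add, v_mul, s_barrier, v_sub, s_branch` this yields two regions, [0, 2) and [3, 4), which can then be scheduled without looking at each other.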
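The register-based part of the DAG construction can be sketched as follows, assuming each instruction lists the physical registers it reads and writes. The names (`Instr`, `Edge`, `build_dag`) are hypothetical, memory dependencies are left out, and modelling WAR and WAW dependencies as a 1-cycle ordering constraint is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical post-RA instruction: the physical registers it reads and writes.
struct Instr {
    std::vector<unsigned> reads;
    std::vector<unsigned> writes;
    unsigned latency; // cycles until the result can be consumed
};

struct Edge {
    std::size_t to;   // index of the dependent (later) instruction
    unsigned latency; // minimum issue distance in cycles
};

// Build a dependency DAG for one region; dag[i] lists the successors of i.
// Edges only point from earlier to later instructions, so the graph is acyclic.
std::vector<std::vector<Edge>> build_dag(const std::vector<Instr>& region) {
    std::vector<std::vector<Edge>> dag(region.size());
    std::map<unsigned, std::size_t> last_write;           // reg -> last writer
    std::map<unsigned, std::vector<std::size_t>> readers; // reg -> readers since last write

    for (std::size_t i = 0; i < region.size(); ++i) {
        const Instr& in = region[i];
        // RAW: a read has to wait for the latency of the previous write.
        for (unsigned r : in.reads) {
            auto it = last_write.find(r);
            if (it != last_write.end())
                dag[it->second].push_back({i, region[it->second].latency});
        }
        for (unsigned r : in.writes) {
            // WAR: do not overwrite a register before earlier readers have issued.
            for (std::size_t rd : readers[r])
                dag[rd].push_back({i, 1});
            // WAW: keep writes to the same register in order.
            auto it = last_write.find(r);
            if (it != last_write.end())
                dag[it->second].push_back({i, 1});
        }
        // Update state only now, so an instruction never depends on itself.
        for (unsigned r : in.reads)
            readers[r].push_back(i);
        for (unsigned r : in.writes) {
            last_write[r] = i;
            readers[r].clear();
        }
    }
    return dag;
}
```

The sketch may emit more than one edge between the same pair of instructions (for example RAW and WAW on the same register); that is harmless for a scheduler that counts predecessors per edge.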
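A minimal list scheduler over such a DAG might look like the sketch below. It assumes edges point from lower to higher instruction indices, uses the longest latency-weighted path to the end of the region as the priority (a common heuristic, assumed here rather than taken from this MR), issues one instruction per cycle, and always picks the ready instruction that can start earliest, breaking ties by priority. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Edge {
    std::size_t to;
    unsigned latency;
};
using Dag = std::vector<std::vector<Edge>>;

// Priority = longest latency-weighted path from an instruction to the end of
// the region. Edges only point forward, so a reverse sweep visits successors first.
std::vector<unsigned> compute_priorities(const Dag& dag, const std::vector<unsigned>& latency) {
    std::vector<unsigned> prio(dag.size());
    for (std::size_t i = dag.size(); i-- > 0;) {
        prio[i] = latency[i];
        for (const Edge& e : dag[i])
            prio[i] = std::max(prio[i], e.latency + prio[e.to]);
    }
    return prio;
}

// Returns the scheduled order as a list of original instruction indices.
std::vector<std::size_t> list_schedule(const Dag& dag, const std::vector<unsigned>& prio) {
    const std::size_t n = dag.size();
    std::vector<unsigned> num_preds(n, 0), earliest(n, 0);
    for (const auto& succs : dag)
        for (const Edge& e : succs)
            ++num_preds[e.to];

    std::vector<std::size_t> ready, order;
    for (std::size_t i = 0; i < n; ++i)
        if (num_preds[i] == 0)
            ready.push_back(i);

    unsigned cycle = 0;
    while (!ready.empty()) {
        // Earliest possible start first, then higher priority, then original position.
        auto best = std::min_element(ready.begin(), ready.end(), [&](std::size_t a, std::size_t b) {
            unsigned sa = std::max(cycle, earliest[a]), sb = std::max(cycle, earliest[b]);
            if (sa != sb)
                return sa < sb;
            if (prio[a] != prio[b])
                return prio[a] > prio[b];
            return a < b;
        });
        std::size_t i = *best;
        ready.erase(best);
        cycle = std::max(cycle, earliest[i]); // stall until the operands are available
        order.push_back(i);
        for (const Edge& e : dag[i]) {
            earliest[e.to] = std::max(earliest[e.to], cycle + e.latency);
            if (--num_preds[e.to] == 0)
                ready.push_back(e.to);
        }
        ++cycle; // assume one instruction issues per cycle
    }
    assert(order.size() == n && "dependency graph must be acyclic");
    return order;
}
```

With a 4-cycle producer (0), its consumer (1) and two independent 1-cycle instructions (2, 3), the order becomes 0, 2, 3, 1: the independent instructions fill the latency gap instead of the consumer stalling right after its producer.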

A few notes:

  • Scheduling memory instructions is not the primary goal, so VMEM instructions are currently treated as scheduling barriers. There are some ideas for improving VMEM scheduling in the future, but this is a very problematic topic, and the pre-RA scheduler already does a good job.
  • It will still try to schedule SMEM and DS instructions when it sees a benefit in doing so.
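One way to decide which SMEM and DS instructions must keep their relative order inside a region is sketched below. It assumes SMEM loads are read-only and that LDS (DS) never aliases the memory SMEM reads, so only pairs of DS instructions with at least one store need an edge. `Kind`, `needs_memory_order` and `memory_edges` are hypothetical names; a real implementation would also have to honour memory barriers and other synchronization semantics.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical classification of instructions that can appear inside a region.
// VMEM is absent on purpose: it is a scheduling barrier, so it never lands in a region.
enum class Kind { ALU, SMEM_LOAD, DS_LOAD, DS_STORE };

// Must 'later' stay after 'earlier' for memory-correctness reasons?
// Assumption: SMEM loads are read-only and never alias LDS.
bool needs_memory_order(Kind earlier, Kind later) {
    bool earlier_ds = earlier == Kind::DS_LOAD || earlier == Kind::DS_STORE;
    bool later_ds = later == Kind::DS_LOAD || later == Kind::DS_STORE;
    // Two LDS accesses conflict only if at least one of them writes.
    return earlier_ds && later_ds && (earlier == Kind::DS_STORE || later == Kind::DS_STORE);
}

// Collect (earlier, later) index pairs that need a memory edge in the DAG.
std::vector<std::pair<std::size_t, std::size_t>> memory_edges(const std::vector<Kind>& region) {
    std::vector<std::pair<std::size_t, std::size_t>> edges;
    for (std::size_t j = 0; j < region.size(); ++j)
        for (std::size_t i = 0; i < j; ++i)
            if (needs_memory_order(region[i], region[j]))
                edges.emplace_back(i, j);
    return edges;
}
```

The pairwise loop is quadratic and adds transitively redundant edges; it favours clarity over efficiency. DS loads stay free to move past each other and past SMEM loads, which is where the scheduler can gain something.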

There are also a couple of smaller fixes included here for other parts of ACO.

Edited by Timur Kristóf
