WIP: aco: Add a post-RA scheduler to improve ALU scheduling (!7272) · Merge requests · Mesa / mesa

Timur Kristóf requested to merge Venemo/mesa:aco-prs into main Oct 22, 2020

This MR adds a post-RA scheduler to ACO. The main goal is to improve ALU scheduling on Navi; it is not useful on older GPUs. I built this on top of prior work by Daniel, which was removed shortly before ACO was merged. Back then, the post-RA scheduler didn't really do anything. I took the old code and extended it so that it actually works: it now builds a DAG, understands things like barriers, etc.

It works in the following way:

The post-RA scheduler (PRS) is a list scheduler.
Each basic block is processed independently, without regard to the other blocks.
Basic blocks are broken up into smaller units along scheduling barriers like control flow, s_barrier, and similar instructions.
Within each smaller unit, a DAG (directed acyclic graph) is built from the instructions based on the registers they read and write, and their memory semantics.
Each instruction is assigned a priority which is roughly based on its latency.
From the DAG, candidate instructions are selected based on priority and which of the available instructions can start first.
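
The steps above can be illustrated with a minimal sketch of a list scheduler for a single unit. This is not the code from this MR: the `Instr` struct, the `depends` and `list_schedule` functions, the one-issue-per-cycle model, and the tie-breaking rule (earliest possible start first, then higher priority) are all assumptions made for illustration. It also tracks register dependencies only and ignores memory semantics.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified instruction; not ACO's real instruction type.
struct Instr {
    std::vector<int> reads;   // register ids read
    std::vector<int> writes;  // register ids written
    int latency;              // assumed cycles until the result is usable
};

static bool overlaps(const std::vector<int>& a, const std::vector<int>& b) {
    for (int x : a)
        if (std::find(b.begin(), b.end(), x) != b.end())
            return true;
    return false;
}

// True if `later` must stay after `earlier` (RAW, WAR or WAW hazard).
static bool depends(const Instr& earlier, const Instr& later) {
    return overlaps(earlier.writes, later.reads) ||
           overlaps(earlier.reads, later.writes) ||
           overlaps(earlier.writes, later.writes);
}

// Returns the scheduled order as indices into `unit`.
std::vector<int> list_schedule(const std::vector<Instr>& unit) {
    const int n = static_cast<int>(unit.size());
    std::vector<std::vector<int>> succs(n);
    std::vector<int> unmet(n, 0);     // unscheduled predecessors
    std::vector<int> priority(n, 0);
    std::vector<int> ready_at(n, 0);  // earliest cycle the node can start

    // Build the DAG. Edges only go from earlier to later instructions,
    // so the graph cannot contain a cycle.
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < j; ++i)
            if (depends(unit[i], unit[j])) {
                succs[i].push_back(j);
                ++unmet[j];
            }

    // Priority: latency-weighted longest path to the end of the unit.
    // Successors have higher indices, so walk backwards.
    for (int i = n - 1; i >= 0; --i) {
        int longest = 0;
        for (int s : succs[i])
            longest = std::max(longest, priority[s]);
        priority[i] = unit[i].latency + longest;
    }

    std::vector<int> order;
    std::vector<bool> done(n, false);
    int cycle = 0;
    while (static_cast<int>(order.size()) < n) {
        // Among nodes with no unscheduled predecessors, pick the one that
        // can start first; break ties by higher priority.
        int best = -1;
        for (int i = 0; i < n; ++i) {
            if (done[i] || unmet[i] != 0)
                continue;
            if (best < 0) {
                best = i;
                continue;
            }
            const int start_i = std::max(cycle, ready_at[i]);
            const int start_b = std::max(cycle, ready_at[best]);
            if (start_i < start_b ||
                (start_i == start_b && priority[i] > priority[best]))
                best = i;
        }
        assert(best >= 0 && "an acyclic graph always has a ready node");

        const int issue = std::max(cycle, ready_at[best]);
        // Simplification: every edge carries the producer's full latency.
        for (int s : succs[best]) {
            ready_at[s] = std::max(ready_at[s], issue + unit[best].latency);
            --unmet[s];
        }
        done[best] = true;
        order.push_back(best);
        cycle = issue + 1;  // assumed: one instruction issued per cycle
    }
    return order;
}
```

For example, given a 4-cycle instruction followed by its consumer and then an independent pair of 1-cycle instructions, this sketch moves the independent pair between the long-latency instruction and its consumer, which is the kind of latency hiding the description is aiming for.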

A few notes:

Scheduling memory instructions is not the primary goal, so the scheduler currently treats VMEM instructions as scheduling barriers. There are some ideas for improving VMEM scheduling in the future, but this is a difficult topic, and the pre-RA scheduler already does a good job.
It will still try to schedule SMEM and DS instructions when it sees a benefit in doing so.
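
As a rough illustration of what treating VMEM as a scheduling barrier means for the units described earlier, here is a small sketch that splits a block at barrier instructions. The `Kind` enum, `is_scheduling_barrier` and `split_units` are hypothetical names invented for this example and do not correspond to ACO's real types. The sketch assumes barrier instructions stay in place and belong to no unit.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical instruction classes; ACO's real enums differ.
enum class Kind { VALU, SALU, SMEM, DS, VMEM, Branch, Barrier };

// Assumption for this sketch: control flow, s_barrier and (for now) VMEM
// end the current unit. SMEM and DS remain schedulable.
static bool is_scheduling_barrier(Kind k) {
    return k == Kind::VMEM || k == Kind::Branch || k == Kind::Barrier;
}

// Returns half-open [begin, end) index ranges of schedulable units.
// Barrier instructions are not part of any unit and keep their position.
std::vector<std::pair<std::size_t, std::size_t>>
split_units(const std::vector<Kind>& block) {
    std::vector<std::pair<std::size_t, std::size_t>> units;
    std::size_t begin = 0;
    for (std::size_t i = 0; i <= block.size(); ++i) {
        if (i == block.size() || is_scheduling_barrier(block[i])) {
            if (i > begin)
                units.emplace_back(begin, i);  // skip empty units
            begin = i + 1;
        }
    }
    return units;
}
```

Each returned range could then be scheduled on its own, so no instruction ever moves across a VMEM instruction, a branch, or an s_barrier.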

There are also a couple of smaller fixes included here for other parts of ACO.

Edited Oct 27, 2020 by Timur Kristóf
