nir: add new SSA instruction scheduler grouping loads into indirection groups (+ radeonsi) (!13604)

Marek Olšák requested to merge mareko/mesa:nir-group-loads into main Oct 30, 2021

The benefit is +9% performance in one viewperf subtest (the best known case).

This is a new block-level load instruction scheduler where loads are grouped according to their indirection level within a basic block. An indirection is when the result of one load is used as a source of another load. The result is disjoint groups of ALU opcodes and load (texture) opcodes, where each successive load group is the next level of indirection. This is done by finding the first and last load with the same indirection level and moving all unrelated instructions between them after the last load, except for load sources, which are moved before the first load. It naturally suits hardware that has limits on texture indirections, but other hardware can benefit too. Only texture, image, and SSBO load and atomic instructions are grouped.
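To make the grouping step concrete, here is a minimal Python sketch of the idea on a toy instruction list. It is not the NIR pass from this merge request: the `Instr` type and the `indirection_levels` and `group_loads` names are hypothetical, and the model ignores everything a real pass has to handle (side effects, barriers, which opcodes count as loads). It only illustrates computing indirection levels and then, per level, moving load sources before the first load and unrelated instructions after the last load.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    """Toy SSA instruction (hypothetical model, not a NIR type)."""
    name: str
    is_load: bool
    srcs: tuple = ()  # names of the instructions whose results are used


def indirection_levels(block):
    """Return {name: level}, where level is the number of loads on the
    longest dependency chain leading to the instruction (itself excluded)."""
    by_name = {i.name: i for i in block}
    level = {}
    for ins in block:  # SSA order: every source is defined before its use
        lvl = 0
        for s in ins.srcs:
            if s in level:  # sources defined outside the block are ignored
                lvl = max(lvl, level[s] + (1 if by_name[s].is_load else 0))
        level[ins.name] = lvl
    return level


def group_loads(block):
    """Return a reordered copy of the block with loads grouped by level."""
    by_name = {i.name: i for i in block}
    level = indirection_levels(block)
    order = list(block)
    max_level = max((level[i.name] for i in block if i.is_load), default=-1)

    for lvl in range(max_level + 1):
        def in_group(i):
            return i.is_load and level[i.name] == lvl

        idx = [k for k, i in enumerate(order) if in_group(i)]
        if len(idx) < 2:
            continue  # nothing to group at this level
        first, last = idx[0], idx[-1]

        # Collect the transitive sources of every load in this group.
        needed = set()
        stack = [s for k in idx for s in order[k].srcs]
        while stack:
            s = stack.pop()
            if s in by_name and s not in needed:
                needed.add(s)
                stack.extend(by_name[s].srcs)

        span = order[first:last + 1]
        before = [i for i in span if i.name in needed]   # load sources
        loads = [i for i in span if in_group(i)]         # the group itself
        after = [i for i in span
                 if i.name not in needed and not in_group(i)]  # unrelated
        order[first:last + 1] = before + loads + after
    return order
```

As a usage example under the same assumptions, a block written as `a, t0(a), b(t0), c, t1(c), d(b, t1), t2(d), e(t0), t3(b), f(t2, t3, e)`, where the `t*` instructions are loads, is reordered to `a, c, t0, t1, b, d, t2, t3, e, f`: the level-0 loads `t0` and `t1` become adjacent, followed by the ALU work, followed by the level-1 loads `t2` and `t3`.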

There is an option to group only those loads that use the same resource variable. This increases the chance of cache hits compared with leaving the loads spread out.
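One way to picture this option is as a change in the key used to decide which loads belong together. The sketch below is only an illustration: `load_groups`, its `same_resource_only` flag, and the resource names are hypothetical and do not correspond to identifiers in the actual pass.

```python
from collections import defaultdict


def load_groups(loads, same_resource_only):
    """Partition loads into groups.

    loads: iterable of (name, indirection_level, resource) tuples in
    program order. With same_resource_only set, loads are grouped only
    with other loads of the same level that use the same resource.
    """
    groups = defaultdict(list)
    for name, level, resource in loads:
        key = (level, resource) if same_resource_only else (level,)
        groups[key].append(name)
    return dict(groups)


loads = [
    ("t0", 0, "texA"),
    ("t1", 0, "texB"),
    ("t2", 0, "texA"),
    ("t3", 1, "texA"),
]
by_level = load_groups(loads, same_resource_only=False)
by_resource = load_groups(loads, same_resource_only=True)
```

With the sample data, grouping by level alone puts `t0`, `t1`, and `t2` in one group, while the per-resource variant keeps only `t0` and `t2` together and leaves `t1` in a group of its own.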

The increased register usage is offset by the increase in observed memory bandwidth due to more cache hits (dependent on hw behavior), which decreases the subgroup lifetime and thus allows registers to be deallocated and reused sooner. In bandwidth-bound cases, low register usage isn't always beneficial.

It's recommended to run a hw-specific instruction scheduler after this to tune register usage.

Edited Oct 30, 2021 by Marek Olšák
