nir: add a pass that reorders independent loads by the distance of their closest use (scheduling improvement for ACO)
ACO doesn't reorder loads in some cases, so you could get:
v0 = load();
v1 = load();
use(v1);
use(v0);
For instruction-level latency hiding, it's better to sort independent loads by the distance to their closest use, so the load that is needed first is issued first:
v1 = load();
v0 = load();
use(v1);
use(v0);
With that, ACO can do better instruction scheduling and latency hiding. Loads using the same binding are kept together.
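The reordering above can be sketched as a sort keyed on each load's closest use. This is a toy model only: `struct toy_load`, `compare_closest_use` and `toy_sort_loads` are hypothetical names that do not exist in Mesa, the loads are assumed to be already known to be independent, and the "same binding" grouping is left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy model, not NIR's data structures. */
struct toy_load {
   int index;       /* original position of the load in the block */
   int closest_use; /* position of the nearest instruction reading it */
};

static int
compare_closest_use(const void *a, const void *b)
{
   const struct toy_load *la = a;
   const struct toy_load *lb = b;

   /* The load whose result is needed first sorts first. */
   if (la->closest_use != lb->closest_use)
      return la->closest_use < lb->closest_use ? -1 : 1;

   /* qsort is not stable, so break ties by original order. */
   return (la->index > lb->index) - (la->index < lb->index);
}

/* Reorder a run of independent loads by the distance to their closest use. */
static void
toy_sort_loads(struct toy_load *loads, size_t count)
{
   assert(loads != NULL || count == 0);
   if (count > 1)
      qsort(loads, count, sizeof(*loads), compare_closest_use);
}
```

For the example above, v0 (position 0, first used at position 3) and v1 (position 1, first used at position 2) come out with v1 ahead of v0.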
Most of this MR just cleans up nir_group_loads so that the new pass can reuse some of its functions. One commit then adds the new pass.
nir_sort_instr, a wrapper around qsort that sorts a range of instructions within a block, is also added.
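As a rough illustration of what a qsort wrapper over a range can look like, here is a minimal sketch. It assumes the instructions are available as an array of pointers and that the range is inclusive; `struct toy_instr`, `compare_key` and `toy_sort_range` are hypothetical and say nothing about the actual signature of nir_sort_instr or how it collects instructions from a block.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an instruction; "key" is whatever the
 * caller wants to sort by (for example, the closest-use distance). */
struct toy_instr {
   int key;
};

static int
compare_key(const void *a, const void *b)
{
   /* qsort passes pointers to the array elements, which are
    * themselves pointers to instructions. */
   const struct toy_instr *const *ia = a;
   const struct toy_instr *const *ib = b;
   return ((*ia)->key > (*ib)->key) - ((*ia)->key < (*ib)->key);
}

/* Sort instrs[first..last] (inclusive) and leave everything outside
 * that range where it is. */
static void
toy_sort_range(struct toy_instr **instrs, size_t first, size_t last,
               int (*cmp)(const void *, const void *))
{
   assert(first <= last);
   qsort(instrs + first, last - first + 1, sizeof(*instrs), cmp);
}
```

Passing the comparator in keeps the wrapper reusable for different orderings, while the range arguments let a pass reorder only the run of loads it has proven independent.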