nir,agx: Lower system values in NIR in the driver

Alyssa Rosenzweig requested to merge alyssa/mesa:agx/lower-sysvals-in-nir into main

AGX has a large number of "uniform registers" available. These may be loaded with arbitrary ranges of GPU memory by the driver, or they can be written by the preamble shader. Currently, the compiler runs nir_opt_preamble on the first half of the uniform file, and then translates NIR sysvals to moves from the second half of the uniform file, passing back a uniform->sysval map for the GL driver to respect. This has (at least) two issues:

  • Since nir_opt_preamble runs before gathering sysvals, it has to assume the maximum number of sysvals are pushed, which can prevent it from moving some computation to the preamble due to running out of partitioned uniform registers. This is a problem for Dolphin's ubershaders, though it's unclear how much it matters for Dolphin perf.
  • This violates The Ekstrand Rule and apparently will be a problem for our Vulkan driver. I'm just a compiler+GL girl, so I wouldn't know.

To fix this, we invert the order of operations. At the end of this series, we instead lower NIR system values to NIR load_preamble instructions in the GL driver. The compiler just translates these directly to uniform register reads. The Vulkan driver will need its own version of this code, but maybe it can do something clever and descriptor-set-aware.
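To make the register-pressure argument concrete, here is a toy model of the two orderings. It is a sketch, not code from this series: the file size `TOY_UNIFORM_FILE_SIZE` and both `toy_*` helpers are hypothetical names, and it assumes that once sysvals are lowered first, nir_opt_preamble may use whatever part of the uniform file the sysvals did not claim.

```c
#include <assert.h>

/* Hypothetical uniform file size in registers; illustrative only,
 * not taken from the hardware documentation or from this series. */
#define TOY_UNIFORM_FILE_SIZE 512

/* Old order: nir_opt_preamble runs before sysvals are gathered, so it
 * only ever gets the first half of the file, no matter how few sysvals
 * the shader actually ends up using. */
static unsigned
toy_preamble_budget_old(unsigned sysval_regs_used)
{
   (void)sysval_regs_used; /* unknown at this point in the old pipeline */
   return TOY_UNIFORM_FILE_SIZE / 2;
}

/* New order (assumed): the driver lowers sysvals to load_preamble first,
 * so the preamble pass can be handed the remainder of the file. */
static unsigned
toy_preamble_budget_new(unsigned sysval_regs_used)
{
   assert(sysval_regs_used <= TOY_UNIFORM_FILE_SIZE);
   return TOY_UNIFORM_FILE_SIZE - sysval_regs_used;
}
```

In this model, a shader that pushes only 16 registers of sysvals leaves the preamble pass 256 registers under the old order and 496 under the new one, which is the kind of headroom a large ubershader could use.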

This means that there will already be some load_preamble instructions when nir_opt_preamble runs, so I've made minor changes to nir_opt_preamble to handle that gracefully. This is a bit lazy... The alternative is to introduce a load_uniform_agx intrinsic to which load_preamble gets trivially lowered. But that would be another pass over the IR (and due to AGX's shader variant hell, I'm sensitive to backend compile time), and it would be more complicated than what's implemented here.

Cc @cwabbott0 -- if you're ok with the slight generalization of load_preamble, please review/ack the nir_opt_preamble patches here, otherwise if you feel strongly, I can introduce a load_uniform_agx and offset the load_preamble::base appropriately when lowering load_preamble -> load_uniform_agx in the backend.

Cc @asahilina for compiler and GL driver changes.

Cc @Ella-0 for the implications for AGXV.