
nir,glsl: add nir_opt_varyings, new varying linking optimization pass, enable it in the linker for radeonsi

Marek Olšák requested to merge mareko/mesa:nir-opt-varyings into main

The first 6 commits moved to !26918 (merged).

This adds:

  • nir_vertex_divergence_analysis (via !26918 (merged)) - reuses nir_divergence_analysis, but computes divergence within a primitive instead of within a subgroup; used by nir_opt_varyings
  • Computation and queries of a post-dominator tree of SSA uses - same algorithm as nir_dominance.c, but over a differently defined graph (explained in the source file) and computing post-dominance instead of dominance; used by nir_opt_varyings for backward inter-shader code motion
  • nir_opt_varyings and a lot of NIR tests for it
  • radeonsi preparation changes for nir_opt_varyings (these must land before it is enabled in the GLSL linker)
  • st/mesa and GLSL linker changes to enable nir_opt_varyings if lower_io_variables is true; this only enables it for radeonsi because no other driver sets that option

Total: 35 files changed, 8478 insertions(+), 107 deletions(-)

Optimizations performed by nir_opt_varyings:

  • Dead input/output removal
  • Propagation of constants, uniforms, UBO loads, and ALU expressions that use them from shader outputs to later shaders
  • Output deduplication
  • Backward inter-shader code motion (it moves code from the consumer to the producer)
  • Compaction

The optimizations are pretty thorough and support all shader stages, except backward inter-shader code motion, which only handles a subset of stage pairs (e.g. TCS->TES, VS->FS, TES->FS).

The complete description of the behavior of nir_opt_varyings is in the source file and here: #8841

This uncovers incorrect expectations in dEQP and GLCTS tests described here: #10361

STATS FOR AFFECTED SHADERS (16009/58918)                (AMD terminology)
  TCS inputs:                 475 -> 379 (-20.21 %)     (= LS outputs)
  TES inputs:                 478 -> 366 (-23.43 %)     (= HS outputs)
  TES patch inputs:           234 -> 232 (-0.85 %)      (= HS patch outputs)
  GS inputs:                  168 -> 115 (-31.55 %)     (= ES outputs)
  FS inputs after GS:         67 -> 61 (-8.96 %)        (= GS param exports)
  FS inputs after VS and TES: 31988 -> 28495 (-10.92 %) (= VS/TES param exports)
  Code Size (bytes):          24606160 -> 24320676 (-1.16 %)
  Max Waves:                  242634 -> 243876 (+0.51 %)

I also noticed that GLCTS finished 30% faster with this on a Radeon 7600, probably because the pass moves a lot of code from FS to VS (including slow FP64 code) due to how the tests are written.

Edited by Marek Olšák
