nir: add no_varying & no_sysval_output IO semantic flags, transform feedback info in IO intrinsics; fixes (!14388)

Marek Olšák requested to merge mareko/mesa:nir-xfb-new into main Jan 04, 2022

This improves the expressive power of the store_output intrinsic to match more closely what we do in LLVM IR.

When we write into an output in NIR, we want to know whether the consumer is the next shader stage, transform feedback, fixed-func hw (e.g. with CLIPDISTn), or any combination of those. This MR solves that by adding:

Transform feedback information into store_output, which means store_output will contain the transform feedback buffer, offset, and writemask for each component.
Two new flags into io_semantics: no_varying and no_sysval_output.

When no_varying is set, the next stage doesn't consume the output (but transform feedback or the fixed-func hw still can). When no_sysval_output is set, the fixed-func hw (e.g. the clipper) doesn't consume the output (but the next stage or transform feedback still can). Similarly, an output with no transform feedback info attached isn't consumed by transform feedback (but the next stage or the fixed-func hw still can).
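The consumer rules above can be sketched as a small predicate. This is only an illustration under stated assumptions: struct out_semantics, enum consumer, and the has_xfb and is_sysval_slot fields are hypothetical stand-ins, not the actual NIR definitions. Only the no_varying and no_sysval_output names come from this MR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the IO semantics of one store_output.
 * Only no_varying and no_sysval_output are names from this MR; the other two
 * fields exist purely for illustration. */
struct out_semantics {
   bool no_varying;        /* the next shader stage does not read the output */
   bool no_sysval_output;  /* fixed-func hw (e.g. the clipper) does not read it */
   bool has_xfb;           /* assumption: at least one component has xfb info */
   bool is_sysval_slot;    /* assumption: the slot is a sysval such as CLIPDISTn */
};

enum consumer {
   CONSUMER_NEXT_STAGE = 1 << 0,
   CONSUMER_FIXED_FUNC = 1 << 1,
   CONSUMER_XFB        = 1 << 2,
};

/* Return a bitmask of the consumers that may read this output. */
static unsigned
output_consumers(const struct out_semantics *sem)
{
   unsigned mask = 0;

   assert(sem != NULL);

   if (!sem->no_varying)
      mask |= CONSUMER_NEXT_STAGE;
   if (sem->is_sysval_slot && !sem->no_sysval_output)
      mask |= CONSUMER_FIXED_FUNC;
   if (sem->has_xfb)
      mask |= CONSUMER_XFB;

   return mask;
}
```

With both flags set and transform feedback info present, the only remaining consumer in this sketch is transform feedback.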

With the new transform feedback information in store_output, the intrinsic can act as a pure transform feedback store when neither the next stage nor the fixed-func hw consumes the output, which decouples transform feedback from gl_varying_slot. What this means is that a linker can move transform feedback output components between slots and pack them arbitrarily as long as it moves the per-component transform feedback buffer/offset/writemask with them. This effectively replaces pipe_stream_output_info.
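To make the "move the transform feedback info with the component" idea concrete, here is a minimal sketch. The struct out_component record and both helpers are hypothetical and are not the data structures used by this MR. The sketch only shows that repacking changes the slot and channel while the transform feedback fields travel along unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-component record; names are illustrative, not Mesa types. */
struct out_component {
   uint8_t  slot;         /* stand-in for a gl_varying_slot value */
   uint8_t  chan;         /* component within the slot, 0..3 */
   bool     xfb_enabled;  /* stand-in for this component's xfb writemask bit */
   uint8_t  xfb_buffer;   /* transform feedback buffer index */
   uint16_t xfb_offset;   /* offset in the buffer (unit is an assumption) */
};

/* Move one component to a new slot/channel. The xfb fields are deliberately
 * left untouched, so the captured data still lands at the same place. */
static void
move_component(struct out_component *c, uint8_t new_slot, uint8_t new_chan)
{
   assert(c != NULL);
   assert(new_chan < 4);
   c->slot = new_slot;
   c->chan = new_chan;
}

/* Pack components tightly, four per slot, starting at base_slot.
 * Returns the number of slots used. */
static unsigned
pack_components(struct out_component *comps, unsigned count, uint8_t base_slot)
{
   for (unsigned i = 0; i < count; i++)
      move_component(&comps[i], (uint8_t)(base_slot + i / 4), (uint8_t)(i % 4));

   return (count + 3) / 4;
}
```

In this sketch, five components scattered over several slots end up in two consecutive slots, and each one keeps its original buffer and offset.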

Finally, nir_lower_io is called by st/mesa instead of drivers (conditional on a new NIR option), which we need to do to embed the transform feedback info in store_output.

With all the features above, we can write a much better version of nir_link_opt_varyings and nir_compact_varyings. There is also a desire to write those on top of lowered IO because it would be simpler and it could do more that way. The current linking helpers are missing a lot of things. I list them below.

nir_link_opt_varyings doesn't optimize the following:

VS->TCS
TCS->TES
VS->TES (GL only, algorithm equivalent to VS->TCS)
VS->GS
TES->GS (algorithm equivalent to VS->GS)
GS->FS
arrays
structures
interfaces
dual slots
matrices (are matrices scalarized?)
patch varyings
legacy and sysval GL outputs when GL_PROGRAM_SEPARABLE == FALSE (see Non-separable shader notes)

nir_compact_varyings doesn't pack the following:

non-32-bit types
arrays
structures
interfaces
matrices (are matrices scalarized?)
legacy and sysval GL outputs when GL_PROGRAM_SEPARABLE == FALSE (see Non-separable shader notes)

Non-separable shader notes:

TEXn can be optimized and packed except when the next stage is FS and TEXn can be overridden by sprite_coord_enable.
If the next stage is FS and the glShadeModel and color-two-side states are unknown, COL1 can only be packed into COL0.
CLIPDISTn, LAYER, and VIEWPORT sysval outputs can be constant-folded into FS but can't be removed. They can also be packed if the values passed to FS are moved to varying-only slots. TESS_LEVEL_* sysval outputs can get the same treatment as CLIPDISTn.
Any legacy and sysval output can be packed with any other output if it's never a sysval output in that stage, or isn't affected by external states (e.g. TEXn and COLn before FS are affected).
