nir: add no_varying & no_sysval_output IO semantic flags, transform feedback info in IO intrinsics; fixes
This improves the expressive power of the `store_output` intrinsic to match more closely what we do in LLVM IR.
When we write into an output in NIR, we want to know whether the consumer is the next shader stage, transform feedback, or fixed-func hw (e.g. with `CLIPDISTn`), or any combination of those, but not all of them. This MR solves that by adding:
- Transform feedback information in `store_output`, which means `store_output` will contain the transform feedback buffer, offset, and writemask for each component.
- Two new flags in `io_semantics`: `no_varying` and `no_sysval_output`.
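To make the first bullet concrete, here is a minimal C sketch of what per-component transform feedback info attached to a `store_output` could look like. The names `store_output_xfb`, `xfb_capture`, and `xfb_is_captured` are hypothetical and don't reflect the fields Mesa actually uses; the point is only that each of the four components carries its own buffer and offset, gated by a writemask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-component transform feedback location.
 * Illustrative only; not Mesa's real data structure. */
struct xfb_component_info {
   uint8_t buffer;   /* transform feedback buffer index */
   uint16_t offset;  /* byte offset within that buffer */
};

/* Hypothetical xfb info carried by one store_output. */
struct store_output_xfb {
   uint8_t writemask;                  /* bit c set: component c is captured */
   struct xfb_component_info comp[4];  /* valid only where writemask bit is set */
};

/* Mark component c as captured into `buffer` at byte `offset`. */
static void xfb_capture(struct store_output_xfb *x, unsigned c,
                        uint8_t buffer, uint16_t offset)
{
   assert(c < 4);
   x->writemask |= (uint8_t)(1u << c);
   x->comp[c].buffer = buffer;
   x->comp[c].offset = offset;
}

/* Return whether component c is captured by transform feedback. */
static bool xfb_is_captured(const struct store_output_xfb *x, unsigned c)
{
   assert(c < 4);
   return (x->writemask >> c) & 1u;
}
```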
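When `no_varying` is set, the next stage doesn't consume the output (but transform feedback or the fixed-func hw can). When `no_sysval_output` is set, the fixed-func hw (e.g. the clipper) doesn't consume the output (but the next stage or transform feedback can). The same applies to transform feedback: when `store_output` has no transform feedback info, transform feedback doesn't consume the output (but the next stage or the fixed-func hw can).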
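The flag semantics above amount to computing a set of possible consumers for each output. The sketch below assumes a hypothetical `io_sem_flags` struct holding only the two new flags, plus the store's transform feedback writemask; an output whose consumer set is empty is dead and its store can be removed. It is an illustration of the idea, not NIR's real `nir_io_semantics` API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of io_semantics: only the two new flags.
 * The real structure has many more fields. */
struct io_sem_flags {
   bool no_varying;        /* the next stage doesn't read this output */
   bool no_sysval_output;  /* fixed-func hw (e.g. the clipper) doesn't read it */
};

/* Possible consumers of an output, as a bitmask. */
enum output_consumer {
   CONSUMER_NEXT_STAGE = 1 << 0,
   CONSUMER_FIXED_FUNC = 1 << 1,
   CONSUMER_XFB        = 1 << 2,
};

/* Which consumers may read the output, given the flags and the store's
 * transform feedback writemask (0 means not captured). */
static unsigned output_consumers(struct io_sem_flags f, uint8_t xfb_writemask)
{
   unsigned mask = 0;
   if (!f.no_varying)
      mask |= CONSUMER_NEXT_STAGE;
   if (!f.no_sysval_output)
      mask |= CONSUMER_FIXED_FUNC;
   if (xfb_writemask != 0)
      mask |= CONSUMER_XFB;
   return mask;
}

/* An output with no remaining consumers is dead. */
static bool output_is_dead(struct io_sem_flags f, uint8_t xfb_writemask)
{
   return output_consumers(f, xfb_writemask) == 0;
}
```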
With the new transform feedback information in `store_output`, the intrinsic becomes a pure transform feedback store, which decouples transform feedback from `gl_varying_slot`. This means a linker can move transform feedback output components between slots and pack them arbitrarily, as long as it moves the per-component transform feedback buffer/offset/writemask along with them. This effectively replaces `pipe_stream_output_info`.
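A minimal sketch of that repacking rule, assuming a hypothetical linker-side layout (`out_layout`, `out_component`, `move_component`) in which every output component records its own transform feedback buffer and offset. Moving a component copies that info with it, so the captured data still lands at the same buffer location no matter which slot the component ends up in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_SLOTS 32  /* hypothetical number of output slots */

/* Hypothetical view of one output component. The value it holds is
 * abstracted away; only its transform feedback location matters here. */
struct out_component {
   bool used;
   bool xfb;          /* captured by transform feedback? */
   uint8_t buffer;    /* xfb buffer, valid if xfb */
   uint16_t offset;   /* xfb byte offset, valid if xfb */
};

struct out_layout {
   struct out_component slot[NUM_SLOTS][4];
};

/* Move component (src_slot, src_comp) into a free (dst_slot, dst_comp).
 * The xfb buffer/offset travel with the component, so repacking does not
 * change where transform feedback writes its data. */
static void move_component(struct out_layout *l,
                           unsigned src_slot, unsigned src_comp,
                           unsigned dst_slot, unsigned dst_comp)
{
   assert(src_slot < NUM_SLOTS && dst_slot < NUM_SLOTS);
   assert(src_comp < 4 && dst_comp < 4);
   assert(l->slot[src_slot][src_comp].used);
   assert(!l->slot[dst_slot][dst_comp].used);

   l->slot[dst_slot][dst_comp] = l->slot[src_slot][src_comp];
   l->slot[src_slot][src_comp] = (struct out_component){0};
}
```

Because the xfb location belongs to the component rather than to its slot, the packer needs no separate stream-output table to stay consistent.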
Finally, `nir_lower_io` is called by st/mesa instead of drivers (conditional on a new NIR option), which is needed to embed the transform feedback info in `store_output`.
With all the features above, we can write a much better version of `nir_link_opt_varyings` and `nir_compact_varyings`. There is also a desire to write them on top of lowered IO, which would be simpler and would let them do more. The current linking helpers are missing a lot of things, listed below.
`nir_link_opt_varyings` doesn't optimize the following:
- VS->TCS
- TCS->TES
- VS->TES (GL only, algorithm equivalent to VS->TCS)
- VS->GS
- TES->GS (algorithm equivalent to VS->GS)
- GS->FS
- arrays
- structures
- interfaces
- dual slots
- matrices (are matrices scalarized?)
- patch varyings
- legacy and sysval GL outputs when `GL_PROGRAM_SEPARABLE == FALSE` (see Non-separable shader notes)
`nir_compact_varyings` doesn't pack the following:
- non-32-bit types
- arrays
- structures
- interfaces
- matrices (are matrices scalarized?)
- legacy and sysval GL outputs when `GL_PROGRAM_SEPARABLE == FALSE` (see Non-separable shader notes)
Non-separable shader notes:
- `TEXn` can be optimized and packed unless the next stage is FS and `TEXn` can be overridden by `sprite_coord_enable`.
- If the next stage is FS and `glShadeModel` and the color-two-side enablement are unknown, `COL1` can only be packed into `COL0`.
- `CLIPDISTn`, `LAYER`, and `VIEWPORT` sysval outputs can be constant-folded into FS but can't be removed. They can also be packed if the values passed to FS are moved to varying-only slots. `TESS_LEVEL_*` sysval outputs can get the same treatment as `CLIPDISTn`.
- Any legacy or sysval output can be packed with any other output if it's never a sysval output in that stage, or isn't affected by external states (e.g. `TEXn` and `COLn` before FS are affected).
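The first three non-separable notes can be expressed as small predicates that a linker would check before optimizing or packing a legacy or sysval output. This is a sketch with hypothetical names (`legacy_slot`, `nonsep_state`, and the three helpers), and it assumes `COL1` has no packing restriction when the color state is known, which the notes don't spell out.

```c
#include <stdbool.h>

/* Hypothetical classification of the legacy/sysval slots discussed above. */
enum legacy_slot {
   SLOT_TEX, SLOT_COL0, SLOT_COL1,
   SLOT_CLIPDIST, SLOT_LAYER, SLOT_VIEWPORT,
   SLOT_OTHER,
};

/* Hypothetical link-time knowledge for a non-separable program. */
struct nonsep_state {
   bool next_is_fs;                    /* consumer stage is FS */
   bool tex_sprite_coord_overridable;  /* sprite_coord_enable may replace TEXn */
   bool color_state_unknown;           /* glShadeModel / two-side color unknown */
};

/* TEXn: can be optimized and packed unless the next stage is FS and
 * sprite_coord_enable may override it. */
static bool tex_can_be_optimized(const struct nonsep_state *s)
{
   return !(s->next_is_fs && s->tex_sprite_coord_overridable);
}

/* COL1: with FS next and unknown color state, only COL0 is a legal
 * packing target. Assumption: no restriction otherwise. */
static bool col1_can_pack_into(const struct nonsep_state *s,
                               enum legacy_slot target)
{
   if (s->next_is_fs && s->color_state_unknown)
      return target == SLOT_COL0;
   return true;
}

/* CLIPDISTn, LAYER, and VIEWPORT feed fixed-func hw: they may be
 * constant-folded into FS, but the store itself must stay. */
static bool sysval_output_removable(enum legacy_slot slot)
{
   return slot != SLOT_CLIPDIST && slot != SLOT_LAYER && slot != SLOT_VIEWPORT;
}
```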