
aco: Compile tessellation control and tessellation evaluation shaders

Timur Kristóf requested to merge Venemo/mesa:aco-tess into master

This MR gives ACO the ability to compile tessellation control and evaluation shaders in all variations. The hard part was getting the TCS I/O right.

What's there:

  • Refactored shader I/O (especially for per-vertex inputs, which affects the existing geometry shader implementation)
  • New helper functions for calculating address offsets, to make TCS I/O less of a brain twister
  • A few simple optimizations for merged shaders
  • Enable all stages with ACO in RADV
  • Tessellation Control Shader support:
    • GFX6-8: API VS runs on the HW LS stage: vertex_ls, and API TCS runs on HW HS: tess_control_hs
    • GFX9+: API VS and TCS are merged and run on the HW HS stage: vertex_tess_control_hs
    • In both cases, the VS and TCS share the same LDS space, so the API VS (HW LS) outputs are stored in the LDS. The TCS then reads these from the LDS as its inputs.
    • TCS outputs are passed to the TES in VMEM. Since the TCS can read its own outputs, they are also stored in LDS so that other TCS invocations can access them quickly. The tessellation factors are kept only in LDS while the shader runs; at the end of the shader they are written to their own ring in VMEM.
    • The LDS and VMEM layouts match those currently employed by radv_nir_to_llvm for convenience.
    • Fix LS VGPRs on buggy hardware (Vega 10 and Raven)
  • Tessellation Evaluation Shader support:
    • GFX6-10: When tessellation is used but there is no GS, the TES runs on the HW VS stage: tess_eval_vs
    • GFX6-8: When tessellation and GS are used together, the TES runs on the HW ES stage: tess_eval_es
    • GFX9+: TES and GS are merged and run on the HW GS stage: tess_eval_geometry_gs
    • When tessellation is used, the TES outputs conceptually replace the VS outputs, as far as further HW stages are concerned, which means that much of the existing VS code could be reused.
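
The stage mapping listed above can be summarized as a small lookup. The following is only an illustrative sketch in Python, not code from ACO or RADV: the function name `tess_hw_stage` and its parameters are hypothetical, and it covers only the cases this description spells out (API VS, TCS and TES in a tessellation pipeline on GFX6-10).

```python
def tess_hw_stage(api_stage: str, gfx_level: int, has_gs: bool = False) -> str:
    """Return the stage name used in this MR description for an API stage.

    Hypothetical helper for illustration only. `api_stage` is one of
    "vs", "tcs" or "tes"; `gfx_level` is the GFX generation (6 to 10).
    """
    if not 6 <= gfx_level <= 10:
        # The description only covers GFX6 through GFX10.
        raise ValueError(f"unsupported GFX level: {gfx_level}")

    merged = gfx_level >= 9  # GFX9+ merges VS+TCS and TES+GS

    if api_stage == "vs":
        # With tessellation, the API VS runs as HW LS, or merged into HW HS.
        return "vertex_tess_control_hs" if merged else "vertex_ls"
    if api_stage == "tcs":
        return "vertex_tess_control_hs" if merged else "tess_control_hs"
    if api_stage == "tes":
        if not has_gs:
            # Without a GS, the TES takes the place of the HW VS stage.
            return "tess_eval_vs"
        return "tess_eval_geometry_gs" if merged else "tess_eval_es"

    raise ValueError(f"unknown API stage: {api_stage}")


if __name__ == "__main__":
    for level in (8, 10):
        for stage in ("vs", "tcs", "tes"):
            print(f"GFX{level} {stage}: {tess_hw_stage(stage, level)}")
```

On GFX9+ the "vs" and "tcs" lookups return the same name because both API stages live in one merged shader, which is why the VS outputs can be handed to the TCS through LDS.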

Additional nice-to-haves, to be addressed in a future MR:

  • Better optimization for address offset calculations (separation of SGPR offset when it is added to the VGPR offset; combine some adds to MAD)
  • Don't initialize m0 for LDS on GFX9+, unless we have to
  • Combine some VMEM loads and/or stores - could be partly done in NIR
  • Combine LDS instructions - even unrelated ones could be easily combined
  • Remove some TCS output stores when they are unneeded (e.g. when a TCS output is not read by the TES)
  • Reduce LDS space used by vertex_tess_control_hs if possible
  • Share the workgroup size calculation with other passes in ACO (raises some issues about merged shaders)

Testing:

  • Tested on GFX10 (RX 5700 XT) with the CTS, the Sascha demo apps and a few games, e.g. RotTR, SotTR and TW3. I haven't noticed any glitches or any performance regressions compared to how these games performed earlier.
  • Tested on GFX8 (RX 570 ITX) with the CTS and Sascha demo apps
  • Tested on GFX7 (R7 260X) with the CTS, Sascha demo apps and RotTR
  • Tested on GFX6 with the CTS

CTS test results for dEQP-VK.tessellation.*:

Test run totals:
  Passed:        432/432 (100.0%)
  Failed:        0/432 (0.0%)
  Not supported: 0/432 (0.0%)
  Warnings:      0/432 (0.0%)
Edited by Timur Kristóf
