radv: Experimental support for Mesh Shaders.

Merged Timur Kristóf requested to merge Venemo/mesa:radv-mesh-shader into main

Add experimental unofficial support for Mesh Shaders through the NV_mesh_shader extension.
Task Shaders are not supported yet, but will be coming soon.

Based on some NIR I/O fixes in MR !13466 (merged) and the RADV patch in MR !13440 (merged) as well.

What is a Mesh Shader?

A mesh shader is a new compute-like shader stage that can replace the entire vertex/geometry pipeline. Conceptually, instead of processing a pre-made set of vertices and primitives from a vertex buffer, mesh shaders can create vertices and primitives arbitrarily from any user-specified input.

When drawing with mesh shaders, the user needs to specify the number of launched mesh shader workgroups (instead of the number of vertices). It is then up to the shader to decide how many vertices and primitives it wants to create. More information about the rationale for mesh shaders can be found here.
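To make the "number of workgroups instead of number of vertices" idea concrete, here is a minimal sketch in C. It assumes the application splits its geometry so that each workgroup emits at most a fixed number of primitives; the helper name `mesh_workgroup_count` and that per-workgroup limit are hypothetical and not part of this MR.

```c
#include <stdint.h>

/* Hypothetical helper (not from this MR): how many mesh shader workgroups
 * an application would launch if each workgroup emits at most
 * `prims_per_workgroup` primitives. */
static uint32_t mesh_workgroup_count(uint32_t total_prims,
                                     uint32_t prims_per_workgroup)
{
    if (prims_per_workgroup == 0)
        return 0; /* nothing sensible to launch */

    /* Round up; written this way to avoid overflow in the addition. */
    return total_prims / prims_per_workgroup +
           (total_prims % prims_per_workgroup != 0);
}

/* With NV_mesh_shader, the result would be passed as taskCount to
 * vkCmdDrawMeshTasksNV(commandBuffer, taskCount, firstTask). */
```

The application never tells the draw call how many vertices exist; only the workgroup count crosses the API, and each workgroup decides its own output size.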

Notes about NV_mesh_shader

Important note: NV_mesh_shader will never be officially supported on RADV, because it performs poorly on AMD hardware. However, we are implementing this extension to get some experience with mesh shader technology. Users should not rely on this support because we are going to remove it if/when a potential cross-vendor extension appears.

There are problems with the NV_mesh_shader extension which are not present in e.g. D3D12:

  • The total number of output vertices is not known at runtime before the outputs are written. D3D12 solves this with SetMeshOutputCounts, which must appear before any outputs are written. NV_mesh_shader doesn't have this guarantee.
  • Any shader invocation can read the output of any other, which is not possible in D3D12.
  • The NV indirect command buffer format is not supported by the hardware, so we have to emit several copy packets to make it work. Note that D3D12 uses 3D dispatches without an offset: (x, y, z) but NV_mesh_shader uses a 1D dispatch with an offset: (taskCount, firstTask).
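The indirect format mismatch in the last point can be illustrated with a small C sketch. The first two structs mirror the field layout of VkDrawMeshTasksIndirectCommandNV and D3D12_DISPATCH_MESH_ARGUMENTS. The `repacked_dispatch` struct and `repack_nv_cmd` function are hypothetical: the description above does not say what layout the hardware expects, so this only shows why the NV command cannot be consumed as a plain 3D dispatch. It is not RADV's actual copy-packet code.

```c
#include <stdint.h>

/* Same fields as VkDrawMeshTasksIndirectCommandNV: 1D count plus offset. */
struct nv_mesh_indirect_cmd {
    uint32_t task_count;
    uint32_t first_task;
};

/* Same fields as D3D12_DISPATCH_MESH_ARGUMENTS: 3D count, no offset. */
struct d3d12_mesh_dispatch_args {
    uint32_t group_count_x;
    uint32_t group_count_y;
    uint32_t group_count_z;
};

/* Hypothetical target layout (an assumption for illustration only):
 * a 3D count with the offset stored separately. */
struct repacked_dispatch {
    struct d3d12_mesh_dispatch_args counts;
    uint32_t base_group_x;
};

/* Hypothetical repacking step: the 1D count becomes the X dimension,
 * Y and Z are fixed to 1, and the offset has to be carried elsewhere. */
static struct repacked_dispatch repack_nv_cmd(struct nv_mesh_indirect_cmd cmd)
{
    struct repacked_dispatch out;
    out.counts.group_count_x = cmd.task_count;
    out.counts.group_count_y = 1;
    out.counts.group_count_z = 1;
    out.base_group_x = cmd.first_task;
    return out;
}
```

On the CPU this is trivial, but for an indirect draw the values live in a GPU buffer, so the rearrangement has to happen on the GPU, which is where the extra copy packets come from.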

Therefore, NV_mesh_shader performs poorly compared to D3D12 mesh shaders.

Implementation details

Mesh shaders use NGG. We configure the shader to precisely control the number of launched workgroups. To achieve that, we set the input primitive topology to point list and set up the registers so that the shader takes 1 input vertex/primitive as "input". We use MAX_VERTS_PER_SUBGROUP and PRIM_AMP_FACTOR to specify the real amount of output vertices/primitives.

With that configuration, we can use the same draw packets as before, but now the number of input vertices can be used to specify the number of launched workgroups. From the shader's perspective, the "input vertex ID" can be used as the 1-dimensional workgroup ID.
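The mapping described above can be modelled on the CPU with a short C sketch. It is a toy model, not driver code: `model_mesh_draw`, `launch_log` and `MAX_GROUPS` are hypothetical names, and treating the draw's first vertex as the starting workgroup ID is an assumption made here for illustration.

```c
#include <stdint.h>

#define MAX_GROUPS 16 /* arbitrary cap for this toy model */

/* Records which workgroup IDs the model "launched". */
struct launch_log {
    uint32_t count;
    uint32_t ids[MAX_GROUPS];
};

/* Toy model of a point-list draw in which every input vertex launches
 * exactly one mesh shader workgroup. */
static void model_mesh_draw(uint32_t vertex_count, uint32_t first_vertex,
                            struct launch_log *log)
{
    log->count = 0;
    for (uint32_t i = 0; i < vertex_count && log->count < MAX_GROUPS; i++) {
        /* The "input vertex ID" doubles as the 1D workgroup ID. */
        uint32_t input_vertex_id = first_vertex + i;
        log->ids[log->count++] = input_vertex_id;
    }
}
```

In this model, a draw with 3 input vertices starting at vertex 2 launches workgroups 2, 3 and 4, so an unmodified draw packet is enough to control the workgroup count.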

Future work

  • VRS support for mesh shaders in MR !14193 (merged)
  • Mesh shader optimizations for the case when the local invocation index is used to address the output array.
  • Implement task shaders (these are currently missing).
Edited by Timur Kristóf
