Do we want Vulkan runtime support for shader-based DGC?
VK_NV_dgc seems to act as a very overloaded DrawIndirect, which I think drivers turn into an internal (software-defined) command list that is then consumed by an internal compute kernel emitting packets, etc. This "fixed function" approach contrasts sharply with how the equivalent functionality works in another graphics API, where there is minimal added API surface and the shading language is instead augmented with functions to emit draws/state changes into a command buffer. That seems a bit friendlier to the driver, since all of the code needed to implement those standard library routines is already required internally for the NV version.
This raises the question: should we add general NIR intrinsics for emitting commands into a command buffer (emit_draw/emit_set_vertex_buffer/...), and then implement DGC on top in the Vulkan runtime? This would seem possible by doing a bit of code motion from RADV, but with only one hardware driver implementing the ext in-tree so far it's hard for me to tell exactly.
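To make the proposed split a bit more concrete, here is a rough CPU-side model in C. It is only a sketch: the real thing would be NIR built by the runtime and run as a compute shader, with the per-driver hooks being intrinsics rather than function pointers. Every name here (dgc_token, dgc_emit_ops, dgc_process, the toy_* backend and its opcodes) is made up for illustration and does not correspond to anything in Mesa or the Vulkan headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical token kinds, loosely modelled on what a DGC layout describes. */
enum dgc_token_type {
   DGC_TOKEN_SET_VERTEX_BUFFER,
   DGC_TOKEN_DRAW,
};

struct dgc_token {
   enum dgc_token_type type;
   uint32_t offset; /* byte offset of the token's data within one sequence */
};

/* Hypothetical payloads as the application would write them in the stream. */
struct vb_data { uint64_t addr; uint32_t size; uint32_t pad; };
struct draw_data { uint32_t vertex_count; uint32_t instance_count; };

/* Stand-ins for the proposed intrinsics: the only part a driver implements. */
struct dgc_emit_ops {
   void (*emit_set_vertex_buffer)(void *cs, uint64_t addr, uint32_t size);
   void (*emit_draw)(void *cs, uint32_t vertex_count, uint32_t instance_count);
};

/* "Runtime" side: walks the layout for every sequence and calls the hooks.
 * This is the part that would be shared across drivers. */
static void
dgc_process(const struct dgc_emit_ops *ops, void *cs,
            const struct dgc_token *tokens, size_t token_count,
            const uint8_t *stream, size_t stride, uint32_t sequence_count)
{
   for (uint32_t s = 0; s < sequence_count; s++) {
      const uint8_t *seq = stream + (size_t)s * stride;
      for (size_t t = 0; t < token_count; t++) {
         const uint8_t *data = seq + tokens[t].offset;
         switch (tokens[t].type) {
         case DGC_TOKEN_SET_VERTEX_BUFFER: {
            struct vb_data vb;
            memcpy(&vb, data, sizeof(vb));
            ops->emit_set_vertex_buffer(cs, vb.addr, vb.size);
            break;
         }
         case DGC_TOKEN_DRAW: {
            struct draw_data d;
            memcpy(&d, data, sizeof(d));
            ops->emit_draw(cs, d.vertex_count, d.instance_count);
            break;
         }
         }
      }
   }
}

/* "Driver" side: a toy backend with an invented packet format. */
#define TOY_MAX_WORDS 64
struct toy_cs { uint32_t words[TOY_MAX_WORDS]; uint32_t count; };

static void
toy_push(struct toy_cs *cs, uint32_t word)
{
   assert(cs->count < TOY_MAX_WORDS);
   cs->words[cs->count++] = word;
}

static void
toy_emit_set_vertex_buffer(void *p, uint64_t addr, uint32_t size)
{
   struct toy_cs *cs = p;
   toy_push(cs, 0x1); /* invented opcode */
   toy_push(cs, (uint32_t)addr);
   toy_push(cs, (uint32_t)(addr >> 32));
   toy_push(cs, size);
}

static void
toy_emit_draw(void *p, uint32_t vertex_count, uint32_t instance_count)
{
   struct toy_cs *cs = p;
   toy_push(cs, 0x2); /* invented opcode */
   toy_push(cs, vertex_count);
   toy_push(cs, instance_count);
}

static const struct dgc_emit_ops toy_ops = {
   .emit_set_vertex_buffer = toy_emit_set_vertex_buffer,
   .emit_draw = toy_emit_draw,
};
```

The point of the split is that dgc_process knows nothing about packet formats, so a second driver would only need to supply its own emit hooks. Whether that maps cleanly onto what RADV does today is exactly the open question.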
Does NVIDIA hardware perhaps do something more clever with the MME than what RADV is doing in its big compute kernel? Is that how we got here?
I guess there are different tiers of hardware here... If I think about Mali, DGC on JM-era hardware would be implemented with the RADV approach. For that hardware, I would want this runtime support and then just implement the intrinsics in panvk. But for CSF hardware, it might be possible to implement NV_dgc directly with the CSF command streamer, avoiding the compute kernel. In that case, this runtime support would not be helpful, since there wouldn't be shader code involved (maybe). That may or may not be faster in practice, lol.