This MR contains the unreviewed commits from !14929 (merged): draw calls with task shaders.
The main difficulty here is that the HW implementation does not match the API:
- In the API, task shaders are considered graphics shaders which are part of a graphics pipeline and the draws are submitted to a graphics queue.
- The HW requires the driver to dispatch task shaders on an async compute queue. Mesh and pixel shaders are still on the graphics queue. This means that mesh shader execution (of already finished tasks) can overlap with task shader execution.

To handle this:
- For task+mesh draw calls, we create an internal compute command buffer and use it to emit the task shader dispatch commands.
- At queue submit, the internal compute command buffer is submitted to an async compute queue.
- We use scheduled dependencies for now, which is far from perfect; eventually we'll switch to gang submit instead.
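
The recording and submission flow described above can be sketched roughly as follows. This is a simplified, self-contained model and not the driver code: every `sketch_*` name, the string command labels, and the `wait_on` field are hypothetical and exist only for illustration. The ordering shown (the graphics submission waits until the compute submission has been scheduled) is an assumption about how the scheduled dependency is used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queue identifiers, not the driver's real enums. */
enum sketch_queue { SKETCH_QUEUE_GFX, SKETCH_QUEUE_ACE };

#define SKETCH_MAX_CMDS 16

/* A command stream modelled as a list of placeholder labels. */
struct sketch_cs {
   const char *cmds[SKETCH_MAX_CMDS];
   size_t count;
};

struct sketch_cmdbuf {
   struct sketch_cs gfx; /* mesh and pixel shader work */
   struct sketch_cs ace; /* internal compute cmdbuf for task shader dispatches */
   bool has_ace;         /* set by the first task+mesh draw */
};

struct sketch_submission {
   enum sketch_queue queue;
   const struct sketch_cs *cs;
   int wait_on; /* index of the submission that must be scheduled first, or -1 */
};

static void sketch_emit(struct sketch_cs *cs, const char *cmd)
{
   assert(cs->count < SKETCH_MAX_CMDS);
   cs->cmds[cs->count++] = cmd;
}

/* Record one mesh draw. With a task shader, the task dispatch goes to the
 * internal compute stream and the mesh part stays on the graphics stream. */
void sketch_draw_mesh(struct sketch_cmdbuf *cmd, bool has_task_shader)
{
   if (has_task_shader) {
      cmd->has_ace = true;
      sketch_emit(&cmd->ace, "task dispatch (placeholder)");
      sketch_emit(&cmd->gfx, "task+mesh draw (placeholder)");
   } else {
      sketch_emit(&cmd->gfx, "mesh draw (placeholder)");
   }
}

/* Build the submissions for one command buffer. Returns how many entries
 * of out[] were written (1 or 2). */
size_t sketch_queue_submit(const struct sketch_cmdbuf *cmd,
                           struct sketch_submission out[2])
{
   size_t n = 0;

   if (cmd->has_ace) {
      out[n++] = (struct sketch_submission){
         .queue = SKETCH_QUEUE_ACE, .cs = &cmd->ace, .wait_on = -1};
   }

   /* Assumption: the graphics submission depends on the compute submission
    * having been scheduled, so mesh work never waits on tasks that were
    * not started. */
   out[n] = (struct sketch_submission){
      .queue = SKETCH_QUEUE_GFX, .cs = &cmd->gfx,
      .wait_on = cmd->has_ace ? 0 : -1};

   return n + 1;
}
```

In this model, a command buffer that never records a task+mesh draw produces a single graphics submission with no dependency, so the extra compute submission is only needed when task shaders are actually used.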