radv: implement IB chaining for DGC when it's executed on compute
The IB2 packet is only supported on the graphics queue. To execute a DGC IB on compute, the previous solution was to submit it separately without any chaining. That solution was incomplete, though, because it's easy to reach the maximum number of IBs per submit when there are a lot of ExecuteIndirect() calls.
To fix that, the proposed solution is to implement DGC IB chaining, only when it's executed on compute. The idea is to add a trailer that is placed at the beginning of the DGC IB (so that its offset is known). This trailer is used to chain the DGC IB back to a normal CS, and it's patched at execution time. Patching is fine because it's not allowed to execute the same DGC IB concurrently, and the entire solution relies on that.
When the DGC IB is executed on graphics, the trailer isn't patched and it only contains NOP padding. Performance should be mostly similar.
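As a rough illustration of the trailer idea only, here is a minimal sketch in C. It is not the radv code: every name (`TOY_NOP`, `TOY_CHAIN`, `TOY_TRAILER_DW`, `dgc_trailer_init`, `dgc_trailer_patch`) and every encoding is hypothetical, and it does not model real PM4 packets or how execution reaches the trailer. It only shows a fixed-size slot at a known offset that stays as NOP padding by default and is overwritten with a chain-back record when the IB runs on compute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy encodings, NOT real PM4 packet headers. */
#define TOY_NOP        0x80000000u /* stand-in for a padding dword */
#define TOY_CHAIN      0xC0DE0000u /* stand-in for a "chain to IB" header */
#define TOY_TRAILER_DW 4u          /* assumed trailer size in dwords */

/* The trailer sits at dword 0 of the DGC IB so its offset is always known.
 * By default it is pure NOP padding (the graphics case, never patched). */
static void dgc_trailer_init(uint32_t *ib)
{
   assert(ib != NULL);
   for (uint32_t i = 0; i < TOY_TRAILER_DW; i++)
      ib[i] = TOY_NOP;
}

/* Compute case: patch the trailer at execution time so that it chains back
 * to the normal CS at return_va. This is only safe because the same DGC IB
 * is never executed concurrently. */
static void dgc_trailer_patch(uint32_t *ib, uint64_t return_va, uint32_t return_size_dw)
{
   assert(ib != NULL);
   ib[0] = TOY_CHAIN;
   ib[1] = (uint32_t)(return_va & 0xFFFFFFFFu); /* low 32 bits of the address */
   ib[2] = (uint32_t)(return_va >> 32);         /* high 32 bits of the address */
   ib[3] = return_size_dw;                      /* size of the CS to resume */
}
```

In this model the graphics path would call only `dgc_trailer_init()`, while the compute path would additionally call `dgc_trailer_patch()` right before each execution.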
This fixes dEQP-VK.dgc.nv.compute.misc.execute_many_*_primary_cmd_compute_queue.
Based on !30768 (merged)