implement clear_buffer for A6xx and A7xx
Increases the performance of:
- the OpenGL ARB_clear_buffer_object implementation
- the OpenCL clEnqueueFillBuffer implementation
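
For readers unfamiliar with the two entry points above: both fill a byte range of a buffer with a repeated pattern. The following is a minimal CPU-side sketch of that semantic, loosely following the argument rules of clEnqueueFillBuffer (power-of-two pattern size up to 128 bytes, offset and size multiples of the pattern size). `fill_buffer_cpu` is a hypothetical name used only for illustration. It is not code from this MR and not a Mesa function. The MR moves the equivalent work onto the GPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical reference helper (not part of Mesa or of this MR).
 * Fills dst[offset, offset + size) with repeated copies of `pattern`.
 * Returns 0 on success and -1 if the arguments are invalid.
 */
static int
fill_buffer_cpu(void *dst, size_t dst_size,
                const void *pattern, size_t pattern_size,
                size_t offset, size_t size)
{
   assert(dst != NULL && pattern != NULL);

   /* Pattern size must be a power of two in the range 1..128 bytes. */
   if (pattern_size == 0 || pattern_size > 128 ||
       (pattern_size & (pattern_size - 1)) != 0)
      return -1;

   /* Offset and size must both be multiples of the pattern size. */
   if (offset % pattern_size != 0 || size % pattern_size != 0)
      return -1;

   /* The range must lie inside the buffer (written to avoid overflow). */
   if (offset > dst_size || size > dst_size - offset)
      return -1;

   uint8_t *p = (uint8_t *)dst + offset;

   /* Copy the pattern once per pattern-sized slot in the range. */
   for (size_t i = 0; i < size; i += pattern_size)
      memcpy(p + i, pattern, pattern_size);

   return 0;
}
```

A loop like this runs on the CPU and touches every byte of the range, which is presumably why offloading the fill to the GPU helps workloads that clear large buffers often.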
Dependency for: !25840 (closed)
Benchmarks (mesa 24.1.2 + Freedreno Rusticl patches, Tinygrad LLM):
- 950 ms per token without this change
- 550 ms per token with clear_buffer done by the GPU (about 42% less time per token)
Relevant tests:
- GL-CTS:
KHR-GL46.direct_state_access.buffers_clear
- piglit GL:
arb_clear_buffer_object-.*
- piglit CL (with rusticl):
cl-api-enqueue-fill-buffer