panfrost: do not push "true" UBOs
Panfrost supports pushing uniforms to hardware uniform registers (RMU/FAU for Midgard/Bifrost respectively). Since OpenGL uniforms are lowered to UBO #0, it does this with a pass that pushes UBOs. That's good!
The pass also pushes 'true' OpenGL UBOs, since they look the same in the backend at this point. This is where the trouble comes in:
-
True UBOs are allocated in GPU BOs, not CPU allocated buffers. That means it's write-combine memory, which we cannot read from efficiently (at least depending on coherency details that were never plumbed through panfrost.ko and unlikely to be replumbed now that panthor is the new hot stuff). So, pushing true UBOs reduces GPU overhead at the cost of tremendous CPU overhead. This is dubious... When I benchmarked this on MT8192 in early 2023, this pushing improved FPS in SuperTuxKart but hurt FPS in Dolphin.
-
True UBOs can be written on the GPU. In OpenGL, we have batch tracking infrastructure to sort this mess out in theory. What this means is that pushing UBOs requires us to flush writers AND STALL at draw-time. If this is ever hit, our performance is utterly trashed. But it gets worse.
-
True UBOs can be written in the same batch that reads them. For example, we could bind a buffer as a transform feedback buffer, do a draw with XFB, then rebind as a UBO and do a draw reading. This is where we collapse -- our logic will flush the writer, which is the same batch we were in the middle of enqueueing a draw to. When we try to push words, we'll crash with theatrics. This could be solved by smartening the batch tracking logic but it's not trivial by any means.
So, pushing true UBOs on the CPU is broken and can hurt performance. Stop doing it!
Long term, the solution will be to push on the GPU instead. This avoids all of these issues. This can be done with a compute kernel or with CSF instructions. The Vulkan driver will likely have to do this for performance, since pushing UBOs from the CPU is utterly broken in Vulkan for the above reasons.
I have a branch somewhere doing this on v9 but I'm doing this on NIR time to unblock a core change that was crashing piglit due to this pile of unsoundness. Let's fix the correctness issues first, then someone can look at recovering performance later when we're not blocking unrelated work.
Fixes corruption in Piglit test gles-3.0-transform-feedback-uniform-buffer-object, which writes a UBO with transform feedback. (I suspect the test still doesn't pass for the same reason it's broken on other tilers. But that's a better place to be than oodles of memory corruption.)