Draft: v3dv: enable double-buffer mode automatically
Double-buffer mode can help performance in some workloads but hurt it in others, depending on how the cost of tile load/store compares to the cost of vertex processing in a job. If the cost of vertex processing in the pipeline is small enough, then reducing the tile size to enable double-buffer mode can be helpful. This series tries to come up with a simple heuristic to activate this mode automatically by tracking this cost across all draw calls in a job.
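To illustrate the idea of tracking cost across all draw calls in a job, here is a minimal sketch. It is not the heuristic used in this series: `struct job_cost`, both functions, the "vertices times shader instructions" cost estimate and the threshold parameter are all hypothetical names and choices made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-job accumulator; not the driver's actual data structure. */
struct job_cost {
   uint32_t draw_count;
   uint64_t geometry_score; /* summed estimate of vertex-processing cost */
};

/* Called once per recorded draw. The cost estimate (vertices multiplied by
 * vertex shader instruction count) is an assumption for illustration only.
 */
static void
job_cost_add_draw(struct job_cost *cost, uint32_t vertex_count,
                  uint32_t vs_instr_count)
{
   assert(cost != NULL);
   cost->draw_count++;
   cost->geometry_score += (uint64_t)vertex_count * vs_instr_count;
}

/* Called when the job is finished: enable double-buffer only if the average
 * per-draw geometry cost stays at or below a caller-provided threshold.
 */
static bool
job_should_enable_double_buffer(const struct job_cost *cost,
                                uint64_t max_avg_score)
{
   assert(cost != NULL);
   if (cost->draw_count == 0)
      return false;
   return cost->geometry_score / cost->draw_count <= max_avg_score;
}
```

With a zero-initialized `struct job_cost`, calling `job_cost_add_draw()` for each draw and `job_should_enable_double_buffer()` at the end of the job gives the late decision described above. Any real threshold would have to come from profiling.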
We can't make the decision about this until we have recorded all the calls and we are ready to "finish" the job, which doesn't play well with the fact that the hardware requires that the first command in the binning list is precisely the one that makes this decision. In this series the approach we take is to save a pointer to the place in the BCL where we stored that command and rewrite it later if we decide that we want to enable double-buffer.
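The "save a pointer and rewrite it later" approach can be sketched as follows. This is a simplified model, not the driver code: `struct cmd_list`, the opcode value, the two-byte packet layout and the flag bit are all hypothetical and do not match the real hardware packet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcode and flag; the real packet layout is different. */
#define CMD_TILE_CONFIG           0x01u
#define TILE_CONFIG_DOUBLE_BUFFER (1u << 0)

/* Hypothetical fixed-size command list standing in for the BCL. */
struct cmd_list {
   uint8_t data[256];
   size_t size;
   uint8_t *tile_config; /* where the first command was written */
};

/* Emit the first command with double-buffer disabled and remember where
 * it lives so that it can be patched once the job is complete.
 */
static void
cmd_list_begin(struct cmd_list *cl)
{
   cl->size = 0;
   cl->tile_config = &cl->data[cl->size];
   cl->data[cl->size++] = CMD_TILE_CONFIG;
   cl->data[cl->size++] = 0;
}

/* Append one byte of any later command. */
static void
cmd_list_emit(struct cmd_list *cl, uint8_t byte)
{
   assert(cl->size < sizeof(cl->data));
   cl->data[cl->size++] = byte;
}

/* At job finish, rewrite the saved command in place if needed. */
static void
cmd_list_finish(struct cmd_list *cl, bool double_buffer)
{
   assert(cl->tile_config != NULL);
   if (double_buffer)
      cl->tile_config[1] |= TILE_CONFIG_DOUBLE_BUFFER;
}
```

A raw pointer is only safe here because the storage never moves. If the backing storage can be reallocated or relocated while recording, storing an offset instead of a pointer avoids a dangling reference.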
Another issue is that changing the tile size to enable double-buffer has a few implications. One of them is that we need to make sure our tile state/alloc buffers are large enough, since with a smaller tile size we'll have more tiles. Again, this is something we have been doing up-front until now, which doesn't play well with making this decision late. While we could simply reallocate these BOs if we decide to enable double-buffer later on (or even always allocate for double-buffer in advance), in this series I am taking the approach of postponing tile state allocations for render pass jobs (for now the only ones where we may decide to use double-buffer) until we have made a decision about double-buffer, which requires a bit more refactoring.
The other issue is that there are some optimal paths in the driver that depend on whether the render area is aligned to the tile boundaries, and these decisions may change with a smaller tile size. Since we can't decide about double-buffer until we have recorded all the commands, we may not always be able to make the best decisions for double-buffer mode. In particular, there may be cases where the render area is not aligned to the normal tile size but is aligned to the smaller tile size we choose for double-buffer. In any case, this would never be worse than non-double-buffer mode.
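Both effects of a smaller tile size can be shown with a bit of arithmetic. The sketch below is not driver code: `struct rect`, the helper names and the tile sizes used in the usage note are assumptions for illustration, and the alignment test is simplified (it ignores any special handling a driver may apply at framebuffer edges).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical render area in pixels. */
struct rect {
   uint32_t x, y, width, height;
};

static uint32_t
div_round_up(uint32_t n, uint32_t d)
{
   assert(d != 0);
   return (n + d - 1) / d;
}

/* Number of tiles needed to cover a framebuffer; tile state/alloc buffers
 * have to be sized for this count.
 */
static uint32_t
tile_count(uint32_t fb_width, uint32_t fb_height,
           uint32_t tile_w, uint32_t tile_h)
{
   return div_round_up(fb_width, tile_w) * div_round_up(fb_height, tile_h);
}

/* Simplified check: every edge of the area falls on a tile boundary. */
static bool
area_is_tile_aligned(const struct rect *area, uint32_t tile_w, uint32_t tile_h)
{
   return area->x % tile_w == 0 &&
          area->y % tile_h == 0 &&
          (area->x + area->width) % tile_w == 0 &&
          (area->y + area->height) % tile_h == 0;
}
```

Taking 64x64 and 64x32 as example tile sizes, a 1920x1080 framebuffer needs 510 tiles in the first case and 1020 in the second. A render area of `{0, 32, 128, 96}` is not aligned to 64x64 tiles but is aligned to 64x32 tiles, which is the kind of case where an alignment decision made at recording time can differ from the one that applies after double-buffer is enabled.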
The heuristics are based on the Quake, Sponza and UE4 samples we usually use for testing. My results with this series and its current heuristic show the same or better performance in all samples.
Marked as DRAFT for now, since this still needs some cleaning-up.