freedreno/a6xx: CP overhead reductions

Rob Clark requested to merge robclark/mesa:wip/cp-overhead into master

For workloads with a large number of small draws, in particular draws that only affect a subset of tiles, moving more register writes to stateobjs and reducing the register writes we emit directly to the draw ring helps drop CP overhead. In particular, for webgl fishtank (at 1000 fish, with sharks+lasers, which is a bit of a pedantic case) on a small a6xx (less gmem, more tiles, and more tiles per VSC pipe), this MR takes us from:

| metric           | master | MR    |
|------------------|--------|-------|
| fps              | 49     | 60    |
| CP_BUSY_GFX_IDLE | 26-28% | 18.5% |
| CP_BUSY_CYCLES   | 87%    | 73.5% |

Leaving this as WIP for now; there are probably a few other groups of registers we could move to stateobjs.

Edited by Rob Clark
