In the end I still needed a big lock around render_tiles(), so I suppose splitting out and caching the GMEM state wasn't strictly needed. But it was kinda a thing I'd wanted to do for a while.
This seems to help keep the GPU utilization during manhattan (mostly) at 100%. I suspect that with the async flush-queue we were hitting some lock contention (struct_mutex on the kernel side, between the SUBMIT ioctl and MADVISE and/or allocation?).