mesa,glthread: lots of cleanups, some CPU overhead improvements
I get 15% better performance with this in one CPU-bound viewperf subtest.
This is mostly cleanups, though, and lots of them. This MR is a prerequisite for multi-mode multidraws, which we may need in order to reduce frontend overhead further. It's also a prerequisite for a glthread dispatch rewrite, which I'm still considering.