Ralloc GC contexts for NIR
The nature of the intermediate representations in Mesa, such as NIR and GLSL IR, means that we frequently allocate and free lots of small, similarly sized objects, which is something the default glibc allocator is really bad at. Right now, malloc lock contention is a big problem when radeonsi switches from TGSI to NIR. For example, when compiling the Dolphin ubershaders, NIR's single-threaded compilation time is faster, but its multithreaded wall time is slower on CPUs with many threads, with many more context switches and less parallelism. Switching to jemalloc narrows the difference by a significant margin.
This MR doesn't entirely fix the problem, but it starts moving us in the right direction. The idea is to put a per-shader memory allocator in front of the system allocator, so that instructions can be allocated and freed quickly and their memory reused. This requires a single context to own all of the allocated memory, which means we can't support the way nir_sweep steals objects into a different context. Instead, we introduce a "ralloc GC context": a self-contained context (no stealing or adopting its children) with a mark/sweep interface to handle nir_sweep's needs. Under the hood, it manages memory for small objects itself and only requests larger "slabs" from malloc. At the moment we only use this context for instructions, since they're by far the most commonly allocated objects, but other things could use it as well. Of course, we could also use it for GLSL IR, but I haven't looked into how hard that would be.
At first I tried to write a brand-new allocator instead of tacking this onto ralloc, but because of the way we nest objects in NIR, particularly for SSA def names, we'd have had to either pretty much replicate ralloc or change a lot of code to use it. This version needed very few changes to NIR passes, mostly to avoid some, err, "clever" (ab)use of ralloc contexts in a few of them.