Cuts the time required to compile aco_instruction_selection.cpp by about 13%, and as a side effect reduces libaco's size by 150 KiB (6.5%). Shader compile time is unaffected (verified using radv_fossils).
This was mostly low-hanging fruit. gcc still spends a lot of time optimizing the large
nir_intrinsic_* switches, and I haven't found an effective way to make those build faster (clang appears to struggle less here).
Best reviewed per-commit. Note that 42f4d14b only moves code around (verified with
git-diff, which shows almost exclusively moved lines).