st/nir: Re-vectorize shader IO
We scalarize IO to enable further optimizations, such as propagating constant components across shaders, eliminating dead components, and so on. This patch attempts to re-vectorize those operations after the varying optimizations are done.
Intel GPUs are a scalar architecture, but IO operations work on whole vec4s at a time, so we'd prefer a single IO load per vector rather than four scalar IO loads. Re-vectorizing can therefore cut the number of IO instructions considerably.
This may or may not be useful for other GPUs. Ultimately, the same work is performed either way, so assuming a backend can emit four scalar operations for a vec4 IO intrinsic, opportunistically re-vectorizing is probably not very harmful.
On Skylake GT4e with iris, this improves the performance of GFXBench5's gl_tess benchmark by 12.1418% +/- 0.159919% (n=3), by eliminating all spilling in geometry shaders.