nir: trim unused ends of ssa def vectors, replacing nir_opt_shrink_load, adding to i965/vec4.
While fixing nir-to-tgsi codegen regressions, I noticed that we weren't trimming the unused ends of vectors in NIR. Putting together a pass to do so was quick, and it feels like something that NIR should have. It looks like a nice little win on i965/vec4 (tested in Intel CI, which only produced TGL failures that I assume are intermittent).
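
As a rough illustration of the idea (a standalone sketch, not the code in this pass): if we know which channels of a vector def are read, the trimmed size is the index of the highest read channel plus one. The helper name trimmed_num_components and the read_mask representation below are assumptions made for this example only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for illustration only, not part of the pass.
 *
 * read_mask has bit i set when channel i of the def is read by some use.
 * Returns the number of leading channels that must be kept so that every
 * read channel survives; trailing unread channels are dropped.
 */
static unsigned
trimmed_num_components(uint16_t read_mask, unsigned num_components)
{
   /* NIR vectors have at most 16 components, which fits the mask type. */
   assert(num_components <= 16);

   unsigned keep = 0;
   for (unsigned i = 0; i < num_components; i++) {
      if (read_mask & (1u << i))
         keep = i + 1;
   }

   /* keep == 0 means nothing is read at all; removing such a def is a job
    * for dead code elimination rather than for trimming.
    */
   return keep;
}
```

For a vec4 where only .xy is read (mask 0x3) this gives 2. Where only .xz is read (mask 0x5) it gives 3, since the unread y channel sits in the middle and cannot be dropped by trimming the end.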
We should also swizzle out unused start or middle channels of vectors when possible, but that's harder to do, so it is left for future work.