i965/nir: use vectorization for non-scalar stages

Shader-db results on Haswell:

    total instructions in shared programs: 2180337 -> 2154080 (-1.20%)
    instructions in affected programs: 959766 -> 933509 (-2.74%)
    helped: 5653
    HURT: 2560

    total cycles in shared programs: 12339326 -> 12307102 (-0.26%)
    cycles in affected programs: 6102794 -> 6070570 (-0.53%)
    helped: 3838
    HURT: 4868

Most of the hurt programs seem to be hurt because vectorizing makes us
generate extra MOVs. For example, in
shaders/non-free/steam/anomaly-2/158.shader_test, this:

    add(8)          g116<1>.xyF     g12<4,4,1>.xyyyF g1.4<0,4,1>.xyyyF { align16 NoDDClr 1Q };
    add(8)          g117<1>.xyF     g12<4,4,1>.xyyyF g1.4<0,4,1>.zwwwF { align16 NoDDClr 1Q };
    add(8)          g116<1>.zwF     g12<4,4,1>.xxxyF -g1.4<0,4,1>.xxxyF { align16 NoDDChk 1Q };
    add(8)          g117<1>.zwF     g12<4,4,1>.xxxyF -g1.4<0,4,1>.zzzwF { align16 NoDDChk 1Q };

turns into this:

    add(8)          g13<1>F         g12<4,4,1>.xyxyF g1.4<0,4,1>F   { align16 1Q };
    add(8)          g14<1>F         g12<4,4,1>.xyxyF -g1.4<0,4,1>F  { align16 1Q };
    mov(8)          g116<1>.xyD     g13<4,4,1>.xyyyD                { align16 NoDDClr 1Q };
    mov(8)          g117<1>.xyD     g13<4,4,1>.zwwwD                { align16 NoDDClr 1Q };
    mov(8)          g116<1>.zwD     g14<4,4,1>.xxxyD                { align16 NoDDChk 1Q };
    mov(8)          g117<1>.zwD     g14<4,4,1>.zzzwD                { align16 NoDDChk 1Q };

So we eliminated two adds, but then had to introduce four movs to
transpose the result, for a net increase of two instructions.  Some of
the hurt is because vectorization is a bit over-aggressive: we vectorize
something when we should have left it scalar and let CSE handle it.
Unfortunately, getting this right is really tricky, as it involves the
interactions between many different components.
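
As a rough illustration of the trade-off in the example above, the
instruction count can be tallied with a small sketch.  The helper name
instruction_delta is hypothetical and not part of Mesa or shader-db; it
only counts instructions and says nothing about cycles or scheduling.

```python
def instruction_delta(adds_before, adds_after, movs_added):
    """Net change in instruction count; a positive value means 'hurt'.

    Hypothetical helper for illustration only: it compares the number of
    adds before vectorization with the adds plus transposing movs after.
    """
    return (adds_after + movs_added) - adds_before


# Counts taken from the anomaly-2/158 excerpt: four partial-write adds
# become two full-width adds plus four movs.
delta = instruction_delta(adds_before=4, adds_after=2, movs_added=4)
print(delta)
```

With no transposing movs the same change would have saved two
instructions, which is why it is the movs that make this shader count
as hurt.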