nir: Merge consecutive store_scratch

I have observed some shaders in shader-db that have code like:

void main ()
{
  movieTaps[0] = vec3(0.2165, 0.125, 1.0);
  // repeat with different constant vec3 for [1..58].
  movieTaps[59] = vec3(0.8668, 0.2513, 0.907);
  ...
}

And NIR merrily generates the obvious, horrifying code. The worst part is that, since these are vec3s, each component is written individually by its own store_scratch:

        vec1 32 ssa_0 = load_const (0x00000000 = 0.000000)
        vec1 32 ssa_1 = load_const (0x3f800000 = 1.000000)
        ...
        vec1 32 ssa_10 = load_const (0x3e5db22d = 0.216500)
        intrinsic store_scratch (ssa_10, ssa_0) (align_mul=256, align_offset=0, wrmask=x /*1*/)
        vec1 32 ssa_11 = load_const (0x3e000000 = 0.125000)
        vec1 32 ssa_12 = load_const (0x00000004 = 0.000000)
        intrinsic store_scratch (ssa_11, ssa_12) (align_mul=256, align_offset=0, wrmask=x /*1*/)
        vec1 32 ssa_13 = load_const (0x00000008 = 0.000000)
        intrinsic store_scratch (ssa_1, ssa_13) (align_mul=256, align_offset=0, wrmask=x /*1*/)

The Intel compiler, in turn, generates the obvious, horrifying code. As a side note, I noticed this because those writes (60 vec3 elements × 3 components) get misidentified as 180 spills.

A pass already exists (nir_opt_load_store_vectorize) that merges UBO reads and SSBO reads and writes at consecutive locations. It should be extended to operate on scratch reads and writes as well.
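To make the requested transformation concrete, here is a minimal sketch of merging consecutive scalar stores into one vector store with a write mask. It is a toy model, not NIR's data structures and not the existing pass's code: `struct scalar_store`, `struct vec_store`, `MAX_COMPONENTS` and `merge_consecutive_stores()` are hypothetical names. It assumes straight-line code in which the stores are already sorted by constant byte offset, every value is 32 bits, and nothing between the stores reads or aliases scratch. A real pass would also have to respect alignment and the backend's vector width limit; the cap of 4 components here is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a scalar store_scratch: one 32-bit value
 * written at a constant byte offset. Not NIR's real data structures. */
struct scalar_store {
   uint32_t offset;
   uint32_t value;
};

#define MAX_COMPONENTS 4 /* assumed vector width limit */

/* Hypothetical model of a merged store: up to MAX_COMPONENTS consecutive
 * 32-bit components starting at `offset`, with a write mask in the style
 * of NIR's wrmask (bit i set means component i is written). */
struct vec_store {
   uint32_t offset;
   uint32_t values[MAX_COMPONENTS];
   unsigned num_components;
   unsigned wrmask;
};

/* Merge runs of scalar stores whose offsets are exactly 4 bytes apart.
 * Assumes the stores are sorted by offset and that nothing between them
 * reads or aliases scratch. `out` must have room for `n` entries.
 * Returns the number of merged stores written to `out`. */
static size_t
merge_consecutive_stores(const struct scalar_store *in, size_t n,
                         struct vec_store *out)
{
   size_t count = 0;

   for (size_t i = 0; i < n; i++) {
      assert(i == 0 || in[i].offset > in[i - 1].offset);

      struct vec_store *cur = count > 0 ? &out[count - 1] : NULL;

      if (cur != NULL && cur->num_components < MAX_COMPONENTS &&
          in[i].offset == cur->offset + 4 * cur->num_components) {
         /* Offset continues the current run: append one component. */
         cur->values[cur->num_components] = in[i].value;
         cur->wrmask |= 1u << cur->num_components;
         cur->num_components++;
      } else {
         /* Gap in offsets, or the run is full: start a new vector store. */
         struct vec_store *v = &out[count++];
         memset(v, 0, sizeof(*v));
         v->offset = in[i].offset;
         v->values[0] = in[i].value;
         v->num_components = 1;
         v->wrmask = 0x1;
      }
   }

   return count;
}
```

Fed the three stores from the dump above (offsets 0, 4 and 8, values 0x3e5db22d, 0x3e000000 and 0x3f800000), the sketch produces a single three-component store at offset 0 with wrmask 0x7 (xyz) instead of three wrmask=x stores.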
