anv: clamp per vertex input accesses to patchControlPoints

In a tessellation control shader where an input array is indexed with
gl_InvocationID, we can end up accessing elements beyond the number of
input vertices specified in the shader key.

This happens because of the lowering in nir_lower_indirect_derefs(),
which affects compact variables, as is the case here:

  in gl_PerVertex {
      vec4  gl_Position;
      float gl_ClipDistance[1];
  } gl_in[gl_MaxPatchVertices];

The lowered code produced by NIR is somewhat inefficient (it implements a
binary search):

  if (gl_InvocationID < 16) {
     if (gl_InvocationID < 8) {
        if (gl_InvocationID < 4) {
          vec4 vals = load_at_offset(0);
          value = bcsel(vals, gl_InvocationID);
        } else {
          vec4 vals = load_at_offset(4);
          value = bcsel(vals, gl_InvocationID - 4);
        }
     } else {
        if (gl_InvocationID < 12) {
          vec4 vals = load_at_offset(8);
          value = bcsel(vals, gl_InvocationID - 8);
        } else {
          vec4 vals = load_at_offset(12);
          value = bcsel(vals, gl_InvocationID - 12);
        }
     }
  } else {
     if (gl_InvocationID < 24) {
        ...
     } else {
        ...
     }
  }
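To make the shape of that control flow concrete, here is a small C model
of it. It is only a sketch under stated assumptions: it is not NIR code,
the names lowered_load(), load_chunk_and_select(), max_touched and CHUNK
are hypothetical, and it assumes every final load fetches a whole aligned
chunk of 4 elements, as load_at_offset() does in the pseudocode above.

```c
#include <assert.h>

#define CHUNK 4 /* elements fetched per load in this model (assumption) */

/* Highest element index read so far; lets a caller detect over-reads. */
static int max_touched = -1;

/* Load the whole aligned chunk starting at base, then pick one element.
 * Stands in for load_at_offset(base) followed by bcsel(vals, idx - base). */
static float load_chunk_and_select(const float *arr, int base, int idx)
{
    float vals[CHUNK];
    for (int i = 0; i < CHUNK; i++) {
        vals[i] = arr[base + i];
        if (base + i > max_touched)
            max_touched = base + i;
    }
    return vals[idx - base];
}

/* Binary search on idx over [lo, hi) until a single chunk remains,
 * mirroring the nested "if (gl_InvocationID < N)" tests above. */
static float lowered_load(const float *arr, int lo, int hi, int idx)
{
    assert(idx >= lo && idx < hi);
    if (hi - lo <= CHUNK)
        return load_chunk_and_select(arr, lo, idx);
    int mid = lo + (hi - lo) / 2;
    return idx < mid ? lowered_load(arr, lo, mid, idx)
                     : lowered_load(arr, mid, hi, idx);
}
```

With a 32-element array, fetching index 2 goes through the < 16, < 8 and
< 4 branches and reads elements 0 to 3, so max_touched ends up at 3 even
though only index 2 was requested.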

gl_MaxPatchVertices is 32 by default, and that is the array size the
lowering code uses when dividing the access into chunks of 4. With 3
input vertices, however, this means we pull one more item than was
delivered in the shader payload.
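The "one more item" arithmetic can be checked with a short sketch,
assuming the lowering always loads aligned chunks of 4 elements as in the
pseudocode above. over_read() is a hypothetical helper, not a Mesa
function: it returns how many elements past the last delivered vertex get
read when the highest valid index is accessed.

```c
#include <assert.h>

/* Hypothetical helper: elements read beyond the delivered ones, assuming
 * every load fetches the aligned 4-element chunk containing the index. */
static int over_read(int num_vertices)
{
    assert(num_vertices > 0);
    int last = num_vertices - 1;             /* highest valid index */
    int chunk_end = (last / 4) * 4 + 3;      /* last index in its chunk */
    return chunk_end + 1 - num_vertices;
}
```

For 3 input vertices this gives 1 (index 3 is read), for 4 it gives 0,
and for 5 it gives 3.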

This triggers issues further down, in register scheduling, where g5UD
(the register for the 4th item) is overwritten by a previous SEND,
causing the URB read to use an invalid handle.
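The subject line's fix clamps the per-vertex array to patchControlPoints.
Under the same chunk-of-4 model, the sketch below shows why bounding the
array length removes the over-read. It is hedged: clamp_array_len() and
last_read_index() are hypothetical names, this is not the actual NIR
pass, and truncating the final load at the end of the (clamped) array is
an assumption of the model.

```c
#include <assert.h>

#define CHUNK 4 /* elements per aligned load in this model (assumption) */

/* Shrink the declared per-vertex array length (e.g. gl_MaxPatchVertices,
 * 32) to the number of vertices actually delivered (patchControlPoints). */
static int clamp_array_len(int declared_len, int patch_control_points)
{
    assert(declared_len > 0 && patch_control_points > 0);
    return patch_control_points < declared_len ? patch_control_points
                                               : declared_len;
}

/* Highest element index read when fetching element idx of an array of
 * len elements: loads are CHUNK-aligned but cut off at the array end. */
static int last_read_index(int len, int idx)
{
    assert(idx >= 0 && idx < len);
    int base = (idx / CHUNK) * CHUNK;
    int end = base + CHUNK;   /* one past the aligned chunk */
    if (end > len)
        end = len;            /* never read past the array */
    return end - 1;
}
```

With the unclamped length of 32, fetching index 2 reads up to index 3,
past the 3 delivered vertices; after clamping the length to 3, the same
fetch reads no further than index 2.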

Fixes issues with tests like
dEQP-VK.clipping.user_defined.clip_distance.vert_tess.1

v2: Don't replace source register

v3: Implement in NIR

v4: Clamp per vertex array sizes in NIR (Jason)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>