ac/nir: Remove byte permute from prefix sum of the repack sequence.

The byte-permute instruction v_perm_b32 is only exposed by LLVM 13
and later, so the prefix sum needs a new sequence that also works
with older LLVM releases.

The prefix sum is replaced by two alternatives:

1. For GPUs that support v_dot, we shift a series of 0x01 bytes so
that 0x01 lands in the wanted byte positions, then use v_dot to sum
the wanted bytes.

2. For older GPUs (Navi 10), we simply shift out the unwanted bytes
and use v_sad_u8 to produce the sum.
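
The two alternatives can be sketched on the CPU as follows. This is
only an illustration, not the NIR code from this change: it assumes
four byte-sized counts packed into one 32-bit dword, and the helpers
emu_udot4, emu_sad_u8, prefix_sum_dot and prefix_sum_sad are
hypothetical scalar stand-ins for the hardware instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for a 4x8-bit unsigned dot product. */
static uint32_t emu_udot4(uint32_t a, uint32_t b, uint32_t acc)
{
   for (unsigned i = 0; i < 4; i++)
      acc += ((a >> (8 * i)) & 0xff) * ((b >> (8 * i)) & 0xff);
   return acc;
}

/* Hypothetical scalar stand-in for a sum of absolute byte differences. */
static uint32_t emu_sad_u8(uint32_t a, uint32_t b, uint32_t acc)
{
   for (unsigned i = 0; i < 4; i++) {
      uint32_t x = (a >> (8 * i)) & 0xff;
      uint32_t y = (b >> (8 * i)) & 0xff;
      acc += x > y ? x - y : y - x;
   }
   return acc;
}

/* Alternative 1: sum the lowest n bytes (n in 0..4) using the dot product.
 * Right-shifting 0x01010101 leaves 0x01 only in the n wanted byte positions.
 * The 64-bit intermediate avoids an undefined 32-bit shift by 32 when n == 0.
 */
static uint32_t prefix_sum_dot(uint32_t packed, unsigned n)
{
   assert(n <= 4);
   uint32_t ones = (uint32_t)(UINT64_C(0x01010101) >> (8 * (4 - n)));
   return emu_udot4(packed, ones, 0);
}

/* Alternative 2: sum the lowest n bytes (n in 0..4) using SAD.
 * Left-shifting pushes the unwanted high bytes out and shifts zeroes in;
 * the absolute difference against zero then adds up the remaining bytes.
 */
static uint32_t prefix_sum_sad(uint32_t packed, unsigned n)
{
   assert(n <= 4);
   uint32_t kept = (uint32_t)((uint64_t)packed << (8 * (4 - n)));
   return emu_sad_u8(kept, 0, 0);
}
```

With the counts 1, 2, 3, 4 packed as 0x04030201, both helpers return
0, 1, 3, 6 and 10 for n = 0..4.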

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <!12786>