aco: Add a simple heuristic to decide early or late primitive export.

Late export is theoretically better when used with LATE_ALLOC,
but in practice early export has the advantage of lower register
usage and therefore more concurrent waves.

The idea of this commit is that "small" shaders benefit more from early
primitive export, because they can launch many more waves.

Let's consider a NIR shader "small" when it has only 1 block.
This yields both better performance and better stats than always
using late export.
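
As a rough illustration of the heuristic described above, here is a minimal
sketch. The names `ShaderInfo`, `PrimExport` and `choose_prim_export` are
hypothetical and only stand in for whatever the real NIR/ACO code uses; the
sketch assumes the only input is the shader's block count.

```cpp
#include <cstddef>

// Hypothetical stand-in for the shader information the heuristic inspects.
// This is not the real NIR or ACO data structure.
struct ShaderInfo {
    std::size_t num_blocks; // number of basic blocks in the shader
};

// Hypothetical enum naming the two export strategies.
enum class PrimExport { Early, Late };

// A shader with exactly one block counts as "small" and exports the
// primitive early to keep register usage low; all others export late.
constexpr PrimExport choose_prim_export(const ShaderInfo &info)
{
    return info.num_blocks == 1 ? PrimExport::Early : PrimExport::Late;
}
```

A single-block shader would map to `PrimExport::Early`, and any shader with
control flow (more than one block) to `PrimExport::Late`.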

Fossil DB on Sienna:

Totals from 12807 (8.76% of 146265) affected shaders:
VGPRs: 609128 -> 620216 (+1.82%); split: -0.01%, +1.83%
SpillSGPRs: 1458 -> 1538 (+5.49%)
CodeSize: 37028204 -> 37019320 (-0.02%); split: -0.17%, +0.14%
MaxWaves: 282902 -> 278516 (-1.55%)
Instrs: 7163142 -> 7162925 (-0.00%); split: -0.18%, +0.18%
VClause: 169285 -> 169547 (+0.15%); split: -1.15%, +1.30%
SClause: 267373 -> 267151 (-0.08%); split: -0.24%, +0.16%
Copies: 446442 -> 444567 (-0.42%); split: -2.68%, +2.26%
Branches: 156245 -> 156195 (-0.03%); split: -0.30%, +0.26%
PreSGPRs: 434701 -> 447396 (+2.92%)
PreVGPRs: 527783 -> 540527 (+2.41%)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <!10106>