radv: force unroll loops which access temporary arrays via induction (!1789) · Merge requests · Mesa / mesa

Rhys Perry requested to merge pendingchaos/mesa:radv_force_unroll into master Aug 28, 2019

This has a very large impact on at least one X4 Foundations shader:

SGPRS: 40 -> 88 (120.00 %)
VGPRS: 168 -> 32 (-80.95 %)
Scratch size: 1012 -> 0 (-100.00 %) dwords per thread
Code Size: 15360 -> 4864 (-68.33 %) bytes
Max Waves: 1 -> 8 (700.00 %)

and has been reported to fix a large performance regression: https://github.com/daniel-schuermann/mesa/issues/120#issuecomment-523166738
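The percentages in these pipeline-db lines are relative changes, (after - before) / before × 100. For example, VGPRS 168 -> 32 gives -136 / 168 ≈ -80.95 %. Below is a minimal C sketch of that arithmetic. It assumes the 0 -> 0 rows are reported as 0.00 %, and the helper names are hypothetical, not pipeline-db's actual code:

```c
#include <assert.h>
#include <stdio.h>

/* Relative change in percent: (after - before) / before * 100.
 * Assumption: a 0 -> 0 row is reported as 0.00 %, matching the tables above.
 * A 0 -> nonzero change has no defined percentage and is rejected here. */
double percent_change(double before, double after)
{
    if (before == 0.0) {
        assert(after == 0.0);
        return 0.0;
    }
    return (after - before) / before * 100.0;
}

/* Prints one line in the same shape as the stats above,
 * e.g. "VGPRS: 168 -> 32 (-80.95 %)". */
void print_stat(const char *name, double before, double after)
{
    printf("%s: %g -> %g (%.2f %%)\n", name, before, after,
           percent_change(before, after));
}
```

Calling `print_stat("Max Waves", 1, 8)` prints `Max Waves: 1 -> 8 (700.00 %)`. This is why a small absolute change from a low baseline can show up as a very large percentage.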

Looking at force_unroll_array_accesses, this was already done, but only if max_trip_count matched the array's size. That prevented unrolling when the loop is split into multiple loops, each processing only a subset of the array (probably an attempt to convince the compiler to unroll the loop). Even when there was only a single loop, it wouldn't be unrolled because the (large) array was initialized in its body.
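To make the trip-count difference concrete, here is a hypothetical and heavily simplified C model of the old and new conditions as described above. The structs and functions (`array_access`, `loop_info`, `old_force_unroll`, `new_force_unroll`) are invented for illustration and are not Mesa's NIR API. The real pass also weighs other factors that this sketch ignores, such as unroll size limits and the in-body array initialization mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one access to a temporary array inside a loop.
 * Not Mesa's NIR representation. */
struct array_access {
    unsigned array_length;     /* elements in the temporary array */
    bool indexed_by_induction; /* index derived from the loop induction variable */
};

/* Hypothetical summary of a loop and the array accesses in its body. */
struct loop_info {
    unsigned max_trip_count;
    const struct array_access *accesses;
    size_t num_accesses;
};

/* Old condition (as described): force unroll only when the trip count
 * equals the size of an array indexed by the induction variable. */
bool old_force_unroll(const struct loop_info *loop)
{
    assert(loop->accesses != NULL || loop->num_accesses == 0);
    for (size_t i = 0; i < loop->num_accesses; i++) {
        const struct array_access *a = &loop->accesses[i];
        if (a->indexed_by_induction && loop->max_trip_count == a->array_length)
            return true;
    }
    return false;
}

/* New condition (per the MR title): any induction-indexed access to a
 * temporary array is enough, whatever the trip count. */
bool new_force_unroll(const struct loop_info *loop)
{
    assert(loop->accesses != NULL || loop->num_accesses == 0);
    for (size_t i = 0; i < loop->num_accesses; i++) {
        if (loop->accesses[i].indexed_by_induction)
            return true;
    }
    return false;
}
```

Consider a 16-element array processed by two loops of 8 iterations each. Here `old_force_unroll` returns false for each loop and `new_force_unroll` returns true. That is the split-loop case the old check missed.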

pipeline-db changes:

Totals from affected shaders:
SGPRS: 136 -> 96 (-29.41 %)
VGPRS: 88 -> 168 (90.91 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 7100 -> 7968 (12.23 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 11 -> 6 (-45.45 %)
Wait states: 0 -> 0 (0.00 %)

This causes LLVM to use far more VGPRs in a couple of shaders, halving their max_waves and slightly reducing performance in F1 2017 and Dota 2.
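A simple occupancy model shows how VGPR count relates to max waves. It assumes GCN-style (GFX6 to GFX9) limits: 256 VGPRs per lane per SIMD, allocation in granules of 4 VGPRs, and at most 10 waves per SIMD. It ignores SGPR, LDS and scratch limits, and the constants and function name are assumptions rather than RADV or LLVM code. Under these assumptions it reproduces the per-shader X4 Foundations figures: 168 VGPRs allow 1 wave and 32 allow 8. The pipeline-db totals above are sums over several shaders, so the model applies per shader, not to those totals.

```c
#include <assert.h>

/* Assumed GCN (GFX6-GFX9) per-SIMD limits; not taken from RADV/LLVM source. */
#define VGPRS_PER_LANE     256u
#define VGPR_GRANULE       4u
#define MAX_WAVES_PER_SIMD 10u

/* Waves per SIMD allowed by VGPR usage alone (SGPR/LDS limits ignored). */
unsigned max_waves_from_vgprs(unsigned vgprs)
{
    assert(vgprs <= VGPRS_PER_LANE);
    if (vgprs == 0)
        return MAX_WAVES_PER_SIMD;

    /* Round up to the allocation granule. */
    unsigned alloc = (vgprs + VGPR_GRANULE - 1) / VGPR_GRANULE * VGPR_GRANULE;
    unsigned waves = VGPRS_PER_LANE / alloc;
    return waves < MAX_WAVES_PER_SIMD ? waves : MAX_WAVES_PER_SIMD;
}
```

With this model, any shader that needs more than 128 VGPRs is limited to a single wave per SIMD. This matches the X4 Foundations shader before the change.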

EDIT: I don't think I made this clear: the array was large enough to be lowered to scratch. Since glslang generates SPIR-V that initializes the array on every loop iteration, there is a large number of scratch stores in each iteration.

Edited Aug 28, 2019 by Rhys Perry

