aco: fix scratch loads which cross element_size boundaries

Daniel Schürmann requested to merge daniel-schuermann/mesa:aco_broadcast into master

Previously, we set element_size = 16, which caused loads from packed vec3 arrays to cross the element boundary and return wrong data. This patch sets element_size = 4 and splits loads into single-channel loads. Fixes all of dEQP-VK.subgroups.ballot_broadcast.*
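To illustrate the boundary problem, here is a minimal sketch (not ACO code) that checks which loads from a packed vec3 array straddle an element_size boundary. The helper names `crosses_element_boundary` and `split_into_dwords` are hypothetical, and the sketch assumes 4-byte channels, a 12-byte packed vec3 stride, and that a load is only safe if all of its bytes fall within one element_size chunk.

```python
VEC3_BYTES = 12  # assumption: packed vec3 of 32-bit channels, no padding


def crosses_element_boundary(offset: int, size: int, element_size: int) -> bool:
    """Return True if bytes [offset, offset + size) span more than one element."""
    first = offset // element_size
    last = (offset + size - 1) // element_size
    return first != last


def split_into_dwords(offset: int, size: int) -> list[tuple[int, int]]:
    """Split one load into (offset, 4) single-channel loads."""
    return [(offset + i, 4) for i in range(0, size, 4)]


# One whole-vec3 load per array element, for the first four elements.
vec3_loads = [(i * VEC3_BYTES, VEC3_BYTES) for i in range(4)]

# With element_size = 16, some whole-vec3 loads straddle a boundary.
crossing_16 = [off for off, size in vec3_loads
               if crosses_element_boundary(off, size, 16)]

# With element_size = 4 and single-channel loads, none do.
dword_loads = [d for off, size in vec3_loads for d in split_into_dwords(off, size)]
crossing_4 = [off for off, size in dword_loads
              if crosses_element_boundary(off, size, 4)]

print(crossing_16, crossing_4)
```

Under these assumptions, the vec3 elements at byte offsets 12 and 24 cross a 16-byte boundary, while no dword-sized load crosses a 4-byte boundary. The cost is three loads per vec3 instead of one, which is consistent with the VMEM increase in the stats below.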

Cc: 20.1 mesa-stable@lists.freedesktop.org

Small negative effect on pipeline stats with Polaris from a few Dark Souls 3 shaders:

Totals from 40 (0.03% of 134368) affected shaders:
VGPRs: 3292 -> 2884 (-12.39%)
CodeSize: 151704 -> 173188 (+14.16%)
MaxWaves: 118 -> 127 (+7.63%)
Instrs: 26552 -> 28711 (+8.13%)
Cycles: 106208 -> 114844 (+8.13%)
VMEM: 6303 -> 15003 (+138.03%)
SMEM: 6602 -> 6566 (-0.55%); split: +0.58%, -1.12%
VClause: 1161 -> 1653 (+42.38%)
Copies: 1988 -> 508 (-74.45%)
PreVGPRs: 2386 -> 2346 (-1.68%)

On the other hand, single-channel scratch loads should be slightly faster. No stats changes for GFX9+.
