Commit a737470a authored by Daniel Schürmann

ac: set .align_mul = 16u/8u for load/store_shared

Shared variables are now written and loaded per slot, so
align_mul can be 16 instead of 4.
For store_shared, .align_mul = 16u introduces additional
copies, so it is kept at 8u.
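
For readers unfamiliar with the alignment pair: in NIR, an access tagged
with align_mul and align_offset is known to be at an address congruent to
align_offset modulo align_mul. The sketch below is not code from this
commit. It illustrates, with hypothetical helper names and an assumed
16-byte cap on a single access, why a larger align_mul can let a backend
pick wider shared-memory accesses.

```c
#include <stdint.h>

/* Hypothetical helper, not a Mesa function: the largest power of two that
 * is guaranteed to divide an address of the form
 *   align_mul * k + align_offset.
 * Assumes align_mul is a non-zero power of two and align_offset < align_mul. */
static uint32_t guaranteed_alignment(uint32_t align_mul, uint32_t align_offset)
{
    if (align_offset == 0)
        return align_mul;
    /* The lowest set bit of the offset bounds the alignment. */
    return align_offset & (~align_offset + 1u);
}

/* Hypothetical policy (an assumption for illustration): the widest single
 * access, in bytes, is the guaranteed alignment capped at 16 bytes. */
static uint32_t max_access_bytes(uint32_t align_mul, uint32_t align_offset)
{
    uint32_t align = guaranteed_alignment(align_mul, align_offset);
    return align < 16u ? align : 16u;
}
```

Under these assumptions, max_access_bytes(4, 0) yields 4, max_access_bytes(8, 0)
yields 8, and max_access_bytes(16, 0) yields 16, which matches the values
discussed above for the old alignment, store_shared, and load_shared.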

Totals from 135 (0.09% of 149839) affected shaders: (GFX10.3)
VGPRs: 6504 -> 6776 (+4.18%); split: -0.12%, +4.31%
CodeSize: 505684 -> 479276 (-5.22%); split: -5.36%, +0.13%
MaxWaves: 2926 -> 2854 (-2.46%); split: +0.07%, -2.53%
Instrs: 89882 -> 87780 (-2.34%); split: -3.41%, +1.08%
Latency: 321525 -> 313024 (-2.64%); split: -3.13%, +0.48%
InvThroughput: 96611 -> 96225 (-0.40%); split: -2.87%, +2.47%
Copies: 7501 -> 10076 (+34.33%)
PreVGPRs: 5113 -> 5171 (+1.13%); split: -0.04%, +1.17%
parent 2f5b37e5