Daniel Schürmann authored
As the shared variables are now written/loaded per slot, the align_mul
is 16 instead of 4.
For store_shared, because of the additional copies when using
.align_mul = 16u, it's better to keep it at 8u.

Totals from 135 (0.09% of 149839) affected shaders: (GFX10.3)
VGPRs: 6504 -> 6776 (+4.18%); split: -0.12%, +4.31%
CodeSize: 505684 -> 479276 (-5.22%); split: -5.36%, +0.13%
MaxWaves: 2926 -> 2854 (-2.46%); split: +0.07%, -2.53%
Instrs: 89882 -> 87780 (-2.34%); split: -3.41%, +1.08%
Latency: 321525 -> 313024 (-2.64%); split: -3.13%, +0.48%
InvThroughput: 96611 -> 96225 (-0.40%); split: -2.87%, +2.47%
Copies: 7501 -> 10076 (+34.33%)
PreVGPRs: 5113 -> 5171 (+1.13%); split: -0.04%, +1.17%
a737470a
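
For readers unfamiliar with the alignment pair mentioned in the message, the sketch below illustrates the usual contract: an access declares that its address satisfies address % align_mul == align_offset, and the guaranteed alignment follows from that pair. This is not code from the commit. The helper names combined_align and widest_access_bytes, and the max_bytes cap, are assumptions made up for illustration; how a backend actually turns alignment into an instruction choice is more involved than this.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: largest power of two guaranteed to divide any
 * address that satisfies (address % align_mul) == align_offset.
 * align_mul must be a non-zero power of two, align_offset < align_mul. */
static uint32_t combined_align(uint32_t align_mul, uint32_t align_offset)
{
    assert(align_mul != 0 && (align_mul & (align_mul - 1u)) == 0);
    assert(align_offset < align_mul);

    /* With a non-zero offset, the guarantee drops to the lowest set bit
     * of the offset; (x & -x) isolates that bit. */
    return align_offset ? (align_offset & (~align_offset + 1u)) : align_mul;
}

/* Hypothetical helper: widest single access, in bytes, that the declared
 * alignment permits, capped at max_bytes (assumed to be a power of two). */
static uint32_t widest_access_bytes(uint32_t align_mul, uint32_t align_offset,
                                    uint32_t max_bytes)
{
    uint32_t align = combined_align(align_mul, align_offset);
    return align < max_bytes ? align : max_bytes;
}
```

Under these assumptions, raising align_mul from 4 to 16 raises the permitted access width from 4 to 16 bytes, while 8 permits 8 bytes. That is consistent with the trade-off described above: wider accesses mean fewer instructions, but for stores the data has to be gathered into contiguous registers first, which can cost extra copies.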