    ac: set .align_mul = 16u/8u for load/store_shared · a737470a
    Daniel Schürmann authored
    Since the shared variables are now written and loaded per slot,
    align_mul is 16 instead of 4.
    For store_shared, .align_mul = 16u causes additional copies,
    so it is better to keep it at 8u.
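
As a rough illustration of why the alignment matters, the sketch below shows how a backend might pick the widest access that a given (align_mul, align_offset) pair permits, assuming the usual convention that the address satisfies `address % align_mul == align_offset`. The helper names `combined_align` and `max_access_bytes` and the 16-byte cap are hypothetical, chosen only for this example; this is not the code touched by this commit.

```c
#include <stdint.h>

/* Hypothetical helper: largest power-of-two alignment guaranteed when
 * address % align_mul == align_offset (align_mul is a power of two). */
static uint32_t combined_align(uint32_t align_mul, uint32_t align_offset)
{
    /* A zero offset means the address is a multiple of align_mul.
     * Otherwise the lowest set bit of the offset bounds the alignment. */
    return align_offset ? (align_offset & (~align_offset + 1u)) : align_mul;
}

/* Hypothetical policy: widest power-of-two access (capped at 16 bytes)
 * that fits both the guaranteed alignment and the bytes to transfer. */
static uint32_t max_access_bytes(uint32_t total_bytes, uint32_t align_mul,
                                 uint32_t align_offset)
{
    uint32_t align = combined_align(align_mul, align_offset);
    uint32_t width = 16;

    while (width > 1 && (width > align || width > total_bytes))
        width /= 2;
    return width;
}
```

Under these assumptions, a 16-byte slot with align_mul = 4 would be split into 4-byte accesses, align_mul = 8 allows 8-byte accesses, and align_mul = 16 allows a single 16-byte access.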
    
    Totals from 135 (0.09% of 149839) affected shaders: (GFX10.3)
    VGPRs: 6504 -> 6776 (+4.18%); split: -0.12%, +4.31%
    CodeSize: 505684 -> 479276 (-5.22%); split: -5.36%, +0.13%
    MaxWaves: 2926 -> 2854 (-2.46%); split: +0.07%, -2.53%
    Instrs: 89882 -> 87780 (-2.34%); split: -3.41%, +1.08%
    Latency: 321525 -> 313024 (-2.64%); split: -3.13%, +0.48%
    InvThroughput: 96611 -> 96225 (-0.40%); split: -2.87%, +2.47%
    Copies: 7501 -> 10076 (+34.33%)
    PreVGPRs: 5113 -> 5171 (+1.13%); split: -0.04%, +1.17%