    i965: Switch to scalar TCS by default. · b593737e
    Kenneth Graunke authored
    
    
    Normally, we expect SIMD8 shaders to have more instructions than SIMD4x2
    shaders, as it takes four instructions to operate on a vec4, rather than
    a single instruction.  However, the benefit is that a SIMD8 shader can
    process 8 objects per shader thread instead of 2.
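
The trade-off above can be sketched with a little arithmetic. This is an idealized model, not anything taken from the driver: the `DispatchMode` class and the helper names are hypothetical, and it assumes a shader made up only of vec4 operations, ignoring URB messages, header setup, and scheduling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DispatchMode:
    """Hypothetical description of a dispatch mode (illustration only)."""
    name: str
    objects_per_thread: int  # objects one shader thread processes
    insts_per_vec4_op: int   # instructions needed for one vec4 operation


# Figures taken from the commit message: SIMD4x2 handles a vec4 in one
# instruction for 2 objects; SIMD8 needs four instructions for 8 objects.
SIMD4X2 = DispatchMode("SIMD4x2", objects_per_thread=2, insts_per_vec4_op=1)
SIMD8 = DispatchMode("SIMD8", objects_per_thread=8, insts_per_vec4_op=4)


def insts_per_thread(mode: DispatchMode, vec4_ops: int) -> int:
    """Instructions in one thread for a shader with `vec4_ops` vec4 operations."""
    return mode.insts_per_vec4_op * vec4_ops


def insts_per_object(mode: DispatchMode, vec4_ops: int) -> float:
    """Instructions executed per object processed, under the same assumptions."""
    return insts_per_thread(mode, vec4_ops) / mode.objects_per_thread


if __name__ == "__main__":
    for mode in (SIMD4X2, SIMD8):
        print(mode.name, insts_per_thread(mode, 10), insts_per_object(mode, 10))
```

Under these assumptions the SIMD8 program is four times longer, but the instruction count per object comes out the same in both modes, which is why a longer SIMD8 shader is not necessarily a slower one.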
    
    Surprisingly, the shader-db statistics show an improvement in both
    instruction and cycle counts:
    
    Synmark: -31.25% instructions, -29.27% cycles, 0 hurt.
    Tessmark: -36.92% instructions, -37.81% cycles, 0 hurt.
    Unigine Heaven: -3.42% instructions, -17.95% cycles, 0 hurt.
    Shadow of Mordor:
       +13.24% instructions (26 with fewer instructions, 45 with more),
       -5.23% cycles (44 with fewer cycles, 27 with more cycles).
    
    Presumably, this is because the SIMD8 URB messages are a much more
    natural fit than the SIMD4x2 URB messages - there's a ton less header
    setup.
    
    I benchmarked Shadow of Mordor and Unigine Heaven on my Skylake GT3e,
    and the performance seems to be the same or increase ever so slightly
    (< 1 FPS difference).  So I believe it's strictly superior.
    
    There's also a lot more optimization potential in scalar mode.
    
    This will also help us finish fp64 support, as scalar support is going
    to land much sooner than vec4-mode support.
    
    Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
    Reviewed-by: Matt Turner <mattst88@gmail.com>