    i965: Push most TES inputs in SIMD8 mode. · 4a1c8a30
    Kenneth Graunke authored
    
    
    Using the push model for inputs is much more efficient than pulling
    inputs - the hardware can simply copy a large chunk into URB registers
    at thread creation time, rather than having the thread send messages to
    request data from the L3 cache.  Unfortunately, it's possible to have
    more TES inputs than fit in registers, so we have to fall back to the
    pull model in some cases.
    
    However, it turns out that most tessellation evaluation shaders are
    fairly simple, and don't use many inputs.  An arbitrary cut-off of
    32 vec4 slots (16 registers) is more than sufficient to ensure that
    100% of TES inputs are pushed for Shadow of Mordor, Unigine Heaven,
    GPUTest/TessMark, and SynMark.
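
The arithmetic behind the cut-off can be sketched in C. This is only an illustration, not the driver's code: the helper names are hypothetical, and it assumes a 32-byte SIMD8 register holding two packed vec4s, and that a slot is pushed exactly when its index is below the cut-off.

```c
#include <stdbool.h>

/* Assumption: one SIMD8 register is 32 bytes, so it holds two packed vec4s. */
#define VEC4_SLOTS_PER_REG 2
/* The cut-off named in the commit message: 32 vec4 slots. */
#define MAX_PUSHED_VEC4_SLOTS 32

/* Hypothetical helper: registers needed to hold `slots` packed vec4 inputs. */
static unsigned regs_for_vec4_slots(unsigned slots)
{
    return (slots + VEC4_SLOTS_PER_REG - 1) / VEC4_SLOTS_PER_REG;
}

/* Hypothetical helper: true if the input slot lies below the cut-off and
 * could be pushed; slots at or past the cut-off would be pulled instead. */
static bool slot_is_pushed(unsigned slot)
{
    return slot < MAX_PUSHED_VEC4_SLOTS;
}
```

Under these assumptions, regs_for_vec4_slots(32) gives the 16 registers quoted above.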
    
    Note that unlike most SIMD8 stages, this actually reads packed vec4
    data, since that is what our vec4 TCS programs write.
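
As a rough picture of what "packed vec4 data" could mean for addressing, the sketch below maps a vec4 slot and component to a register and dword index. The struct and function names are hypothetical, and the layout (8 dwords per register, two vec4s back to back) is an assumption for illustration rather than a description of the actual hardware payload.

```c
/* Assumption: a register holds 8 dwords (32 bytes), i.e. two packed vec4s. */
struct packed_loc {
    unsigned reg;   /* register index relative to the first pushed input */
    unsigned dword; /* dword index within that register, 0..7 */
};

/* Hypothetical helper: locate component `comp` (0..3) of vec4 slot `slot`. */
static struct packed_loc packed_vec4_location(unsigned slot, unsigned comp)
{
    struct packed_loc loc;
    loc.reg = slot / 2;                /* two vec4 slots share one register */
    loc.dword = (slot % 2) * 4 + comp; /* odd slots occupy the upper half */
    return loc;
}
```

With this layout, slot 3 component 2 lands in register 1, dword 6.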
    
    Improves performance in GPUTest's tessmark_x64 microbenchmark
    by 93.4426% +/- 5.35541% (n = 25) on my Lenovo X250 at 1024x768.
    
    Improves performance in SynMark's Gl40TerrainFlyTess microbenchmark
    by 22.74% +/- 0.309394% (n = 5).
    
    Improves performance in Shadow of Mordor at low settings with
    tessellation enabled at 1280x720 by 2.12197% +/- 0.478553% (n = 4).
    
    shader-db statistics for files containing tessellation shaders:
    
    total instructions in shared programs: 184358 -> 181181 (-1.72%)
    instructions in affected programs: 27971 -> 24794 (-11.36%)
    helped: 226
    
    Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
    Reviewed-by: Matt Turner <mattst88@gmail.com>