    anv: Use on-the-fly surface states for dynamic buffer descriptors · dd4db846
    Faith Ekstrand authored
    
    
    We have a performance problem with dynamic buffer descriptors.  Because
    we are currently implementing them by pushing an offset into the shader
    and adding that offset onto the already existing offset for the UBO/SSBO
    operation, all UBO/SSBO operations on dynamic descriptors are indirect.
    The back-end compiler implements indirect pull constant loads using what
    basically amounts to a texelFetch instruction.  For pull constant loads
    with constant offsets, however, we use an oword block read message which
    goes through the constant cache and reads a whole cache line at a time.
    Because of these two things, direct pull constant loads are much faster
    than indirect pull constant loads.  Because all loads from dynamically
    bound buffers are indirect, the user takes a substantial performance
    penalty when using this "performance" feature.
    
    There are two potential solutions I have seen for this problem.  The
    alternate solution is to continue pushing offsets into the shader but
    wire things up in the back-end compiler so that we use the oword block
    read messages anyway.  The only reason we can do this is that we know
    a priori that the dynamic offsets are uniform and 16-byte aligned.
    Unfortunately, thanks to the 16-byte alignment requirement of the oword
    messages, we can't do some general "if the indirect offset is uniform,
    use an oword message" sort of thing.
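
    As a rough illustration of that constraint, here is a minimal sketch
    of the decision the back-end would have to make.  All names below are
    hypothetical and are not the actual back-end compiler API; the sketch
    assumes the compiler tracks whether an offset is uniform and what
    byte alignment it is guaranteed to have.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; not the real back-end compiler API. */
enum pull_load_kind {
    PULL_LOAD_OWORD_BLOCK,   /* constant cache, whole cache line at a time */
    PULL_LOAD_SAMPLER_FETCH, /* texelFetch-like path used for indirects */
};

#define OWORD_SIZE 16u /* oword messages need 16-byte aligned offsets */

/* known_alignment is the byte alignment the offset is guaranteed to
 * have (a power of two; 1 means nothing is known about it). */
static enum pull_load_kind
choose_pull_load(bool offset_is_uniform, uint32_t known_alignment)
{
    assert(known_alignment != 0 &&
           (known_alignment & (known_alignment - 1)) == 0);

    /* Both properties are required: a uniform offset with unknown
     * alignment still has to take the slow path. */
    if (offset_is_uniform && known_alignment >= OWORD_SIZE)
        return PULL_LOAD_OWORD_BLOCK;

    return PULL_LOAD_SAMPLER_FETCH;
}
```

    Dynamic offsets would pass this check because both properties are
    known up front; an arbitrary uniform indirect offset generally would
    not, since nothing guarantees its alignment.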
    
    This commit takes the other solution, on-the-fly surface states, which
    is recommended for a few reasons:
    
     1. Surface states are relatively cheap.  We've been using on-the-fly
        surface state setup for some time in GL and it works well.  Also,
        dynamic offsets with on-the-fly surface state should still be
        cheaper than allocating new descriptor sets every time you want to
        change a buffer offset, which is really the only requirement of the
        dynamic offsets feature.
    
     2. This requires substantially less compiler plumbing.  Not only can we
        delete the entire apply_dynamic_offsets pass but we can also avoid
        having to add architecture for passing dynamic offsets to the back-
        end compiler in such a way that it can continue using oword messages.
    
     3. We get robust buffer access range-checking for free.  Because the
        offset and range are baked into the surface state, we no longer need
        to pass ranges around and do bounds-checking in the shader.
    
     4. Once we finally get UBO pushing implemented, it will be much easier
        to handle pushing chunks of dynamic descriptors if the compiler
        remains blissfully unaware of dynamic descriptors.
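
    To make point 3 concrete, the following is a simplified sketch of how
    the offset and range could be folded into a surface at bind time.  The
    struct and function names are hypothetical stand-ins, not the real anv
    structures, and the clamping policy is an assumption for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the real anv structures. */
struct buffer_binding {
    uint64_t buffer_size; /* size of the underlying buffer in bytes */
    uint64_t offset;      /* static offset from the descriptor write */
    uint64_t range;       /* range from the descriptor write */
};

struct surface_window {
    uint64_t start; /* byte offset of the surface within the buffer */
    uint64_t size;  /* number of bytes the surface state exposes */
};

/* Fold the dynamic offset into the surface at bind time, so the shader
 * sees an ordinary buffer and needs no extra offset or bounds data. */
static struct surface_window
dynamic_surface_window(const struct buffer_binding *b, uint32_t dynamic_offset)
{
    struct surface_window w;

    assert(b->offset <= b->buffer_size);

    w.start = b->offset + dynamic_offset;

    /* Clamp the size so the surface never extends past the buffer. */
    uint64_t remaining =
        b->buffer_size > w.start ? b->buffer_size - w.start : 0;
    w.size = b->range < remaining ? b->range : remaining;

    return w;
}
```

    With the window baked into the surface like this, the surface bounds
    themselves limit what the shader can reach, so no range has to be
    passed to the shader for checking.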
    
    This commit improves performance of The Talos Principle on ULTRA
    settings by around 50% and brings it nicely into line with OpenGL
    performance.
    
    Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>