mesa + glthread: many optimizations
This is a big change for both Mesa and glthread performance.
- Removal of _mesa_is_bufferobj (for 5% perf improvement in "torcs")
- Faster VAO initialization
- State change improvements
- Dynamic VAOs skip most of validation and don't compute interleaved arrays (st/mesa has a separate codepath for this)
- vbo_context is inlined in gl_context for faster glBegin/End
- New feature: Ability to create struct gl_buffer_object and map it from any thread (for glthread)
- glInternal* functions that take struct gl_buffer_object * to execute a buffer copy and set vertex and index buffers (for glthread)
- glBufferSubData is asynchronous for any size (implemented as a buffer upload in the main thread + glInternalBufferSubDataCopy in the driver thread)
- non-VBO vertices and indices are uploaded for all non-Indirect non-IBM draws
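The asynchronous glBufferSubData path in the list above can be pictured roughly as follows. This is only a minimal single-threaded model, not the code from this MR: every name in it (struct buffer, struct cmd_queue, queue_buffer_sub_data, execute_queued_copies) is hypothetical, and the fixed-size staging area stands in for whatever upload buffer the real implementation uses. It shows the key property: the user data is snapshotted when the call is made, so the application may overwrite it immediately, and the destination buffer is only written when the queued copy runs later.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; this only models the idea. */
#define STAGING_SIZE 4096
#define MAX_CMDS 64

struct buffer {              /* stand-in for struct gl_buffer_object */
   unsigned char data[256];
};

struct copy_cmd {            /* stand-in for a queued buffer copy */
   struct buffer *dst;
   size_t dst_offset;
   size_t src_offset;        /* offset into the staging area */
   size_t size;
};

struct cmd_queue {
   unsigned char staging[STAGING_SIZE];
   size_t staging_used;
   struct copy_cmd cmds[MAX_CMDS];
   size_t num_cmds;
};

/* Application-thread side: snapshot the user data and queue a copy.
 * Returns 0 on success, -1 if the range is invalid or the fixed-size
 * storage of this sketch is full. */
static int
queue_buffer_sub_data(struct cmd_queue *q, struct buffer *dst,
                      size_t offset, size_t size, const void *data)
{
   if (offset > sizeof(dst->data) || size > sizeof(dst->data) - offset)
      return -1;
   if (q->num_cmds == MAX_CMDS || size > STAGING_SIZE - q->staging_used)
      return -1;

   /* The "upload": after this memcpy the caller may reuse its memory. */
   memcpy(q->staging + q->staging_used, data, size);
   q->cmds[q->num_cmds++] =
      (struct copy_cmd){dst, offset, q->staging_used, size};
   q->staging_used += size;
   return 0;
}

/* Driver-thread side: execute the queued copies in submission order. */
static void
execute_queued_copies(struct cmd_queue *q)
{
   for (size_t i = 0; i < q->num_cmds; i++) {
      const struct copy_cmd *c = &q->cmds[i];
      assert(c->src_offset + c->size <= STAGING_SIZE);
      memcpy(c->dst->data + c->dst_offset,
             q->staging + c->src_offset, c->size);
   }
   q->num_cmds = 0;
   q->staging_used = 0;
}
```

In a real multi-threaded setup the two functions would run on different threads with proper synchronization around the queue; that part is deliberately left out here.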
glthread now performs well with apps using non-VBO data, and scales better with apps using glBufferSubData too much.
Mesa driver overhead is still important if the Mesa thread is the busiest. Enable the gallium thread for better driver overhead distribution.
Performance improvement of this MR in the game "torcs":
- +16% by default vs master
- +40% after enabling glthread vs master
glthread requires these CAPs for user data uploads: