mesa + glthread: many optimizations
A reviewed subset has already been merged: !4466 (merged), !4758 (merged)
This is a big change for both Mesa and glthread performance.
Mesa:
- Removal of `NullBufferObj` and `_mesa_is_bufferobj` (for a 5% perf improvement in "torcs")
- Faster VAO initialization
- Faster `glPush/PopClientAttrib`
- State change improvements
- Dynamic VAOs skip most of validation and don't compute interleaved arrays (st/mesa has a separate codepath for this)
- vbo_context is inlined in gl_context for faster glBegin/End
- New feature: Ability to create `struct gl_buffer_object` and map it from any thread (for glthread)
- New `glInternal*` functions that take `struct gl_buffer_object *` to execute a buffer copy and set vertex and index buffers (for glthread)
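The first item in the Mesa list can be pictured with a toy model. It assumes, for illustration only, that "no buffer bound" used to be represented by a shared placeholder object, so the check had to read a field of the object, whereas afterwards "no buffer" is a plain NULL pointer. All `toy_` names are hypothetical and none of this is Mesa's actual code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer object; not Mesa's real struct. */
struct toy_buffer {
    unsigned name;   /* 0 marks the placeholder "null" object */
};

/* Old style (assumed): a shared placeholder object means "no buffer bound". */
static struct toy_buffer toy_null_buffer = { 0 };

/* Old-style check: has to load obj->name to tell a real buffer apart. */
static bool toy_is_bufferobj_old(const struct toy_buffer *obj)
{
    return obj != NULL && obj->name != 0;
}

/* New-style check: "no buffer" is NULL, so a pointer test is enough. */
static bool toy_is_bufferobj_new(const struct toy_buffer *obj)
{
    return obj != NULL;
}
```

In this model the saving comes from dropping the extra memory load and the placeholder object on hot paths; how much of the measured 5% that accounts for is not something the sketch can show.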
glthread:
- `glBufferSubData` is asynchronous for any size (implemented as a buffer upload in the main thread + `glInternalBufferSubDataCopy` in the driver thread)
- non-VBO vertices and indices are uploaded for all non-Indirect non-IBM draws
glthread now performs well with apps using non-VBO data, and scales better with apps that call `glBufferSubData` too much.
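The asynchronous `glBufferSubData` path described above can be sketched as a two-step model: the main thread copies the user data into an upload area right away and queues a copy command, and the driver thread later executes the copy into the real buffer. This is a single-threaded toy under stated assumptions; the `toy_` types, the fixed-size staging area, and the queue are hypothetical and do not reflect how glthread actually stores its commands.

```c
#include <stddef.h>
#include <string.h>

#define TOY_STAGING_SIZE 256
#define TOY_QUEUE_LEN    8

/* Hypothetical queued command: copy `size` bytes from staging to the buffer. */
struct toy_copy_cmd {
    size_t src_offset;   /* where the data sits in the staging area */
    size_t dst_offset;   /* where it must land in the destination buffer */
    size_t size;
};

struct toy_glthread {
    unsigned char staging[TOY_STAGING_SIZE]; /* written by the main thread */
    size_t staging_used;
    struct toy_copy_cmd queue[TOY_QUEUE_LEN];
    size_t queued;
};

/* Main-thread side: copy the user data out immediately and queue a copy.
 * Returns 0 on success, -1 if the toy staging area or queue is full. */
static int toy_buffer_sub_data(struct toy_glthread *t, size_t dst_offset,
                               const void *data, size_t size)
{
    if (size > TOY_STAGING_SIZE - t->staging_used || t->queued == TOY_QUEUE_LEN)
        return -1;
    memcpy(t->staging + t->staging_used, data, size);
    t->queue[t->queued++] =
        (struct toy_copy_cmd){ t->staging_used, dst_offset, size };
    t->staging_used += size;
    return 0; /* the caller may now modify or free `data` */
}

/* Driver-thread side: execute the queued staging-to-buffer copies in order.
 * The caller must make sure `dst` is large enough for every queued copy. */
static void toy_execute_queue(struct toy_glthread *t, unsigned char *dst)
{
    for (size_t i = 0; i < t->queued; i++) {
        const struct toy_copy_cmd *c = &t->queue[i];
        memcpy(dst + c->dst_offset, t->staging + c->src_offset, c->size);
    }
    t->queued = 0;
    t->staging_used = 0;
}
```

The point of the split is that the call returns as soon as the data has been captured, so the application never waits for the driver thread, whatever the upload size.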
Mesa driver overhead still matters when the Mesa thread is the busiest one. Enable the gallium thread for better driver overhead distribution.
Performance improvement of this MR in the game "torcs":
- +16% by default vs master
- +40% after enabling glthread vs master
glthread requires these CAPs for user data uploads:
PIPE_CAP_MAP_UNSYNCHRONIZED_THREAD_SAFE
PIPE_CAP_ALLOW_MAPPED_BUFFERS_DURING_EXECUTION
PIPE_CAP_SIGNED_VERTEX_BUFFER_OFFSET
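The requirement above amounts to an all-or-nothing check: user data uploads are only usable when every listed capability is reported. A minimal sketch of that rule follows. The `toy_` enum and struct are hypothetical mirrors of the three CAP names and are not the Gallium interface; a real driver reports capabilities through its screen object.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability IDs named after the three CAPs listed above. */
enum toy_cap {
    TOY_CAP_MAP_UNSYNCHRONIZED_THREAD_SAFE,
    TOY_CAP_ALLOW_MAPPED_BUFFERS_DURING_EXECUTION,
    TOY_CAP_SIGNED_VERTEX_BUFFER_OFFSET,
    TOY_CAP_COUNT
};

/* Hypothetical driver description: one flag per capability. */
struct toy_screen {
    bool caps[TOY_CAP_COUNT];
};

/* User data uploads are allowed only if every required CAP is present. */
static bool toy_can_upload_user_data(const struct toy_screen *s)
{
    for (int i = 0; i < TOY_CAP_COUNT; i++) {
        if (!s->caps[i])
            return false;
    }
    return true;
}
```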
Edited by Marek Olšák