mesa + glthread: many optimizations (!4314)

A reviewed subset was merged in !4466 (merged) and !4758 (merged).

This is a big change for both Mesa and glthread performance.

Mesa:

Removal of NullBufferObj and _mesa_is_bufferobj (for a 5% performance improvement in "torcs")
Faster VAO initialization
Faster glPush/PopClientAttrib
State change improvements
Dynamic VAOs skip most of validation and don't compute interleaved arrays (st/mesa has a separate codepath for this)
vbo_context is inlined in gl_context for faster glBegin/End
New feature: Ability to create struct gl_buffer_object and map it from any thread (for glthread)
New glInternal* functions that take struct gl_buffer_object * to execute a buffer copy and set vertex and index buffers (for glthread)
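
To illustrate why dropping a sentinel "null" buffer object can make checks cheaper, here is a minimal sketch in C. It is a hypothetical model, not Mesa's code: `struct buffer_object`, `null_buffer_obj` and both helper functions are invented names, and the sentinel-style check is only an assumption about what such a helper typically looks like.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified buffer object; not Mesa's struct gl_buffer_object. */
struct buffer_object {
   unsigned name; /* 0 means "no real buffer" in the sentinel scheme */
};

/* Sentinel style (assumed): an unbound binding points at a shared dummy
 * object with name 0, so the check has to dereference the pointer. */
static const struct buffer_object null_buffer_obj = { 0 };

static bool
is_bound_sentinel_style(const struct buffer_object *obj)
{
   return obj != NULL && obj->name != 0;
}

/* NULL style (assumed): an unbound binding is simply a NULL pointer, so the
 * check is a single pointer comparison with no memory load. */
static bool
is_bound_null_style(const struct buffer_object *obj)
{
   return obj != NULL;
}
```

In this model the NULL-style check never touches the object's memory, which matters on hot paths that test bindings very often.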

glthread:

glBufferSubData is asynchronous for any size (implemented as a buffer upload in the main thread + glInternalBufferSubDataCopy in the driver thread)
non-VBO vertices and indices are uploaded for all non-Indirect non-IBM draws
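
The asynchronous glBufferSubData path described above can be sketched roughly as follows. This is a single-threaded toy model under stated assumptions, not the actual glthread implementation: `struct buffer`, `struct copy_cmd`, `struct cmd_queue`, `async_buffer_sub_data` and `execute_queue` are all hypothetical names, the queue is a plain array, and the "driver thread" is simulated by calling `execute_queue` later.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; these are not Mesa's real types. */
enum { BUF_SIZE = 64, QUEUE_LEN = 8 };

struct buffer {
   unsigned char data[BUF_SIZE];
};

struct copy_cmd {
   struct buffer *src; /* staging buffer filled on the application side */
   struct buffer *dst; /* buffer the application asked to update */
   size_t src_offset, dst_offset, size;
};

struct cmd_queue {
   struct copy_cmd cmds[QUEUE_LEN];
   size_t count;
};

/* Application side: copy the user data into the staging buffer right away,
 * so the caller may reuse its pointer, then queue a buffer-to-buffer copy.
 * Returns 1 on success, 0 if the queue is full or a range is out of bounds. */
static int
async_buffer_sub_data(struct cmd_queue *q,
                      struct buffer *staging, size_t staging_offset,
                      struct buffer *dst, size_t dst_offset,
                      const void *user_data, size_t size)
{
   if (q->count == QUEUE_LEN || size > BUF_SIZE ||
       staging_offset > BUF_SIZE - size || dst_offset > BUF_SIZE - size)
      return 0;

   memcpy(staging->data + staging_offset, user_data, size);

   struct copy_cmd *cmd = &q->cmds[q->count++];
   cmd->src = staging;
   cmd->dst = dst;
   cmd->src_offset = staging_offset;
   cmd->dst_offset = dst_offset;
   cmd->size = size;
   return 1;
}

/* Driver side: perform the queued copies in order, then empty the queue. */
static void
execute_queue(struct cmd_queue *q)
{
   for (size_t i = 0; i < q->count; i++) {
      const struct copy_cmd *cmd = &q->cmds[i];
      memcpy(cmd->dst->data + cmd->dst_offset,
             cmd->src->data + cmd->src_offset, cmd->size);
   }
   q->count = 0;
}
```

The point of the split is that the caller never waits for the destination buffer: only the cheap staging copy happens at call time, and the destination is updated when the queued command runs.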

glthread now performs well with apps that use non-VBO data, and scales better with apps that call glBufferSubData heavily.

Mesa driver overhead still matters if the Mesa thread is the busiest one. Enable the gallium thread to distribute driver overhead better.

Performance improvement of this MR in the game "torcs":

glthread requires these CAPs for user data uploads:

Edited Apr 27, 2020 by Marek Olšák
