DrawElementsBaseVertex is very slow when no VBO splitting occurs
Submitted by Ruslan Kabatsayev
Assigned to Ian Romanick
Created attachment 125871 Test case
In the attached test case, when the baseVertex passed to glDrawElementsBaseVertex() is less than 2998, _tnl_draw_prims() does not take the branch that calls vbo_split_prims(), and performance is far worse than when vbo_split_prims() is called. Here is typical output of the test on "Software Rasterizer" (non-Gallium swrast):
Frame 4: time for DrawElements ( ): 123 ms
Frame 5: time for DrawElementsBaseVertex(offset=0 ): 125 ms
Frame 6: time for DrawElementsBaseVertex(offset=2997): 4937 ms
Frame 7: time for DrawElementsBaseVertex(offset=2998): 147 ms
Results are similar on an Intel Atom N550 with the non-Gallium i915 driver (with ITERATIONS set to 800 in the test): about a 50x performance difference between offset==2997 and offset==2998.
Lowering MAX_ARRAY_LOCK_SIZE in src/mesa/main/config.h from its current value of 3000 to 300 avoids the performance drop at the 2997-2998 offset boundary, but the same problem then appears at the 297-298 boundary.
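The two boundaries above (2997-2998 with a limit of 3000, 297-298 with a limit of 300) fit a simple threshold rule. The sketch below is a hypothetical model of that rule and is not taken from the Mesa source: the function names, the form of the comparison, and the assumption that the test's largest index is 3 are all inferred from the numbers in this report.

```c
#include <stdbool.h>

/* Hypothetical model, NOT Mesa code: assume a draw call is split when the
 * highest vertex it touches (base_vertex + max_index) exceeds lock_size,
 * where lock_size stands in for MAX_ARRAY_LOCK_SIZE. */
static bool would_split(int base_vertex, int max_index, int lock_size)
{
    return base_vertex + max_index > lock_size;
}

/* Smallest base_vertex for which the model above predicts splitting. */
static int first_split_offset(int max_index, int lock_size)
{
    return lock_size - max_index + 1;
}
```

Under the assumed max_index of 3, first_split_offset() gives 2998 for a limit of 3000 and 298 for a limit of 300, matching the offsets at which the fast (split) path was observed. If the model holds, changing the limit only moves the boundary; the slow path for smaller offsets remains.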
Attachment 125871, "Test case":