PBO unpacking is not accelerated
Submitted by Whatcookie
Assigned to mes..@..op.org
While investigating performance bottlenecks in RPCS3 on Radeonsi, I came across a scene that ran at only 1 fps, with 99% of the CPU time spent in the driver. Further investigation showed that using the GL_STREAM_COPY usage hint instead of GL_STATIC_COPY raised performance to 11 fps.
This prompted us to look into Mesa's code for an explanation, since the operation here should be a copy from GPU memory to GPU memory and shouldn't be any faster with GL_STREAM_COPY.
We came across commit a338dc01, which explained why GL_STREAM_COPY was faster.
The point is that we need PBO unpacking acceleration for this to be any faster. Even with the GL_STREAM_COPY usage hint, about 90% of the graphics thread's time is spent in a single function in the driver.