WIP: Optimize setting uniforms.
I started working on this branch five years ago.
I tried the changes on Core systems, and I wasn't able to measure a significant change. It seemed to help, but all of the changes were within the run-to-run noise of the tests.
So, I abandoned it for a long, long time. Now we have this new, shiny Iris driver that uses a lot less CPU, so there's some chance this might actually help. I've really just dusted off the original series by rebasing it onto more than 17,000 new commits. I also fixed a couple of bugs that caused some piglit / CI failures. I have not done any new performance testing.
There are three things that happen in this series:
- Specialize some of the commonly used glUniform variants. Previously, a small number of core functions handled everything, which resulted in a lot of redundant or nonsensical checks. For example, the glUniform4iv path had checks to handle setting sampler uniforms, but that is not possible on that path. Specializing the functions grows the code, but removes work from the execution paths.
- Implement a uniform cache. A shocking number of applications repeatedly set uniforms to the values they already have, sometimes many times per frame. Each time that happens, a lot of unnecessary flushing, etc. occurs. The uniform cache just compares the new value against the old value and skips the work of changing the uniform when the values are the same.
- Add SSE2 optimizations. The SSE2 optimizations exist primarily to help the uniform cache checks. More could possibly be done here, but I don't know that it's worth the effort. I thought I had some changes that used later versions of SSE, but I cannot find that code now.
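For reviewers who haven't looked at the code yet, here is a minimal C sketch of the idea behind the uniform cache and the SSE2 comparison. It is not the code in this series: `struct uniform_cache_slot`, `uniform_cache_update()`, `block16_equal()` and `values_equal()` are hypothetical names, and the 64-byte slot assumes no uniform is larger than a mat4 of 32-bit components. The SSE2 path compares 16 bytes (one vec4) at a time and falls back to `memcmp()` when SSE2 is not available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical cache slot: a copy of the last value stored to one uniform.
 * 64 bytes is enough for a mat4 of 32-bit components (an assumption). */
struct uniform_cache_slot {
   unsigned char data[64];
   size_t size;            /* number of valid bytes; 0 means "empty" */
};

/* Compare two 16-byte blocks (one vec4 worth of data). */
static bool block16_equal(const void *a, const void *b)
{
#if defined(__SSE2__)
   __m128i va = _mm_loadu_si128((const __m128i *)a);
   __m128i vb = _mm_loadu_si128((const __m128i *)b);
   /* Byte-wise compare; the mask is 0xFFFF only if all 16 bytes match. */
   return _mm_movemask_epi8(_mm_cmpeq_epi8(va, vb)) == 0xFFFF;
#else
   return memcmp(a, b, 16) == 0;
#endif
}

/* Compare n bytes: whole 16-byte blocks first, then any tail bytes. */
static bool values_equal(const unsigned char *a, const unsigned char *b,
                         size_t n)
{
   size_t i = 0;
   for (; i + 16 <= n; i += 16) {
      if (!block16_equal(a + i, b + i))
         return false;
   }
   return memcmp(a + i, b + i, n - i) == 0;
}

/* Returns true if the value differs from the cached one (the caller must
 * then do the real, expensive update and flush), or false if the store
 * can be skipped entirely. Updates the cache on a miss. */
static bool uniform_cache_update(struct uniform_cache_slot *slot,
                                 const void *value, size_t n_bytes)
{
   assert(n_bytes > 0 && n_bytes <= sizeof(slot->data));

   if (slot->size == n_bytes && values_equal(slot->data, value, n_bytes))
      return false;

   memcpy(slot->data, value, n_bytes);
   slot->size = n_bytes;
   return true;
}
```

Comparing raw bytes rather than float values is deliberately conservative: `-0.0f` and `0.0f` compare as different and cause a redundant update, but a real change is never missed, and a NaN that is stored again with the same bit pattern is correctly treated as unchanged.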
There is still one piglit failure on i965 and a small handful on Iris.