d3d11: cache compiled shaders for faster startup
This is not really a problem, more some room for improvement that seems feasible. I think I can do the PR myself, but I would also like to know your opinion, so let's open a discussion.
Idea
In the logs of some apps I can see the same d3d11 shaders being compiled many times, and a single compilation can sometimes take up to ~5ms depending on the case.
Following the code and the logs, it seems that d3d11 elements simply compile their shaders every time they start. My idea is to improve gstd3d11compile.cpp so that it caches each compiled shader and looks up an already compiled one by a device/code tuple.
If I understand correctly, a shader is determined by its code, the ID3D11Device handle, and the compilation flags, macros, parameters, etc. I am treating a compiled shader as just a compiled binary that will run on a certain GPU. Please correct me if I'm wrong.
What would improve
There are many cases where the same shader gets compiled several times for the same device. The simplest example is an application that opens and closes many pipelines, so the shaders are compiled again each time. With a cache, every startup from the second one onward would be ~5ms faster.
Depending on the pipeline the gain can be larger, for example when it contains more elements such as d3d11convert.
How I think it could be done
The crucial change would happen in the gst_d3d11_compile() function in gstd3d11compile.cpp.
A GHashTable would be stored in static memory (a static variable).
Before executing GstD3DCompileFunc, gst_d3d11_compile() would build a hash of that function's parameters (code, flags, includes, everything). The algorithm is a really small piece of code that can be copied from g_bytes_hash.
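A rough sketch of the hashing step, written as standalone C++ so it does not depend on GLib or the D3D headers. `ShaderKey`, `hash_bytes` and `hash_key` are hypothetical names for illustration, and the key fields are only a guess at which compile parameters matter. The recurrence (seed 5381, `h = h * 33 + byte`) is the DJB-style one that g_bytes_hash uses as far as I know; this is not a copy of the GLib code.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical bundle of the compile parameters that identify a shader.
struct ShaderKey {
  std::string source;       // HLSL source code
  std::string entry_point;  // e.g. "main"
  std::string target;       // e.g. "ps_5_0"
  std::uint32_t flags;      // compilation flags
};

// DJB-style running hash: h = h * 33 + byte, continued from a previous value.
static std::uint32_t hash_bytes(std::uint32_t h, const void *data, std::size_t len)
{
  const unsigned char *p = static_cast<const unsigned char *>(data);
  for (std::size_t i = 0; i < len; i++)
    h = (h << 5) + h + p[i];
  return h;
}

// Fold every field into one hash. The terminating NUL of each string is
// hashed too, so that field boundaries contribute to the result.
static std::uint32_t hash_key(const ShaderKey &k)
{
  std::uint32_t h = 5381;
  h = hash_bytes(h, k.source.c_str(), k.source.size() + 1);
  h = hash_bytes(h, k.entry_point.c_str(), k.entry_point.size() + 1);
  h = hash_bytes(h, k.target.c_str(), k.target.size() + 1);
  h = hash_bytes(h, &k.flags, sizeof k.flags);
  return h;
}
```

Two different parameter sets can still produce the same 32-bit value, so a real cache would probably also need to compare the full parameters on a hit rather than trust the hash alone.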
Once we have this hash, we look it up in our static GHashTable. Either we find the pointer to the already compiled shader and return it, or we don't find it, in which case we execute GstD3DCompileFunc, store the shader in the hash table and return it.
Everything happens under a static lock.
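The lookup-or-compile flow described above could look roughly like the following standalone C++ sketch. Everything in it is a stand-in: `Bytecode` plays the role of the compiled blob, `CompileFunc` the role of GstD3DCompileFunc, a `std::map` replaces the GHashTable, and `CacheKey` and `compile_cached` are hypothetical names. Unlike the plan above, the sketch keys the table on the full parameter tuple instead of a hash value, which avoids having to handle hash collisions.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a compiled shader blob.
using Bytecode = std::vector<std::uint8_t>;

// Hypothetical cache key: device handle (stored as an integer) plus compile parameters.
struct CacheKey {
  std::uintptr_t device;
  std::string source;
  std::string entry_point;
  std::string target;
  std::uint32_t flags;

  bool operator<(const CacheKey &o) const {
    return std::tie(device, source, entry_point, target, flags) <
           std::tie(o.device, o.source, o.entry_point, o.target, o.flags);
  }
};

// Stand-in for the real compile function.
using CompileFunc = std::function<Bytecode(const CacheKey &)>;

// Return the cached blob for this key, compiling it only on the first request.
static std::shared_ptr<const Bytecode>
compile_cached(const CacheKey &key, const CompileFunc &compile)
{
  static std::mutex lock;  // the "static lock"
  static std::map<CacheKey, std::shared_ptr<const Bytecode>> cache;

  std::lock_guard<std::mutex> guard(lock);
  auto it = cache.find(key);
  if (it != cache.end())
    return it->second;  // cache hit: no compilation

  // Cache miss: compile, keep the result for the process lifetime, return it.
  auto blob = std::make_shared<const Bytecode>(compile(key));
  cache.emplace(key, blob);
  return blob;
}
```

Holding the lock during compilation keeps the sketch simple but serializes all compiles, including those for unrelated shaders. Whether that matters for elements starting in parallel is an open question.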
Apart from that change, the patch would also make sure the shader objects are no longer released, because they are now stored in the hash table forever. One example is ID3D11PixelShader *ps[CONVERTER_MAX_QUADS]; of struct _GstD3D11ConverterPrivate.
CC @seungha.yang as the author of the code.