WIP: shader profiling in NIR using ARB_shader_clock
Introduction
This patch is intended to be a reimplementation of INTEL_DEBUG=shader_time, but using NIR and the ARB_shader_clock extension. This would allow us to reuse the profiling code on any Mesa driver that supports that extension, and to remove Intel-specific code from the Intel driver.
This way, other Mesa backends that meet the requirements (NIR and ARB_shader_clock) will be able to add a shader profiling feature via the MESA_SHADER_TIME environment variable, which would be the generic Mesa equivalent of the Intel-specific INTEL_DEBUG=shader_time.
Results
I ran a simple program for a similar amount of time with INTEL_DEBUG=shader_time and with MESA_SHADER_TIME, in separate runs, because enabling both in the same run would skew the results.
INTEL_DEBUG=shader_time | MESA_SHADER_TIME=1
Implementation
This is implemented by a NIR pass that injects profiling code into the shader and stores the result in an SSBO via the ssbo_atomic_add() intrinsic.
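As a rough illustration of what the injected code does, here is a CPU-side model in C: read a clock before and after the shader body, then atomically add the difference to a per-stage counter. This is not NIR builder code. All sketch_* names, the fake clock and the stage count of 6 are assumptions made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for MESA_SHADER_STAGES. */
#define SKETCH_SHADER_STAGES 6

/* Stand-in for the SSBO: one 64-bit cycle counter per shader stage. */
_Atomic uint64_t sketch_ssbo[SKETCH_SHADER_STAGES];

/* Stand-in for the counter exposed by ARB_shader_clock. */
uint64_t sketch_clock_value;

uint64_t sketch_clock(void)
{
    return sketch_clock_value;
}

/* Example "shader body" that costs exactly 100 ticks of the fake clock. */
void sketch_body_100_ticks(void)
{
    sketch_clock_value += 100;
}

/* Model of an instrumented shader: time the body and add the elapsed
 * ticks to the counter of the given stage with an atomic add. */
void sketch_run_instrumented(unsigned stage, void (*body)(void))
{
    assert(stage < SKETCH_SHADER_STAGES);

    uint64_t start = sketch_clock();
    body();
    uint64_t end = sketch_clock();

    atomic_fetch_add(&sketch_ssbo[stage], end - start);
}
```

The atomic add matters because many shader invocations of the same stage run concurrently and all write to the same counter.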
As we want to support multiple shader programs, after a program finishes running we read the result from the SSBO and store it in a buffer indexed by the shader program ID. This buffer lives inside struct gl_context.
Every time a draw call happens, an SSBO holding MESA_SHADER_STAGES uint64_t variables is bound. After the draw call ends, we add the value recorded for each stage of that particular shader program to the buffer inside struct gl_context. If a new shader program is created, we just reallocate the buffer on the fly to accommodate it.
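The per-program accumulation described above could look roughly like the sketch below. It is a standalone model, not the code from this patch: the struct, the function names and the stage count of 6 are hypothetical, and the real buffer is a member of struct gl_context.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for MESA_SHADER_STAGES. */
#define SKETCH_SHADER_STAGES 6

/* Accumulated cycles, indexed as cycles[program_id][stage]. */
struct sketch_shader_times {
    uint64_t (*cycles)[SKETCH_SHADER_STAGES];
    size_t num_programs;
};

/* Grow the table so that program_id is a valid row; new rows start at
 * zero. Returns 0 on success, -1 if the reallocation fails. */
int sketch_reserve(struct sketch_shader_times *t, size_t program_id)
{
    assert(t != NULL);
    if (program_id < t->num_programs)
        return 0;

    size_t new_count = program_id + 1;
    void *p = realloc(t->cycles, new_count * sizeof(*t->cycles));
    if (p == NULL)
        return -1;

    t->cycles = p;
    memset(t->cycles + t->num_programs, 0,
           (new_count - t->num_programs) * sizeof(*t->cycles));
    t->num_programs = new_count;
    return 0;
}

/* After a draw call: add the per-stage values read back from the SSBO
 * to the row that belongs to the program used for that draw. */
int sketch_accumulate(struct sketch_shader_times *t, size_t program_id,
                      const uint64_t ssbo[SKETCH_SHADER_STAGES])
{
    if (sketch_reserve(t, program_id) != 0)
        return -1;

    for (unsigned s = 0; s < SKETCH_SHADER_STAGES; s++)
        t->cycles[program_id][s] += ssbo[s];
    return 0;
}
```

Indexing directly by program ID keeps the lookup trivial, at the cost of unused rows when IDs are sparse.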
Also, it's important that we disable the GLSL shader cache to force shader recompilation when MESA_SHADER_TIME is on.
But for all of this to work, the SSBO must first be declared/created with a known block_index and binding_point. The binding point is needed because we must bind the SSBO every time a draw call happens. The block index is needed because the NIR pass must add the ssbo_atomic_add() instruction, and that instruction requires a block index to operate on.
Creating an SSBO inside the NIR pass doesn't work, as the linking stage won't reserve space for it. Thus, the SSBO with the required properties must be created in some other way.
Currently, this is done in glsl/linker.cpp by adding one extra SSBO block at index 0 in the middle of the linker pipeline. The binding point is selected by looping over all other variables and giving the SSBO highest_binding_point+1. As the binding point is queried at runtime by calling get_program_resourceiv() and the NIR code doesn't explicitly use the binding point for anything, we just need to give it any binding point that isn't already in use.
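The highest_binding_point+1 rule can be sketched in isolation as follows. This is a simplified model that takes the bindings already in use as a plain array; the function name and signature are hypothetical and do not mirror the linker code.

```c
#include <assert.h>
#include <stddef.h>

/* Return one past the highest binding point in use, or 0 when no
 * binding point is in use yet. */
unsigned sketch_next_binding(const unsigned *used, size_t count)
{
    assert(used != NULL || count == 0);

    unsigned next = 0;
    for (size_t i = 0; i < count; i++) {
        if (used[i] >= next)
            next = used[i] + 1;
    }
    return next;
}
```

For bindings {0, 3, 1} this returns 4. The result is always unused, but nothing keeps it below the implementation's maximum number of binding points.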
Current status / Missing work
With all that being said, the SSBO creation code is not working as intended, causing test regressions and crashes. This is probably because adding an extra block to ShaderStorageBlocks in struct gl_program isn't the proper way to allocate an extra SSBO, so writes to it cause unexpected behavior.
Also, the binding point selection code must be improved: as currently implemented, it can assign the SSBO a binding point beyond the maximum number of available binding points. This is not the main cause of the test failures or crashes, though.
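One possible way to keep the selected binding point in range is to pick the lowest unused one below the limit and report failure when none is left. The sketch below is only an assumption about how that could be done, not what the patch implements. The function name is hypothetical, and max_bindings stands for whatever limit the driver reports (in GL terms, GL_MAX_SHADER_STORAGE_BUFFER_BINDINGS).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return the lowest binding point below max_bindings that does not
 * appear in used[], or -1 when every binding point is taken. */
int sketch_pick_free_binding(const unsigned *used, size_t count,
                             unsigned max_bindings)
{
    assert(used != NULL || count == 0);

    for (unsigned candidate = 0; candidate < max_bindings; candidate++) {
        bool taken = false;
        for (size_t i = 0; i < count; i++) {
            if (used[i] == candidate) {
                taken = true;
                break;
            }
        }
        if (!taken)
            return (int)candidate;
    }
    return -1;
}
```

With bindings {0, 1, 7} in use and a limit of 8, the highest+1 rule would yield 8, which is out of range, while this search returns 2. When it returns -1, profiling would have to be skipped for that program.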
Regressions / Piglit test results
The current implementation has the following piglit test results on the "quick" test suite:
[41452/41452] skip: 4328, pass: 35385, warn: 12, fail: 1720, crash: 7
https://pastebin.com/raw/hKGyky1X