Add a new runner, written in C.
Whereas run.py runs piglit's shader_runner binary to compile each shader individually and parses the output of INTEL_DEBUG=fs,vs,gs to find the number of instructions and loops, this runner compiles all of the shaders from a single process and uses output from GL_KHR_debug to get the information we want.

It uses EGL and GBM (and render nodes) to create a GL display, and libepoxy for GL function pointer management. It creates one thread per CPU using OpenMP, and the threads compile shaders in parallel. It creates two OpenGL contexts, one core and one compatibility, and switches between them as needed.

run.py compiles all of the GLSL shaders in shader-db (including the closed portion) in about 300 seconds on my quad-core Haswell. This program does the same in 90 seconds. Profiling shows that it's largely limited by malloc performance, and preloading jemalloc (LD_PRELOAD=/usr/lib64/libjemalloc.so.1) reduces the execution time to about 80 seconds.
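
The parallel structure described above can be illustrated with a minimal sketch. This is not the code from this commit: `struct shader_stats`, `fake_compile()` and `compile_all()` are hypothetical names, and the "compile" step is a stand-in that just counts characters in the source string, where the real runner would compile through GL and read the counts from GL_KHR_debug messages. It only shows how an OpenMP loop can spread per-shader work across CPUs and sum the results.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-shader result. In the real runner these numbers would
 * come from the driver's GL_KHR_debug output, not from this code. */
struct shader_stats {
    int instructions;
    int loops;
};

/* Stand-in for the real compile step (assumption for illustration only):
 * treats every ';' as one "instruction" and every "for" as one "loop". */
static struct shader_stats fake_compile(const char *source)
{
    struct shader_stats s = {0, 0};

    for (const char *p = source; *p; p++) {
        if (*p == ';')
            s.instructions++;
    }
    for (const char *p = strstr(source, "for"); p; p = strstr(p + 3, "for"))
        s.loops++;

    return s;
}

/* Process every shader source, letting OpenMP divide the iterations among
 * its worker threads. The reduction clause gives each thread private
 * counters and adds them together when the loop finishes. */
static struct shader_stats compile_all(const char *const *sources, int count)
{
    int instructions = 0;
    int loops = 0;

    #pragma omp parallel for reduction(+:instructions, loops) schedule(dynamic)
    for (int i = 0; i < count; i++) {
        struct shader_stats s = fake_compile(sources[i]);
        instructions += s.instructions;
        loops += s.loops;
    }

    struct shader_stats total = {instructions, loops};
    return total;
}
```

Built with `-fopenmp` (GCC or Clang), the loop runs on one thread per CPU by default; without that flag the pragma is ignored and the same code runs serially. A real runner would also need each thread to have its own current GL context, which this sketch leaves out.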
Showing with 584 additions and 0 deletions