Write a NIR ALU opcode tester
We've had a number of cases fly by lately where people have asked how NIR's constant-folding compares to hardware. Unfortunately, this has historically been a bit annoying to determine. It usually involves very careful reading of both the NIR constant-folding code and the hardware docs, and hoping that the docs are correct and that your understanding of both the docs and the C standard is too. It doesn't have to be that hard...
The idea here is to write a little app that can execute shaders through GL or Vulkan, try a bunch of cases, and test hardware behavior against NIR's constant-folding code. We would implement and use a Mesa compiler back-door to ensure that, at least at the front-end, we're getting exactly the NIR opcode we think we are. We would then fire off a compute shader that tries a combinatorial explosion of inputs and compare the results with the constant-folding code. For 8-bit ops (and maybe 16-bit), this could be exhaustive. For 32-bit ops, we would have to come up with a smattering of semi-random data that fuzzes them well enough and hits all the usual edge cases: positive and negative powers of two, positive and negative powers of two minus one, zero, maybe some interesting mask-type values, and anything else we think might hit edge cases.
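A minimal sketch of how the 32-bit input set might be assembled, assuming integer ops (so "negative" means the two's-complement negation) and using Marsaglia's xorshift32 as a stand-in for whatever PRNG the app actually picks. The function names gen_inputs_32 and xorshift32 are hypothetical. Edge cases go first so even a small batch hits them; the rest of the buffer is filled with pseudo-random values. Float ops would want their own list.

```c
#include <stddef.h>
#include <stdint.h>

/* Marsaglia's xorshift32: a tiny PRNG; any decent generator would do. */
static uint32_t
xorshift32(uint32_t *state)
{
   uint32_t x = *state;
   x ^= x << 13;
   x ^= x >> 17;
   x ^= x << 5;
   return *state = x;
}

/* Hypothetical helper: fill out[0..cap) with 32-bit test inputs, edge
 * cases first, then pseudo-random values. Returns the count written (cap).
 * Duplicates (e.g. 2^1 - 1 == 2^0) are kept; they are harmless here. */
static size_t
gen_inputs_32(uint32_t *out, size_t cap, uint32_t seed)
{
   static const uint32_t masks[] = {
      0x55555555u, 0xaaaaaaaau, 0x0f0f0f0fu, 0xf0f0f0f0u,
      0x00ff00ffu, 0xff00ff00u, 0x0000ffffu, 0xffff0000u,
   };
   size_t n = 0;

#define PUSH(v) do { if (n < cap) out[n++] = (uint32_t)(v); } while (0)
   PUSH(0u);
   for (unsigned i = 0; i < 32; i++) {
      uint32_t p = UINT32_C(1) << i;
      PUSH(p);             /* 2^i */
      PUSH(0u - p);        /* -(2^i), two's complement */
      PUSH(p - 1u);        /* 2^i - 1 */
      PUSH(0u - (p - 1u)); /* -(2^i - 1) */
   }
   for (size_t m = 0; m < sizeof(masks) / sizeof(masks[0]); m++)
      PUSH(masks[m]);
#undef PUSH

   if (seed == 0)
      seed = 1; /* xorshift32 would stay stuck at zero */
   while (n < cap)
      out[n++] = xorshift32(&seed);
   return n;
}
```

With i = 31 this yields 0x80000000, 0x7fffffff and 0x80000001, so INT32_MIN and INT32_MAX are always covered.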
Compiler back-door
For SPIR-V (and maybe GLSL; that's harder) we could add support for intrinsic functions of the form __mesa_nir_alu_op_foo. In our handling of OpFunctionCall, we'll match against the NIR opcode table (we have string names for all of them) and, instead of emitting a shader call instruction, emit the exact NIR opcode. We can use the nir_op_info for the op to check things like input bit sizes, number of components, etc. to make sure the call follows the rules. That'll help avoid NIR validation errors.
For GLSL, we could do something similar. However, that compiler needs a bit more plumbing in the way it handles intrinsic functions, so I'm not sure that handling them dynamically on the fly will work especially well. It might be worth doing, though, if we care about testing GLES drivers. We could possibly go through GL_ARB_gl_spirv, but if a driver doesn't support really recent desktop GL versions, it probably doesn't support SPIR-V either.
Depending on how sketchy the back-door is, we may want to hide it behind an environment variable.
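The name matching and environment-variable gate could look roughly like the sketch below. It is not Mesa code: fake_op_table is a hypothetical stand-in for NIR's per-opcode info, MESA_NIR_ALU_BACKDOOR is a made-up variable name, and only the input count is checked where real code would also check bit sizes and component counts. It also assumes the callee's name is available at all; in SPIR-V that would come from OpName, which is debug information.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for NIR's per-opcode info table. */
struct fake_op_info {
   const char *name;
   unsigned num_inputs;
};

static const struct fake_op_info fake_op_table[] = {
   { "iadd", 2 }, { "imul", 2 }, { "ineg", 1 }, { "fadd", 2 }, { "ffma", 3 },
};

#define BACKDOOR_PREFIX "__mesa_nir_alu_op_"

/* Hypothetical gate: only honor the back-door when explicitly enabled.
 * Callers would check this before calling match_backdoor_call(). */
static bool
backdoor_enabled(void)
{
   const char *v = getenv("MESA_NIR_ALU_BACKDOOR");
   return v != NULL && strcmp(v, "1") == 0;
}

/* Returns the table index of the opcode named by func_name, or -1 if the
 * name lacks the prefix, the opcode is unknown, or the argument count
 * does not match what the opcode expects. */
static int
match_backdoor_call(const char *func_name, unsigned num_args)
{
   const size_t plen = strlen(BACKDOOR_PREFIX);
   if (strncmp(func_name, BACKDOOR_PREFIX, plen) != 0)
      return -1;

   const char *op = func_name + plen;
   for (size_t i = 0; i < sizeof(fake_op_table) / sizeof(fake_op_table[0]); i++) {
      if (strcmp(op, fake_op_table[i].name) == 0)
         return fake_op_table[i].num_inputs == num_args ? (int)i : -1;
   }
   return -1;
}
```

Returning -1 for a wrong argument count, rather than emitting something anyway, is what keeps a malformed test shader from turning into a NIR validation failure.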
Test app
The test app would be a fairly normal Vulkan or GLES app except that it would also link against enough of NIR to get the constant-folding code. It would code-gen the SPIR-V or GLSL shader from a fairly straightforward template. Generating SPIR-V by hand can be a bit of a pain, but maybe it could use the SPIR-V builder from Zink? Or the shaders may be simple enough to write by hand. Or, for that matter, it could link against SPIRV-Tools to get the assembler and use that. In any case, the shader would be generated for a particular opcode and bit size (if applicable). It would probably be a compute shader that takes some SSBOs for inputs and outputs.
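As an illustration of the "straightforward template" idea, here is a sketch that emits GLSL for a 32-bit op with one to three inputs, one SSBO per source and one for the result. GLSL is used only because it is readable; gen_glsl_shader is a hypothetical name, the prototype for __mesa_nir_alu_op_<name> assumes a front-end that recognizes the prefix before it would complain about the missing body, and other bit sizes would need the explicit-arithmetic-type and storage extensions.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical generator: writes a compute shader that applies back-door
 * opcode `op` element-wise to `num_inputs` uint SSBOs. Returns what
 * snprintf returns (length the full text needs), or -1 on bad arguments. */
static int
gen_glsl_shader(char *buf, size_t size, const char *op, unsigned num_inputs)
{
   char decls[512] = "", params[64] = "", args[128] = "";
   size_t dl = 0, pl = 0, al = 0;

   if (num_inputs < 1 || num_inputs > 3)
      return -1;

   for (unsigned s = 0; s < num_inputs; s++) {
      const char *sep = s ? ", " : "";
      dl += (size_t)snprintf(decls + dl, sizeof(decls) - dl,
                             "layout(std430, binding = %u) readonly buffer Src%u { uint src%u[]; };\n",
                             s, s, s);
      pl += (size_t)snprintf(params + pl, sizeof(params) - pl, "%suint", sep);
      al += (size_t)snprintf(args + al, sizeof(args) - al, "%ssrc%u[i]", sep, s);
   }

   /* The result buffer takes the binding after the last source. */
   return snprintf(buf, size,
                   "#version 450\n"
                   "layout(local_size_x = 64) in;\n"
                   "%s"
                   "layout(std430, binding = %u) writeonly buffer Dst { uint dst[]; };\n"
                   "uint __mesa_nir_alu_op_%s(%s);\n"
                   "void main() {\n"
                   "   uint i = gl_GlobalInvocationID.x;\n"
                   "   if (i >= uint(dst.length())) return;\n"
                   "   dst[i] = __mesa_nir_alu_op_%s(%s);\n"
                   "}\n",
                   decls, num_inputs, op, params, op, args);
}
```

The bounds check lets the app round the dispatch up to a multiple of the workgroup size without padding the buffers.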
The app would then set up the various API objects (mostly pipelines and SSBOs) and manage marshaling the data through to the shader. For large combinatorial explosions, it would probably need to be able to batch up its runs so we don't allocate giant SSBOs. Once each batch has completed on the GPU, it would evaluate the same data with the NIR constant folding code and compare the results. For any results that differ, it would dump them out in some nice, readable format.
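The batch-and-compare loop for the exhaustive 8-bit case might look like this sketch. run_batch_fn is a hypothetical hook standing in for "upload to SSBOs, dispatch, read back"; fold8_fn stands in for the NIR constant-folding call. The fake_gpu_iadd8 and fold_iadd8 stubs exist only so the harness can run without a device; they are not NIR code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define BATCH_SIZE 4096u

/* Hypothetical hook: run one batch through the GPU shader. */
typedef void (*run_batch_fn)(const uint8_t *a, const uint8_t *b,
                             uint8_t *dst, size_t count);
/* Hypothetical hook: the constant-folding reference for the same op. */
typedef uint8_t (*fold8_fn)(uint8_t a, uint8_t b);

/* Tries all 256 * 256 input pairs in fixed-size batches and compares the
 * GPU results with the reference. Prints each mismatch to `log` (if not
 * NULL) and returns the number of mismatches. */
static unsigned
compare_exhaustive_8bit(const char *op_name, run_batch_fn run_on_gpu,
                        fold8_fn fold, FILE *log)
{
   static uint8_t a[BATCH_SIZE], b[BATCH_SIZE], dst[BATCH_SIZE];
   const uint32_t total = 256u * 256u;
   unsigned mismatches = 0;

   for (uint32_t base = 0; base < total; base += BATCH_SIZE) {
      size_t count = total - base < BATCH_SIZE ? total - base : BATCH_SIZE;

      /* Decode the linear index into an (a, b) pair. */
      for (size_t i = 0; i < count; i++) {
         uint32_t combo = base + (uint32_t)i;
         a[i] = (uint8_t)(combo & 0xffu);
         b[i] = (uint8_t)(combo >> 8);
      }

      run_on_gpu(a, b, dst, count);

      for (size_t i = 0; i < count; i++) {
         uint8_t expected = fold(a[i], b[i]);
         if (dst[i] != expected) {
            mismatches++;
            if (log)
               fprintf(log, "%s(0x%02x, 0x%02x): hw=0x%02x fold=0x%02x\n",
                       op_name, (unsigned)a[i], (unsigned)b[i],
                       (unsigned)dst[i], (unsigned)expected);
         }
      }
   }
   return mismatches;
}

/* Stub reference: wrapping 8-bit add. */
static uint8_t fold_iadd8(uint8_t a, uint8_t b) { return (uint8_t)(a + b); }

/* Stub "GPU": computes the same thing on the CPU. */
static void
fake_gpu_iadd8(const uint8_t *a, const uint8_t *b, uint8_t *dst, size_t count)
{
   for (size_t i = 0; i < count; i++)
      dst[i] = (uint8_t)(a[i] + b[i]);
}
```

Keeping the batch buffers fixed-size is the point: memory use stays at three 4 KiB arrays no matter how large the input space is.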
Uses
I think this could be useful for quite a few things:
- Comparing hardware to constant-folding: Whenever we have one of these constant-folding discussions, it would be way simpler if the person proposing the standard folding could post a branch and everyone could run the app. Saves a lot of costly and unreliable brain work.
- Figuring out exact behavior of hardware: Reverse-engineering and wondering what the exact behavior of that opcode is? It's now really easy to try a giant pile of combinations and find all those nasty corners.
- Finding undefined behavior in constant-folding: If the app were built with Clang's sanitizers (UBSan for undefined behavior, ASan for memory errors), it would catch a lot of C undefined behavior and report it. Build it that way and then run all the opcodes on a reasonably fast and complete GL/Vulkan implementation like Intel's or AMD's.
- Finding constant-folding crashes: Are we 100% sure NIR's constant folding never throws a floating-point exception? I'm not. 😁 This would let us relatively easily fuzz for those types of issues and find out what the GPU hardware does with those bogus values at the same time.
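On Linux/x86, the "Floating point exception" crash people usually hit is SIGFPE from integer division, raised both for division by zero and for INT32_MIN / -1; both are also undefined behavior in C. A sketch of what a fuzzer might flag when scanning a signed-division fold, assuming a naive a / b implementation (idiv32_is_ub and count_ub_pairs are hypothetical names):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if evaluating a / b in C is undefined behavior. On x86 both cases
 * trap at run time and the process gets SIGFPE. */
static bool
idiv32_is_ub(int32_t a, int32_t b)
{
   return b == 0 || (a == INT32_MIN && b == -1);
}

/* Counts the (a, b) pairs from vals x vals on which a naive C fold of
 * signed division would be undefined. A fuzzer would log these inputs,
 * check that the fold code guards them, and record what the GPU returns. */
static size_t
count_ub_pairs(const int32_t *vals, size_t n)
{
   size_t hits = 0;
   for (size_t i = 0; i < n; i++)
      for (size_t j = 0; j < n; j++)
         if (idiv32_is_ub(vals[i], vals[j]))
            hits++;
   return hits;
}
```

The same edge-case inputs that exercise the hardware (zero, -1, INT32_MIN) are exactly the ones that trip this, which is why running both sides from one input set is convenient.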
Ok, there's the sales pitch. Seems like a decent intern project to me.