
Write your NIR passes in OpenCL! GenXML inside shaders! Extravaganza

Alyssa Rosenzweig requested to merge alyssa/mesa:asahi/clc into main

Add infrastructure to write functions in CL and have them automatically wrapped as nir_builder routines. This allows writing complex logic for use in lowering passes. It also lets you write meta shaders in CL, by expressing the core in CL and wrapping in an nir_builder_init_simple_shader -- this separation means you can pass your inputs and shader key however you want (UBO? push constants? will this run as a compute dispatch? or maybe a vertex or frag shader? etc).

With the infrastructure, some asahi lowerings are ported from nir to cl as a proof of concept. The texture lowerings here use GenXML on the CL side to automatically unpack descriptors (even with modifiers), which takes nightmare-level stuff like the image atomic lowering and makes it a short readable high-level function. Packing should work too for dgc down the road.
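To make the "named fields instead of hardcoded offsets" point concrete, here is a minimal sketch in plain C. Everything in it is an assumption for illustration: the descriptor layout, the `example_` names, and the linear addressing are invented, and this is not the code GenXML actually generates for asahi. It only shows the shape of the idea: unpack the descriptor once into a struct, then write the lowering logic against field names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical descriptor layout, invented for this sketch only:
 *   bits  0..13  width  - 1
 *   bits 14..27  height - 1
 *   bits 28..31  format
 *   bits 32..63  base address >> 4
 */
struct example_texture {
   uint32_t width;
   uint32_t height;
   uint32_t format;
   uint64_t address;
};

/* Extract bits [start, end] (inclusive) from an array of 32-bit words.
 * Stand-in for the kind of helper a pack/unpack generator would emit. */
static inline uint64_t
example_unpack_uint(const uint32_t *words, unsigned start, unsigned end)
{
   assert(end >= start && end - start < 64);

   uint64_t value = 0;
   for (unsigned bit = start; bit <= end; ++bit) {
      uint64_t b = (words[bit / 32] >> (bit % 32)) & 1;
      value |= b << (bit - start);
   }
   return value;
}

/* Decode the whole descriptor in one place, including the "minus one"
 * and shift modifiers, so callers never see raw bit offsets. */
static inline struct example_texture
example_texture_unpack(const uint32_t *words)
{
   struct example_texture t;
   t.width = (uint32_t)example_unpack_uint(words, 0, 13) + 1;
   t.height = (uint32_t)example_unpack_uint(words, 14, 27) + 1;
   t.format = (uint32_t)example_unpack_uint(words, 28, 31);
   t.address = example_unpack_uint(words, 32, 63) << 4;
   return t;
}

/* A lowering-style helper written against field names. Assumes a simple
 * linear (untiled) layout, which real hardware may not use. */
static inline uint64_t
example_texel_address(const uint32_t *words, uint32_t x, uint32_t y,
                      uint32_t bytes_per_texel)
{
   struct example_texture t = example_texture_unpack(words);
   return t.address + ((uint64_t)y * t.width + x) * bytes_per_texel;
}
```

With a descriptor encoding a 640x480 texture at address 0x1000, `example_texel_address(words, 2, 1, 4)` reads as "row 1, column 2, 4 bytes per texel" without a single magic offset in the caller. If the layout changes, only the unpack routine has to change.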

There are also misc patches to optimize the resulting code to get close to what we'd write by hand.

Asahi will soon use this to implement a bunch of advanced features; this is the plumbing needed to do so sanely.

Should be fun 🎉


For the benefits of the approach: one motivating example is !25498 (ac80e4f2) ... Ripping out 100 lines of brittle texture lowering code and replacing them with completely straightforward C code using a GenXML unpack. No more hardcoded offsets everywhere, etc.

But the real benefit is making things /possible/. Above a certain level of complexity, nir_builder collapses under its own weight. By using either CL C or GLSL, we can do things that we couldn't attempt otherwise. This is why these languages are already used for BVH building in anv and radv respectively. The benefit of CL C over GLSL is that it's a real C environment (mostly) and can ingest existing code with minimal changes. That means we can have common headers for the CPU and GPU. That means GenXML on the GPU. That means a lot of projected time saved porting existing C and C++ code to the GPU. (Motivating example: compute-based tessellator kernels. The reference implementation is about 2000 lines of C++. No way in hell I'm doing that in nir_builder, and it'd be a lot cheaper to do a straight port from C++ to CL C than to rewrite in GLSL. And GLSL has a LOT of gotchas that CL fixes, like unsigned_thing + 1 being a syntax error and needing unsigned_thing + 1u instead.)
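As a small illustration of the `unsigned_thing + 1` point, here is a sketch in plain C; the function name and the ring-buffer framing are made up for the example. In C, the usual arithmetic conversions turn the `int` literal `1` into an unsigned value, so no `1u` suffix is needed, and OpenCL C follows the same C rules for scalar integer arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: advance an index into a ring of `size` entries. */
static inline uint32_t
example_ring_advance(uint32_t unsigned_thing, uint32_t size)
{
   assert(size != 0);

   /* Mixed unsigned/int arithmetic: the literal 1 is converted to
    * unsigned, so this compiles as-is and wraps with unsigned semantics. */
   return (unsigned_thing + 1) % size;
}
```

Code like this can be lifted from an existing C codebase without touching every integer literal, which is part of what makes a straight port cheaper.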

I haven't done any real benchmarking because tbh whatever the cost is, I'm eating it for asahi for the sake of tessellation. I don't expect anything super scary, though, and again intel-clc does most of this already. There's likely a small buildtime hit, but at runtime things are similar to the handwritten nir_builder path, with extra constant folding and inlining required.

Edited by Alyssa Rosenzweig
