Cunning plan for lowered I/O
I'd like to start moving all of NIR towards nir_io_semantics and away from the legacy driver_location and nir_intrinsic_base(). This is a bit of a big project and is going to need buy-in from various driver maintainers. A bunch of drivers are already ready but there's still a handful that will need non-trivial work.
Motivation
Right now, we have basically 4 kinds of I/O in NIR and it's at least one too many:
- Variables: Everything is done with variables and load/store_deref
- Semantic I/O: This is load/store_input/output where nir_io_semantics is used and everything is in units of API locations.
- Generic lowered I/O: The driver decides on locations and type_size() but it still uses load/store_input/output.
- Driver I/O: Lowered I/O using driver-specific intrinsics
@mareko has been doing a bunch of I/O reworks lately and improvements to the linking helpers. All of that assumes semantic I/O. Meanwhile, many drivers aren't using nir_io_semantics and are instead doing something weird and custom. The real irony is that what they're doing is usually pretty equivalent to nir_io_semantics, just different in some random detail. Because semantic I/O and generic lowered I/O use the same intrinsics, it's a constant guessing game as to which one is in play in any given NIR pass.
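To make the guessing game concrete, here is a minimal toy model in C. It does not use the real NIR headers: struct toy_io_intrinsic, enum toy_io_flavor, and toy_io_slot() are hypothetical stand-ins, and the only thing borrowed from NIR is the idea that a load/store_input/output intrinsic carries both a base and a semantic location.

```c
#include <assert.h>

/* Toy stand-in for a load/store_input/output intrinsic.  This is NOT the
 * real nir_intrinsic_instr, just enough state to show the ambiguity. */
struct toy_io_intrinsic {
   int base;           /* models nir_intrinsic_base() */
   unsigned location;  /* models the location in nir_io_semantics */
};

/* Hypothetical: which convention the producing driver follows.  Nothing in
 * the instruction itself records this, which is the whole problem. */
enum toy_io_flavor {
   TOY_SEMANTIC_IO,
   TOY_GENERIC_LOWERED_IO,
};

/* A pass that wants "the slot" has to be told which field is authoritative. */
static unsigned
toy_io_slot(const struct toy_io_intrinsic *io, enum toy_io_flavor flavor)
{
   if (flavor == TOY_SEMANTIC_IO)
      return io->location;

   assert(io->base >= 0);
   return (unsigned)io->base;
}
```

The same instruction yields two different answers depending on a flavor that the pass can only learn from context.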
Meanwhile, Vulkan has been moving more and more in the direction of explicit I/O at the API level. There has even been chatter about trying to give cross-stage vertex I/O that same treatment. There's no extension for this and I wouldn't be able to talk about it here if there were but my reading of the tea leaves says it might be coming. In light of that, I'd like NIR to be ready. I'd also like to have a good feeling of what "the NIR plan" is so that I can talk to it if and when any discussions of that nature do come up in earnest.
Thirdly, and most importantly, this is an area in which Mesa has needlessly diverged. There is a whole lot of "sounded like a good idea at the time" so I'm not going to point fingers. I resisted nir_io_semantics for a long time so I'm definitely to blame for some of it. However, at this point it's pretty clear that we have two or three paths and they all work but we really don't need them all. The more passes that get added for optimizing and otherwise dealing with I/O, the more painful things become.
Why not driver_location?
I think that's better answered by trying to first answer the question, "Why driver_location?" When @cwabbott0 and I first brought up NIR, the idea was that hardware has some sort of I/O space and that driver_location would be a location in that hardware I/O space. Drivers get variables with locations and would assign driver_location and then let nir_lower_io() give them these load/store_input/output intrinsics which are "nicer" than variables. It wasn't a fundamentally terrible plan.
Then someone made gallium call nir_lower_io() and that plan got totally shot to hell.
Not that I'm actually complaining. I was pretty annoyed by it for a while but I've since come to the conclusion that nir_io_semantics really is the only sane way to do any of this. That or variables and variables suck.
So why not driver_location? Didn't I say it was an okay plan? Well, yes but also no. The problem isn't really with driver_location but with load/store_input/output themselves. When I originally brought up NAK, I went all in on driver_location. NVIDIA hardware is basically the perfect hardware for the driver_location model. It has a unified I/O space where everything except a handful of system values lives. Everything is addressed in bytes. There are no special instructions for misc. values. It's perfect. The problem is that load/store_input/output are just not what the back-end wants to consume. They're way too clunky. I ended up adding ald/ast/ipa_nv intrinsics which map better to what the hardware wants and a custom NIR lowering pass to lower load/store_input/output to those. At that point, whether I do the mapping to HW locations via driver_location or directly in my lowering pass doesn't really matter. driver_location gains me nothing.
So I converted NAK to nir_io_semantics last week.
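As a rough illustration of why driver_location buys nothing once a custom lowering pass exists, the sketch below compares the two paths on a made-up byte-addressed I/O space. Everything here is hypothetical: the TOY_* constants, the address layout, and the toy_* functions are not NAK code and do not describe real hardware addresses.

```c
#include <assert.h>

/* Hypothetical byte-addressed I/O space.  The constants are illustrative
 * only and are not taken from any real hardware or driver. */
#define TOY_GENERIC_BASE_ADDR  0x80u
#define TOY_SLOT_SIZE_BYTES    16u
#define TOY_FIRST_GENERIC_SLOT 32u

struct toy_io_intrinsic {
   unsigned base;       /* models nir_intrinsic_base() / driver_location */
   unsigned location;   /* models the location in nir_io_semantics */
   unsigned component;  /* 0..3 within a vec4 slot */
};

/* The one place that knows the (made-up) hardware layout. */
static unsigned
toy_hw_addr(unsigned location, unsigned component)
{
   assert(location >= TOY_FIRST_GENERIC_SLOT && component < 4);
   return TOY_GENERIC_BASE_ADDR +
          (location - TOY_FIRST_GENERIC_SLOT) * TOY_SLOT_SIZE_BYTES +
          component * 4u;
}

/* Path 1: the driver assigns driver_location up front... */
static unsigned
toy_assign_driver_location(unsigned location)
{
   return toy_hw_addr(location, 0);
}

/* ...and the lowering pass reads it back from base. */
static unsigned
toy_lower_via_driver_location(const struct toy_io_intrinsic *io)
{
   return io->base + io->component * 4u;
}

/* Path 2: the lowering pass maps the semantic location directly. */
static unsigned
toy_lower_via_semantics(const struct toy_io_intrinsic *io)
{
   return toy_hw_addr(io->location, io->component);
}
```

Both paths go through the same layout function, so the extra driver_location round trip in path 1 only moves the computation earlier without adding information.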
The other place the driver_location plan fell apart is that we originally intended it to be used for all sorts of things. Vertex I/O, uniforms, shared, and anything else where the driver needs a location. The majority of those have been moved over to nir_lower_explicit_io at this point. The only things left actually using nir_lower_io are GL uniforms (which are immediately moved to cbuf0 with gallium), vertex I/O, and maybe shared in a few GL edge cases. This great generic system we created is neither particularly generic nor is it all that great.
The plan
So what I'd like to do is to unify all of NIR on nir_io_semantics for vertex I/O and figure out something for uniforms. I'm honestly not sure how uniforms are still a thing in the gallium world but I see nir_lower_io paired with nir_var_uniform in the code and haven't done a detailed enough analysis to figure out why.
To that end, I think we need to do roughly the following:
- Add a nir_lower_io_semantics() helper (please, someone help me come up with a better name) which sets driver_location = -1 on everything and always runs on nir_var_shader_in | nir_var_shader_out with a fixed vec4 type_size callback.
- One by one convert drivers to nir_lower_io_semantics(). Because driver_location = -1, they'll have to use nir_io_semantics and ignore nir_intrinsic_base() on all load/store_input/output intrinsics. I'm fairly sure ACO, AMD LLVM, NAK, panfrost, asahi, and a few others are probably already good to go and should be trivial.
- Something, something nir_var_uniform. I think 90% of this can be done by overloading location and making nir_lower_uniforms_to_ubo(). If we do still want nir_lower_io for uniforms for some drivers, we can add a nir_lower_uniform_io() helper which restricts things down to just what we want for those cases.
- Delete the BASE, RANGE, and COMPONENT const indices from load/store_input/output
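The first two steps can be modeled with a small self-contained sketch. This is a toy, not Mesa code: the toy_* types and functions are hypothetical stand-ins for nir_variable, the lowered intrinsic, and the proposed helper, and the vec4 slot counting assumes every array element or matrix column fits in one vec4 (so no 64-bit types).

```c
#include <assert.h>
#include <stddef.h>

#define TOY_LOCATION_UNSET (-1)

enum toy_mode {
   TOY_SHADER_IN  = 1 << 0,   /* models nir_var_shader_in */
   TOY_SHADER_OUT = 1 << 1,   /* models nir_var_shader_out */
   TOY_UNIFORM    = 1 << 2,   /* models nir_var_uniform */
};

/* Simplified type: array_len elements of up to vec4 each (1 if not an array). */
struct toy_type {
   unsigned vec_components;
   unsigned array_len;
};

struct toy_var {
   unsigned mode;
   unsigned location;      /* API location */
   int driver_location;
   struct toy_type type;
};

/* Models the lowered intrinsic: base plus the semantic location/num_slots. */
struct toy_io_intrinsic {
   int base;
   unsigned location;
   unsigned num_slots;
};

/* Fixed vec4 type_size: each element takes a whole slot, whatever its width. */
static unsigned
toy_type_size_vec4(struct toy_type type)
{
   return type.array_len;
}

/* Toy version of the proposed helper: only touches shader in/out variables,
 * poisons driver_location, and emits one record per variable into out[]. */
static size_t
toy_lower_io_semantics(struct toy_var *vars, size_t count,
                       struct toy_io_intrinsic *out)
{
   size_t emitted = 0;

   for (size_t i = 0; i < count; i++) {
      if (!(vars[i].mode & (TOY_SHADER_IN | TOY_SHADER_OUT)))
         continue;

      vars[i].driver_location = TOY_LOCATION_UNSET;
      out[emitted].base = vars[i].driver_location;
      out[emitted].location = vars[i].location;
      out[emitted].num_slots = toy_type_size_vec4(vars[i].type);
      emitted++;
   }

   return emitted;
}
```

With base poisoned to -1, any consumer that still reads it gets an obviously invalid value, which is what pushes a driver over to the semantic location during conversion.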
I 100% recognize that the above plan is probably incomplete. As we go, we can add to it as we figure out what needs to be done in more detail.