NIR: Clean up and unify the bit-casting/packing mess
In NIR, we currently have a bunch of different pack/unpack/cast/whatever operations. It's quite inconsistent which ones exist where, and it's not clear that what we have is actually what we want long-term. There may be some opportunity here to clean things up a bit. There are several categories:
Integer up/down casts
For down-casts, these just take the bottom N bits. For up-casts, they fill the bottom bits with the source and the top bits with either 0 (for unsigned up-casts) or the replicated top bit of the source (for signed up-casts).
While they do have arithmetic meaning, down-casts are redundant with certain unpack operations and unsigned up-casts can be considered a special-case pack operation. It's at least worth considering them while we have this discussion. FWIW, in IBC, I implemented down-casts as unpack because it lets us drop a back-end instruction or two in a bunch of cases.
Vector pack/unpack operations
These map roughly to the classic GLSL opcodes and the SPIR-V OpBitcast. They view the source and destination as tightly packed vectors in memory and, effectively, re-interpret that memory as the new size. A `uvec2`, for instance, can be re-interpreted as a `uint64_t`. Some of them also do a type conversion before the pack or after the unpack, such as `pack_unorm_2x16`.
These have a few advantages over the split versions (below):
- They map more nicely to the SPIR-V and GLSL opcodes.
- For back-ends that want to implement 16-bit as a `u16vec2` packed into a `u32vec1`, it's a bit more natural.
- For vec4 back-ends, this can lead to lower register pressure depending on the details. (But register pressure tends to be lower in `vec4` mode for us, so meh.)
Split pack/unpack operations
These are more "logical" in a sense because they don't re-interpret so much as act like a vector channel extract or `vecN` constructor, but on the byte axis. The split unpack ops extract a subset of the bits from the source and return it. The split pack ops take a number of different sources with the same bit-size and construct a value with a higher bit-size which is the concatenation of the sources.
As with the vector versions, some of these also have type conversions baked in.
These have a few advantages over the vector versions:
- They don't take/produce vectors so it can be a bit nicer on the register allocator
- They naturally vectorize (they just do the same pack/unpack per-channel) so they're easier to emit from lowering code.
One more note about these: Some of them are redundant with the normal integer conversion ops.
Extract operations
These were added as an optimization for the Intel back-end. They effectively do a split unpack followed by an up-cast back to the original bit-size. There are a number of cases where this comes up. For instance, for format conversion in image_load_store, we end up doing a `uint32_t` to `vec4` unorm conversion, and the first thing involved there is to break the `uint32_t` into bytes and convert them to integers so they can be multiplied by the float.
In some sense, these are entirely redundant with the split unpack ops and a data conversion. However, they are a bit easier for the current Intel back-end to consume which is why we have them. With IBC, I think they're unnecessary.
Bitcast operations
We don't actually have these. However, the idea would be to have ops that can take, for instance, a `u32vec2` and produce a `u16vec4` or a `u8vec8`. This would map very precisely to the SPIR-V OpBitcast semantics. Unfortunately, there is a combinatorial explosion of possible opcodes here. I think it'd end up being on the order of 40 of them or so.
Bitfield insert/extract operations
The GLSL bitfield operations can be thought of as a generalization of pack/unpack. They're often not what we want in back-ends when something better is available, but we probably want to consider them as part of the broader issue.