NIR: Clean up and unify the bit-casting/packing mess
In NIR, we currently have a bunch of different pack/unpack/cast/whatever operations. It's quite inconsistent which ones exist where, and it's not clear that what we have is actually what we want long-term. There may be some opportunity here to clean things up a bit. There are several categories:
Integer up/down casts
For down-casts, these just take the bottom N bits. For up-casts, they fill the bottom bits with the source and the top bits with either 0 (for unsigned up-casts) or the replicated top bit of the source (for signed up-casts).
While they do have arithmetic meaning, down-casts are redundant with certain unpack operations and unsigned up-casts can be considered a special-case pack operation. It's at least worth considering them while we have this discussion. FWIW, in IBC, I implemented down-casts as unpack because it lets us drop a back-end instruction or two in a bunch of cases.
Vector pack/unpack operations
These map roughly to the classic GLSL opcodes and the SPIR-V OpBitcast. They view the source and destination as tightly packed vectors in memory and, effectively, re-interpret that memory as the new size. A `uvec2`, for instance, can be re-interpreted as a `uint64_t`. Some of them also do a type conversion before the pack or after the unpack, such as `pack_unorm_2x16`.
These have a few advantages over the split versions (below):
- They map more nicely to the SPIR-V and GLSL opcodes.
- For back-ends that want to implement 16-bit as a `u16vec2` packed into a `u32vec1`, it's a bit more natural.
- For vec4 back-ends, this can lead to lower register pressure depending on the details. (But register pressure tends to be lower in `vec4` mode for us, so meh.)
Split pack/unpack operations
These are more "logical" in a sense because they don't re-interpret so much as act like a vector channel extract or `vecN` constructor, but on the byte axis. The split unpack ops extract a subset of the bits from the source and return it. The split pack ops take a number of different sources with the same bit-size and construct a value with a higher bit-size which is the concatenation of the sources.
As with the vector versions, some of these also have type conversions baked in.
These have a few advantages over the vector versions:
- They don't take/produce vectors so it can be a bit nicer on the register allocator
- They naturally vectorize (they just do the same pack/unpack per-channel) so they're easier to emit from lowering code.
One more note about these: Some of them are redundant with the normal integer conversion ops.
Extract operations
These were added as an optimization for the Intel back-end. They effectively do a split unpack followed by an up-cast back to the original bit-size. There are a number of cases where this comes up. For instance, for format conversion in image_load_store, we end up doing a `uint32_t` to `vec4` unorm conversion, and the first thing involved there is to break the `uint32_t` into bytes and convert them to integers so they can be multiplied by the float.
In some sense, these are entirely redundant with the split unpack ops and a data conversion. However, they are a bit easier for the current Intel back-end to consume which is why we have them. With IBC, I think they're unnecessary.
Bitcast operations
We don't actually have these. However, the idea would be to have ops that can take, for instance, a `u32vec2` and produce a `u16vec4` or a `u8vec8`. This would map very precisely to the SPIR-V OpBitcast semantics. Unfortunately, there is a combinatorial explosion of possible opcodes here. I think it'd end up being on the order of 40 of them or so.
Bitfield insert/extract operations
The GLSL bitfield operations can be thought of as a generalization of pack/unpack. They're often not what we want in back-ends when something better is available, but we probably want to consider them as part of the broader issue.