Implement write-lock-read
Write-lock-read (WLR) is a generalization of SSA that allows multiple and partial writes to a value while still maintaining most of the nice properties of SSA when it comes to register allocation, CSE, and copy-prop. A value X is a valid WLR value if the following hold:
- All writes to X occur in the same block
- All reads from X either belong to an instruction which writes X as its only output or are dominated by the final write to X.
In other words, things like `x |= y` are OK, but you can't do arbitrary reads and writes. When a WLR value is optimized, the sequence of instructions generating that value is effectively considered to be one meta-instruction. No optimization is possible within the WLR instruction sequence, so code which generates WLR values is expected to generate an optimal sequence.
When you consider all of the writes to a WLR value as a single meta-instruction, WLR values can be treated as SSA values during optimization. In particular, they have exactly one (multi-instruction) definition which dominates all the uses.
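To make the two validity conditions concrete, here is a minimal sketch of a checker over a toy IR. Everything in it (`Instr`, `Block`, `dominators`, `is_valid_wlr`) is hypothetical and made up for illustration; it is not the compiler's actual IR or API, and the dominance computation is the simple iterative one rather than anything tuned.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    op: str
    dsts: tuple  # names of the values this instruction writes
    srcs: tuple  # names of the values this instruction reads


@dataclass
class Block:
    name: str
    instrs: list
    succs: list = field(default_factory=list)  # successor block names


def dominators(blocks, entry):
    """Map each block name to the set of block names that dominate it."""
    names = [b.name for b in blocks]
    preds = {n: [] for n in names}
    for b in blocks:
        for s in b.succs:
            preds[s].append(b.name)
    dom = {n: set(names) for n in names}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in names:
            if n == entry:
                continue
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def is_valid_wlr(blocks, entry, x):
    """Check the two WLR conditions for value `x` (assumed toy-IR semantics)."""
    writes = [(b.name, i) for b in blocks
              for i, ins in enumerate(b.instrs) if x in ins.dsts]
    if not writes:
        return False
    # Condition 1: all writes to x occur in the same block.
    if len({block_name for block_name, _ in writes}) != 1:
        return False
    write_block = writes[0][0]
    final_write = max(i for _, i in writes)
    dom = dominators(blocks, entry)
    # Condition 2: every read is either part of an instruction whose only
    # output is x (e.g. x |= y) or is dominated by the final write.
    for b in blocks:
        for i, ins in enumerate(b.instrs):
            if x not in ins.srcs or ins.dsts == (x,):
                continue
            if b.name == write_block:
                if i <= final_write:
                    return False
            elif write_block not in dom[b.name]:
                return False
    return True


# x is built by a MOV followed by x |= y, then read in a dominated block.
example = [
    Block("b0", [Instr("mov", ("x",), ("a",)),
                 Instr("or", ("x",), ("x", "y"))], ["b1"]),
    Block("b1", [Instr("add", ("z",), ("x", "one"))]),
]
print(is_valid_wlr(example, "b0", "x"))  # prints True
```

A read of `x` by some other instruction placed between the two writes, or a second write in a different block, makes the same check return `False`.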
There are many places where we need something like this:
- Building payloads and headers involves multiple MOV instructions which generate a single value. This can also be handled with intrinsics and an allocator that can coalesce on-the-fly.
- Predication in the IR requires either something like this or psi-SSA
- Generating `gl_SubgroupInvocation` requires writing `0x76543210:v` into a HW_GRF and then emitting 0-2 adds (depending on SIMD width) to generate the other 8 or 24 channels of indices.
- Subgroup ops like SPIR-V's OpGroupNonUniformFAdd require piles of scratching around on a HW_GRF and we want to be able to copy-prop the result.
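As an illustration of why the subgroup invocation case involves several partial writes to one value, the following sketch simulates the sequence in plain Python. The exact split of the adds (one add of 8 for channels 8-15, one add of 16 for channels 16-31) is an assumption chosen to match the "0-2 adds" count above, not a statement of what the backend emits, and the helper names are hypothetical. The immediate is treated as eight 4-bit fields; all of them are in 0..7 here, so signedness does not matter for this sketch.

```python
def unpack_v_immediate(imm):
    """Split a 32-bit packed immediate into eight 4-bit values, low nibble first."""
    return [(imm >> (4 * i)) & 0xF for i in range(8)]


def subgroup_invocation(simd_width):
    """Return (per-channel indices, number of adds) for a given SIMD width."""
    assert simd_width in (8, 16, 32)
    reg = [None] * simd_width  # stands in for the HW_GRF being built up
    adds = 0

    # First partial write: MOV of the packed immediate fills channels 0-7.
    reg[0:8] = unpack_v_immediate(0x76543210)

    # Second partial write (SIMD16 and up): channels 8-15 = channels 0-7 + 8.
    if simd_width >= 16:
        reg[8:16] = [v + 8 for v in reg[0:8]]
        adds += 1

    # Third partial write (SIMD32 only): channels 16-31 = channels 0-15 + 16.
    if simd_width == 32:
        reg[16:32] = [v + 16 for v in reg[0:16]]
        adds += 1

    return reg, adds


indices, adds = subgroup_invocation(32)
print(adds)     # prints 2
print(indices)  # prints the list 0 through 31 in order
```

Each add reads channels written earlier in the same sequence and writes a different part of the same register. That is the multi-write, single-value pattern that plain SSA cannot express directly but that fits the WLR conditions when the whole sequence is treated as one definition.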
We had some discussions in person and came to the conclusion that the first couple of cases were ones in which we could probably just use regular SSA. For payloads and headers, we probably want to, because classic SSA with register coalescing in RA will likely yield the best results when it comes to coalescing and range splitting. However, the other cases are a bit harder to handle without WLR.