turnip: Use NIR for clear/blit shaders
This is much more maintainable, extensible, and easy to read than hand-rolled structs approximating assembly. This also removes the last use of the old hand-written packing structs. There are a few minor differences:
- The shaders are larger because ir3 currently doesn't support (rpt), which means that some shaders are larger than one instrlen and the current logic has to be extended to allow for that. This seems a small price to pay, ir3 will gain support for (rpt) eventually, and we shouldn't have limitations like this baked in anyway. For example some GL blob r8g8 <-> r16 copy shaders are apparently quite large.
- Due to the inability to switch inputs/outputs on the fly, we need to split the VS into two variants. I made the layer-writing variant also used for other clears, because the old method of overloading c0.z/c1.z to mean both "src x coordinate" and "z clear value" in the same shader seemed too clever and I didn't want to add yet another variant. This means that non-layered clears will also write the layer (to 0), but that shouldn't be a big deal performance-wise.