lima: ppir: be more aggressive with cloning uniforms and load coords
Try a more aggressive approach to cloning some loads. A uniform load can be inserted into any instruction, so let's do that. ARM's site claims that the penalty for a cache miss is one clock, so we don't lose anything if we merge the load into the instruction that uses its result. As a side effect we can also pipeline it (and thus reduce register pressure).
Do the same for varyings that hold texture coords, but for a different reason: it looks like there's a special path that increases precision for coords if the varying that holds them is pipelined. If we don't pipeline it and instead load the coords from a register, their precision is fp16 and thus only 10 bits of mantissa, which is not enough to accurately sample textures of size 1024 or larger.
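To make the precision argument concrete, here is a rough back-of-the-envelope sketch in C. It is not driver code: fp16_ulp() and fp16_steps_per_texel() are hypothetical helpers written only for this illustration, and they assume normalized coords in (0, 1] stored as IEEE half floats (10 explicit mantissa bits, subnormals ignored).

```c
#include <assert.h>

/* Hypothetical helper: spacing between adjacent fp16 values around x.
 * Assumes a positive, normal half float; subnormals are not modelled. */
static double fp16_ulp(double x)
{
   const double min_normal = 1.0 / 16384.0; /* 2^-14 */
   double lo = 1.0;                         /* start of the binade [1, 2) */
   double step = 1.0 / 1024.0;              /* 2^-10: spacing within [1, 2) */

   assert(x > 0.0);

   /* Walk down one binade at a time until x lies in [lo, 2 * lo). */
   while (x < lo && lo > min_normal) {
      lo *= 0.5;
      step *= 0.5;
   }
   /* Walk up for values of 2.0 and above. */
   while (x >= lo * 2.0) {
      lo *= 2.0;
      step *= 2.0;
   }
   return step;
}

/* Hypothetical helper: how many distinct fp16 coords fall inside one
 * texel of a texture `size` texels wide, near normalized coord `coord`. */
static double fp16_steps_per_texel(int size, double coord)
{
   assert(size > 0);
   return (1.0 / size) / fp16_ulp(coord);
}
```

With these assumptions, fp16_steps_per_texel(1024, 0.75) comes out as 2, i.e. only two representable positions per texel in the upper half of a 1024-wide texture, which leaves very little sub-texel precision for filtering. A 256-wide texture gets 8 positions per texel at the same coord.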
Since an instruction can hold only one uniform load and one varying load, node_to_instr now creates a move, using the helper introduced in the previous commit, if the slot is already taken. As a side effect of this change we can also try to pipeline texture loads and create a move if the attempt fails.
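The slot rule can be sketched roughly as follows. This is a simplified illustration and not the actual ppir code: struct instr, enum load_kind, enum placement, instr_init() and place_load() are all hypothetical names, and the sketch only reports that a move would be needed instead of creating one.

```c
#include <assert.h>

/* Hypothetical model: one slot per load kind in each instruction. */
enum load_kind { LOAD_UNIFORM, LOAD_VARYING, LOAD_KIND_COUNT };

enum placement {
   PLACED_IN_SLOT,   /* load merged into the instruction (pipelined) */
   PLACED_WITH_MOVE, /* slot was taken, a move would be needed */
};

struct instr {
   int load_slot[LOAD_KIND_COUNT]; /* node id, or -1 if the slot is free */
};

static void instr_init(struct instr *instr)
{
   for (int i = 0; i < LOAD_KIND_COUNT; i++)
      instr->load_slot[i] = -1;
}

/* Try to merge a load into the instruction that consumes it. */
static enum placement place_load(struct instr *instr, enum load_kind kind,
                                 int node_id)
{
   assert(kind >= 0 && kind < LOAD_KIND_COUNT);
   assert(node_id >= 0);

   if (instr->load_slot[kind] < 0) {
      instr->load_slot[kind] = node_id;
      return PLACED_IN_SLOT;
   }

   /* Slot already holds another load of this kind. The real pass would
    * create a move here; this sketch only signals the fallback. */
   return PLACED_WITH_MOVE;
}
```

In this model a second uniform load aimed at the same instruction gets PLACED_WITH_MOVE, while a varying load can still take its own free slot.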