ir3: Use SWZ
Plumb through support for multi-mov instructions, which behave somewhat like a repeated mov but can read and/or write an unrelated register each cycle. Then use
swz instead of the xor trick when lowering copies after RA, which takes fewer cycles and needs fewer nops. The infrastructure needed to use
gat is also in place, but using it is left for later. shader-db results are in the last commit.
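
For readers unfamiliar with the xor trick, here is a minimal standalone sketch of why a single swap-capable instruction is cheaper. It is not the ir3 code: `regfile`, `swap_xor` and `swap_swz` are hypothetical names, the returned instruction counts are only illustrative, and the model assumes that swz reads both sources before writing either destination.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS 8

/* Hypothetical register file, for illustration only (not an ir3 type). */
typedef struct {
    uint32_t r[NUM_REGS];
} regfile;

/* Swap two registers with the xor trick: three ALU instructions, each
 * depending on the result of the previous one.
 * Returns the number of instructions this model "emits". */
int swap_xor(regfile *rf, unsigned a, unsigned b)
{
    assert(a < NUM_REGS && b < NUM_REGS && a != b);
    rf->r[a] ^= rf->r[b];
    rf->r[b] ^= rf->r[a];
    rf->r[a] ^= rf->r[b];
    return 3;
}

/* Swap modeled as one multi-mov instruction with two destinations and two
 * sources. ASSUMPTION: both sources are read before either write lands. */
int swap_swz(regfile *rf, unsigned a, unsigned b)
{
    assert(a < NUM_REGS && b < NUM_REGS && a != b);
    uint32_t src0 = rf->r[b];
    uint32_t src1 = rf->r[a];
    rf->r[a] = src0;
    rf->r[b] = src1;
    return 1;
}
```

Both helpers leave the register file in the same state. The difference in this model is that the xor sequence is a chain of three dependent instructions, while the swap is a single instruction.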