ntt: fix write mask for 64 bit to 32 bit conversions when dest is register
In this case the write mask may actually include z or w, and because of the extra move that is inserted, we have to set the corresponding write mask on the temporary; otherwise virgl will generate invalid GLSL. This fixes a number of dvec and dmat tests with virgl on NTT.
In addition, fix the 64 bit write mask in virgl so that it does not include Z.