i965/vec4: Handle destination writemasks in VEC4_OPCODE_PACK_BYTES.
Since pack_bytes expands to two mov(4) align1 instructions, we can't use swizzles directly. For an instruction like pack_bytes m4.y:UD, vgrf13.xyzw:UD we can write into the .y component by settings the offset based on the swizzle. Also while we're doing this, we can set the dependency control hints properly, so that a series of pack_bytes writing into separate components of a register can issue without blocking.
Showing with 13 additions and 2 deletions