intel/compiler/mesh: support longer write messages
Allowing longer writes reduces the number of send messages needed to support unaligned 4-component writes.
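
As a rough illustration of the saving, here is a minimal sketch, not the actual Mesa code. It assumes a simplified model in which each message starts at a vec4-aligned offset and carries at most `max_comps` 32-bit components; the helper `sends_needed` and its parameters are hypothetical names used only for this example.

```c
/* Hypothetical helper, not part of Mesa: count the send messages needed to
 * write `count` consecutive 32-bit components starting at component `first`
 * (0..3) of a vec4-aligned slot, assuming one message covers at most
 * `max_comps` components (4 or 8) counted from the slot boundary. */
static unsigned
sends_needed(unsigned first, unsigned count, unsigned max_comps)
{
   /* Components spanned, measured from the aligned start of the first slot. */
   unsigned span = first + count;

   /* Round up to a whole number of messages. */
   return (span + max_comps - 1) / max_comps;
}
```

Under this model, a 4-component write starting at component 1 needs two sends when messages are limited to 4 components, and a single send when 8-component messages are allowed.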
Note: nothing currently generates 8-component writes, so this change makes the "second_mask" code path in emit_urb_direct_writes and emit_urb_indirect_writes_mod dead.
The first two commits are from MR !20050 (merged).