nir,intel/compiler: fix task payload in shared memory workaround
On Intel hardware, the user-addressable task payload starts at an offset of 32 bytes, but the task-payload-in-shared-memory workaround was copying data back from shared memory to the task payload starting at offset 0, so the last 32 bytes of the task payload were left uninitialized. Fix this.
It's a mystery why none of the CTS tests hit this, but it's very easy to reproduce: see crucible!128 (merged).
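
For illustration only, the offset mismatch described above can be modelled on the CPU with plain byte buffers. This is a minimal sketch, not the actual NIR lowering code: the function names, the `PAYLOAD_USER_OFFSET` macro and the buffer layout are all hypothetical, and the only fact taken from this change is that the user-addressable region starts 32 bytes into the payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: the first 32 bytes of the payload buffer are
 * not user-addressable, so user data lives at [32, 32 + user_size). */
#define PAYLOAD_USER_OFFSET 32u

/* Models the pre-fix copy-back: user data is written starting at offset 0,
 * so the last 32 bytes of the user-addressable region are never written. */
void copy_back_buggy(uint8_t *payload, const uint8_t *shared, size_t user_size)
{
    assert(payload != NULL && shared != NULL);
    memcpy(payload, shared, user_size);
}

/* Models the intended copy-back: user data is written starting at the
 * user-addressable offset, covering the whole user region. */
void copy_back_fixed(uint8_t *payload, const uint8_t *shared, size_t user_size)
{
    assert(payload != NULL && shared != NULL);
    memcpy(payload + PAYLOAD_USER_OFFSET, shared, user_size);
}

/* Counts bytes in the user-addressable region that still hold the poison
 * value, i.e. bytes the copy-back never wrote. */
size_t count_untouched_user_bytes(const uint8_t *payload, size_t user_size,
                                  uint8_t poison)
{
    size_t untouched = 0;
    for (size_t i = 0; i < user_size; i++) {
        if (payload[PAYLOAD_USER_OFFSET + i] == poison)
            untouched++;
    }
    return untouched;
}
```

Filling a payload buffer with a poison byte, running each copy-back and then calling `count_untouched_user_bytes` shows the difference: in this model the buggy variant leaves exactly 32 user bytes unwritten, while the fixed variant leaves none.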