gallivm: add load/store scratch support.
Scratch space is per-thread space, so allocate the scratch size
* vector width, and add a per-thread base offset to each load/store.
This is needed for the OpenCL private memory space.
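
A minimal sketch of the layout described above, not the gallivm code
itself: one buffer of scratch size times vector width bytes, with each
thread (SIMD lane) addressing its own slice through a per-thread base
offset. All names here (scratch_buf, scratch_addr, and so on) are
hypothetical, and the contiguous per-lane layout (base = lane * scratch
size) is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical illustration only; these are not gallivm types. */
struct scratch_buf {
    uint8_t *data;          /* scratch_size * vector_width bytes */
    uint32_t scratch_size;  /* bytes of scratch needed by one thread */
    uint32_t vector_width;  /* number of threads (SIMD lanes) */
};

/* Allocate zeroed scratch for every lane. Returns 0 on success. */
static int scratch_init(struct scratch_buf *s, uint32_t scratch_size,
                        uint32_t vector_width)
{
    s->scratch_size = scratch_size;
    s->vector_width = vector_width;
    s->data = calloc((size_t)scratch_size * vector_width, 1);
    return s->data ? 0 : -1;
}

/* Per-thread base offset plus the offset requested by the shader.
 * Assumed layout: lane N owns bytes [N * scratch_size, (N + 1) * scratch_size). */
static size_t scratch_addr(const struct scratch_buf *s, uint32_t lane,
                           uint32_t offset, uint32_t size)
{
    assert(lane < s->vector_width);
    assert((size_t)offset + size <= s->scratch_size);
    return (size_t)lane * s->scratch_size + offset;
}

/* Store a 32-bit value into one lane's private scratch. */
static void scratch_store_u32(struct scratch_buf *s, uint32_t lane,
                              uint32_t offset, uint32_t value)
{
    memcpy(s->data + scratch_addr(s, lane, offset, sizeof value),
           &value, sizeof value);
}

/* Load a 32-bit value from one lane's private scratch. */
static uint32_t scratch_load_u32(const struct scratch_buf *s, uint32_t lane,
                                 uint32_t offset)
{
    uint32_t value;
    memcpy(&value, s->data + scratch_addr(s, lane, offset, sizeof value),
           sizeof value);
    return value;
}
```

Because every lane adds its own base, the same shader-visible offset in
two lanes never aliases, which is the property private memory requires.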
Reviewed-by: Roland Scheidegger <sroland@vmware.com>