v3d: add support for load_workgroup_size

This wires up the compiler and the gallium driver bits.
Sadly, I don't see a way to actually expose this via GL: GL compute
shaders require support for workgroups of 1024 threads, and
ARB_compute_variable_group_size might only be available in desktop
OpenGL, not GLES?
In any case, this will be required for OpenCL support, as there the
workgroup size is variable by default (unless the kernel specifies it,
e.g. via reqd_work_group_size).
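To illustrate the fixed vs. variable distinction, here is a minimal C sketch. All names in it are hypothetical and not taken from the v3d or gallium sources; it only assumes the general idea that a fixed size can be folded to a constant, while a variable size has to come from the dispatch parameters at launch time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-kernel info recorded by the compiler. */
struct kernel_info {
    bool size_is_fixed;      /* true if the source fixes the size, e.g. reqd_work_group_size */
    uint32_t fixed_size[3];  /* only meaningful when size_is_fixed is true */
};

/* Hypothetical launch parameters, loosely modelled on the block[3]
 * field of gallium's struct pipe_grid_info. */
struct dispatch_info {
    uint32_t block[3];       /* workgroup size chosen by the API user at launch */
};

/* Value that load_workgroup_size should yield for component i (0 = x, 1 = y, 2 = z). */
static uint32_t resolve_workgroup_size(const struct kernel_info *k,
                                       const struct dispatch_info *d,
                                       unsigned i)
{
    assert(i < 3);
    if (k->size_is_fixed)
        return k->fixed_size[i];  /* known at compile time: can be folded to a constant */
    return d->block[i];           /* only known at dispatch: must be read at run time, e.g. from a uniform */
}
```

In the variable case the driver has to pass the dispatch's block size to the shader at launch time, which is the case that OpenCL kernels without a fixed size hit by default.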