intel/compiler: fix derivative on y axis implementation

This rewrites the ddy implementation in EXECUTE_4 mode as a loop, which
makes it more obvious what is going on. It also sets the group that each
set of 4 threads is supposed to execute.
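
For illustration only, the per-group loop can be modeled on the CPU as
below. This is a rough sketch, not the generator code from this change:
the function name ddy_fine_model is hypothetical, and the quad layout
(channels 0 and 1 on the top row, 2 and 3 on the bottom row of each
group of 4) is an assumption made for the example.

```c
#include <stddef.h>

/* Hypothetical scalar model of a fine y derivative over groups of 4
 * channels. Assumed layout per group: 0 = top-left, 1 = top-right,
 * 2 = bottom-left, 3 = bottom-right.
 */
static void ddy_fine_model(float *dst, const float *src, size_t exec_size)
{
    /* One iteration per group of 4 channels, mirroring the idea of
     * selecting a group and then operating on just that group.
     */
    for (size_t g = 0; g < exec_size; g += 4) {
        for (size_t i = 0; i < 4; i++) {
            size_t col = i & 1; /* column of this channel in the quad */

            /* Bottom row minus top row, taken in the same column. */
            dst[g + i] = src[g + 2 + col] - src[g + col];
        }
    }
}
```

With exec_size set to 8, the loop body runs twice, once for channels
0..3 and once for channels 4..7, and neither pass reads from the other
group.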
Fixes the following CTS tests:
dEQP-VK.glsl.derivate.dfdyfine.dynamic_*

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org