intel/compiler: fix derivative on y axis implementation

This rewrites the ddy in EXECUTE_4 mode with a loop to make it more
obvious what is going on, and also sets the group that each of the 4
threads in a group is supposed to execute.

Fixes the following CTS tests:

   dEQP-VK.glsl.derivate.dfdyfine.dynamic_*

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Co-Authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 2134ea38 ("intel/compiler/fs: Implement ddy without using align16 for Gen11+")
(cherry picked from commit 83622584)