intel: Generate multiple SIMD variants for CS
This allows us to support the maximum limits of ARB_compute_variable_group_size
without forcing SIMD32 to be used all the time. By having multiple SIMD variants available, the GL drivers can pick the most appropriate one based on the group size passed at dispatch time.
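As a rough illustration of dispatch-time selection, here is a minimal sketch in C. It is not the code from this MR: the `pick_simd_width` function, the `VARIANT_*` bitmask, and the `max_threads` parameter are all hypothetical names, and the policy shown (take the narrowest compiled variant whose thread count fits the limit) is only an assumed heuristic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bitmask of the variants that were compiled for a shader. */
enum {
   VARIANT_SIMD8  = 1u << 0,
   VARIANT_SIMD16 = 1u << 1,
   VARIANT_SIMD32 = 1u << 2,
};

/* Return the narrowest available SIMD width whose thread count fits in
 * max_threads for the given group size, or 0 if no variant fits.
 * This is an assumed policy for illustration, not the driver's heuristic.
 */
static unsigned
pick_simd_width(uint32_t available, unsigned group_size, unsigned max_threads)
{
   static const unsigned widths[] = { 8, 16, 32 };

   assert(group_size > 0);

   for (unsigned i = 0; i < 3; i++) {
      if (!(available & (1u << i)))
         continue;

      /* Number of hardware threads needed to cover the whole group. */
      unsigned threads = (group_size + widths[i] - 1) / widths[i];
      if (threads <= max_threads)
         return widths[i];
   }

   return 0;
}
```

With all three variants available and a hypothetical limit of 64 threads, a group of 64 invocations would get SIMD8 under this policy, while a group of 1024 invocations would need SIMD16 to fit.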
The MR contains a few improvements and cleanups in that area of the code before getting to the main goal.
One possible next step here is to use the performance analysis to select the SIMD variant (in both the fixed and variable group size cases) instead of the current heuristics.