Draft: WIP intel: optimize compute workgroup sizes
For now, this implements only the upscaling part of #3083.
Tested with: https://github.com/marcinslusarz/mesa_3083_compute
Current numbers (i965, mandelbrot 500x500, baseline vs. optimization enabled): https://people.freedesktop.org/~mslusarz/mesa3083/i965-mandelbrot500x500-baseline-vs-opt-enabled-20200806.html
Known regressions:
- piglit.spec.nv_compute_shader_derivatives.execution.*
- dEQP-VK.compute.indirect_dispatch.*
- dEQP-VK.conditional_rendering.dispatch.*
- dEQP-VK.synchronization.*
- dEQP-VK.subgroups.*
- dEQP-VK.query_pool.statistics_query.compute_shader_invocations.*
- dEQP-VK.query_pool.statistics_query.host_query_reset.*
TODO:
- fix the known regressions listed above
- optimize indirect dispatch on iris and i965
- find real-world workloads that benefit from this MR