nir: intel-fs: Optimize subgroup intrinsics in uniform control flow
This MR primarily adds the optimizations described in #3731. As part of this, there are also a couple of bug fixes for the Intel driver.
I have not enabled the new optimization pass in ACO, but that should be pretty easy. There is some (untested) code to emit the mbcnt instruction, but a flag needs to be added to nir_lower_subgroups_options. Since Intel doesn't have write_invocation, I didn't implement the min/max/and/or exclusive_scan optimizations.
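For reviewers who want the intuition behind the mbcnt mention, here is a rough sketch of the kind of identity an add exclusive scan can use when its source is uniform. This is not code from the MR. It is a plain C simulation over a 32-bit ballot mask, and the helper names (mbcnt, exclusive_add_reference, exclusive_add_uniform) are hypothetical stand-ins, not NIR or driver entry points.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an mbcnt-style instruction: counts the active
 * invocations whose index is lower than `lane`. */
static unsigned mbcnt(uint32_t active_mask, unsigned lane)
{
    assert(lane < 32);
    uint32_t below = active_mask & ((1u << lane) - 1u);
    unsigned count = 0;
    while (below) {
        below &= below - 1; /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Reference: exclusive add scan done the slow way, summing the values of
 * all active invocations below `lane`. */
static uint32_t exclusive_add_reference(const uint32_t *values,
                                        uint32_t active_mask, unsigned lane)
{
    uint32_t sum = 0;
    for (unsigned i = 0; i < lane; i++) {
        if (active_mask & (1u << i))
            sum += values[i];
    }
    return sum;
}

/* Assumption: every active invocation holds the same `value`. The scan then
 * collapses to a single multiply by the count of lower active invocations. */
static uint32_t exclusive_add_uniform(uint32_t value, uint32_t active_mask,
                                      unsigned lane)
{
    return value * mbcnt(active_mask, lane);
}
```

With a mask of 0xD (invocations 0, 2 and 3 active) and a uniform value of 5, invocation 3 sees two lower active invocations, so both the reference and the uniform version give 10.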