nir/glsl: Optimize multiply instruction on gen8+
Starting with gen8, the "mul" instruction supports multiplying two 32-bit integers and writing the result to a 64-bit destination. Currently, a 64x64->64 multiply is lowered into 4 "mul" instructions; using a 64-bit destination lets us get rid of 1 multiply instruction, which will hopefully make things a little faster on gen8/9.
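To illustrate the arithmetic behind the saving, here is a minimal sketch in plain C, not the actual NIR lowering code. The function names mul_32x32_64 and mul64_via_three_muls are hypothetical and only stand in for the hardware "mul" with a 64-bit destination and for the lowered 64x64->64 multiply. Only the low x low product needs its full 64-bit result; the two cross products contribute only their low 32 bits, so three multiplies are enough.

```c
#include <stdint.h>

/* Hypothetical stand-in for a "mul" with two 32-bit sources and a
 * 64-bit destination. */
static uint64_t mul_32x32_64(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* 64x64->64 multiply built from three multiplies:
 *   a * b mod 2^64 = a_lo * b_lo
 *                  + ((a_lo * b_hi + a_hi * b_lo) mod 2^32) << 32
 * The a_hi * b_hi term is shifted out entirely and is never computed. */
static uint64_t mul64_via_three_muls(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    /* Multiply 1: full 64-bit product of the low halves. */
    uint64_t lo_full = mul_32x32_64(a_lo, b_lo);

    /* Multiplies 2 and 3: only the low 32 bits of each cross product
     * matter, so 32-bit wrapping arithmetic is sufficient. */
    uint32_t cross = a_lo * b_hi + a_hi * b_lo;

    return lo_full + ((uint64_t)cross << 32);
}
```

With only a 32-bit destination, the low and high halves of a_lo * b_lo have to be produced separately, which is where the fourth multiply comes from.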
There are no tests in shader-db that use either [u/i]mulExtended or int64 operations, so shader-db cannot show the reduction in instructions. Instead, I ran a couple of tests locally and checked the instruction count.
Thanks to Matt for helping me and discussing possible approaches on this project. (Also, thank you Jason and Rafael for coming up with a compiler-side task for me.)