nir: Use a float approximation for 64-bit idiv lowering
The current 64-bit integer division lowering in nir_lower_int64() is slow:
it literally implements the division algorithm with a loop. The method used
by nir_lower_idiv() is much better. It first approximates the quotient with
a float division and then refines the result. We should implement 64-bit
idiv lowering via the same mechanism, which should be much faster.
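
To make the approximate-then-refine idea concrete, here is a minimal sketch in plain C rather than NIR builder code. It is not the nir_lower_idiv() algorithm itself: the function name, the refinement loop, and the 2^-21 safety margin are assumptions chosen for illustration, and it only covers the unsigned case with a non-zero divisor.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not Mesa code: unsigned 64-bit division that
 * estimates the quotient in 32-bit float and corrects it with integer math. */
static uint64_t
udiv64_float_approx(uint64_t n, uint64_t d)
{
   assert(d != 0); /* division by zero is left out of this sketch */

   uint64_t q = 0;
   uint64_t r = n; /* invariant: n == q * d + r */

   while (r >= d) {
      /* Each conversion, the division and the multiply can each be off by
       * about 2^-24 relative, so scale the estimate down by 2^-21 to make
       * sure it never exceeds the true value of r / d. */
      float est_f = ((float)r / (float)d) * (1.0f - 0x1p-21f);
      uint64_t est = (uint64_t)est_f;

      if (est == 0)
         est = 1; /* r >= d, so one more multiple of d always fits */

      q += est;
      r -= est * d; /* cannot underflow because est <= r / d */
   }

   return q;
}
```

Each pass recovers roughly 20 bits of the quotient, so the loop finishes after a handful of iterations instead of one per bit. A real lowering pass would presumably unroll this into a fixed number of refinement steps.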
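To do even better, we probably want a cheap-and-dirty i2f implementation
instead of the one in nir_lower_int64(), which rounds correctly. The float
value is only an initial approximation that gets refined afterwards, so
exact rounding should not be needed.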
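
As one possible shape for such a conversion, the sketch below builds the float from the two 32-bit halves using only 32-bit conversions. It is an assumption for illustration, not existing Mesa code: the function name is hypothetical and it handles only unsigned inputs, so a signed i2f would still need to deal with the sign separately.

```c
#include <stdint.h>

/* Hypothetical helper, not Mesa code: convert an unsigned 64-bit integer to
 * float from its two 32-bit halves. The halves are rounded separately and
 * the sum is rounded again, so the result may differ slightly from a
 * correctly rounded conversion. */
static float
u64_to_float_fast(uint64_t x)
{
   uint32_t hi = (uint32_t)(x >> 32);
   uint32_t lo = (uint32_t)x;

   /* 4294967296.0f is 2^32, which is exactly representable as a float. */
   return (float)hi * 4294967296.0f + (float)lo;
}
```

The small extra error is acceptable only because the division result is corrected with integer arithmetic afterwards.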