nir: Incorrect idiv lowering

NIR has two implementations of lower_idiv, keyed on the imprecise_32bit_lowering flag. This flag is misleading: the results when setting this flag aren't "imprecise", they're completely wrong for some values. If a backend has a native implementation of umul_high, the correct path isn't that much more expensive. If it doesn't, the correct path is substantially slower for highp integer division... but in practice, non-constant highp integer division is pretty rare.

Drivers need to stop using the incorrect idiv lowering so we can delete it and only have correct code in the core.

If your driver is on the list, you should stop setting imprecise_32bit_lowering when calling nir_lower_idiv, and possibly optimize the resulting code (see !17266 (merged) for example) to mitigate the shader-db regression. If you don't, I will, but won't be much help with the optimizations 😉 Here are some hints:

The accurate idiv path uses umul_high. If you have a native instruction to do 32x32 -> 64-bit multiplies, you really want to use it for this. Otherwise you need to set lower_mul_high in your compiler options and call nir_lower_alu after nir_lower_idiv. That will generate uadd_carry instructions in turn, so you should also set lower_uadd_carry (and lower_usub_borrow for good measure).
The division lowering generates b2i32(comparison) sequences. If your platform has an efficient way to implement these, you can save some instructions by using it. Mali has a "0/1 boolean" mode for its comparison instructions (instead of the usual "0/~0" mode). AGX has a four-source comparison-and-select instruction which can emulate the same.
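
To make the two hints concrete, here is a rough sketch in plain C (not NIR builder code, and not a transcription of what Mesa's passes actually emit). It shows why a software umul_high pulls in uadd_carry, and what the b2i32(comparison) pattern looks like in a quotient fixup. All helper names (b2i32, uadd_carry, umul_high_sw, udiv_sketch) are made up for illustration, and the reciprocal estimate is computed with a host-side divide purely to keep the sketch exact; a real lowering has to derive it some other way.

```c
#include <stdbool.h>
#include <stdint.h>

/* b2i32(comparison): turn a boolean into integer 0 or 1.
 * Hardware with a "0/1 boolean" compare mode gets this for free. */
static uint32_t b2i32(bool c) { return c ? 1u : 0u; }

/* uadd_carry: 1 if a + b wraps around 32 bits, else 0. */
static uint32_t uadd_carry(uint32_t a, uint32_t b)
{
   return b2i32((uint32_t)(a + b) < a);
}

/* umul_high built from 16x16 -> 32-bit partial products, for hardware
 * without a 32x32 -> 64-bit multiply. */
static uint32_t umul_high_sw(uint32_t a, uint32_t b)
{
   uint32_t a_lo = a & 0xffffu, a_hi = a >> 16;
   uint32_t b_lo = b & 0xffffu, b_hi = b >> 16;

   uint32_t lo_lo = a_lo * b_lo;
   uint32_t hi_lo = a_hi * b_lo;
   uint32_t lo_hi = a_lo * b_hi;
   uint32_t hi_hi = a_hi * b_hi;

   /* Accumulate the low word only to learn how often it carries. */
   uint32_t lo = lo_lo;
   uint32_t carry = uadd_carry(lo, hi_lo << 16);
   lo += hi_lo << 16;
   carry += uadd_carry(lo, lo_hi << 16);

   return hi_hi + (hi_lo >> 16) + (lo_hi >> 16) + carry;
}

/* Unsigned division sketch. Precondition: d != 0. */
static uint32_t udiv_sketch(uint32_t n, uint32_t d)
{
   /* Assumption for illustration: reciprocal estimate from a host divide. */
   uint32_t rcp = UINT32_MAX / d;

   /* The estimate never exceeds the true quotient and is at most 2 below. */
   uint32_t q = umul_high_sw(n, rcp);
   uint32_t r = n - q * d;

   /* Two branch-free fixups, each one a b2i32(comparison). */
   for (int i = 0; i < 2; i++) {
      uint32_t fix = b2i32(r >= d);
      q += fix;
      r -= d & (0u - fix);
   }
   return q;
}
```

With these definitions, udiv_sketch(100, 7) goes through one umul_high_sw and two compare-and-fixup rounds. The compare results feed straight into integer adds, which is the place where a 0/1 compare mode or a compare-and-select instruction saves instructions.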

Original issue

panfrost, turnip, v3dv: possibly incorrect use of imprecise_32bit_lowering

All of these drivers apply imprecise_32bit_lowering for any division operation, but Vulkan only allows it for operations decorated with RelaxedPrecision, at least as far as I understand (I wasn't able to find appropriate wording in the SPIR-V or Vulkan specs). Gallium drivers that use the same compilers may also share this issue when dividing highp integers.

Edited Oct 25, 2022 by Georg Lehmann
