nir: Incorrect idiv lowering
NIR has two implementations of lower_idiv, keyed on the imprecise_32bit_lowering flag. This flag is misleading: the results when setting this flag aren't "imprecise", they're completely wrong for some values. If a backend has a native implementation of umul_high, the correct path isn't that much more expensive. If it doesn't, it's substantially slower for highp integer division... but in practice, non-constant highp integer division is pretty rare.
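To give a feel for the cost, here is a rough C sketch of a reciprocal-based unsigned division built on umul_high. It illustrates the general technique only and is not copied from the NIR pass: the helper names (umul_high, udiv_sketch) and the scaling constant are my assumptions, and a zero denominator is not handled.

```c
#include <stdint.h>

/* Stand-in for a native 32x32 -> 64-bit multiply that returns the top half. */
static uint32_t umul_high(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Hypothetical sketch: unsigned 32-bit division, denom must be non-zero. */
static uint32_t udiv_sketch(uint32_t numer, uint32_t denom)
{
    /* Float estimate of 2^32 / denom. The scale factor sits slightly below
     * 2^32 (an assumed constant) so the estimate errs on the low side. */
    float rcp_f = 1.0f / (float)denom;
    uint32_t rcp = (uint32_t)(rcp_f * 4294966784.0f);

    /* One fixed-point refinement of the reciprocal (first umul_high). */
    uint32_t err = rcp * (0u - denom);
    rcp += umul_high(rcp, err);

    /* Initial quotient and remainder (second umul_high). */
    uint32_t q = umul_high(numer, rcp);
    uint32_t r = numer - q * denom;

    /* Two correction steps. Each comparison yields 0 or 1 and is added
     * directly to the quotient. */
    uint32_t ge = (uint32_t)(r >= denom);
    q += ge;
    r -= ge * denom;

    ge = (uint32_t)(r >= denom);
    q += ge;

    return q;
}
```

With a native umul_high, that is two high multiplies plus a handful of cheap integer ops; without one, each umul_high expands into several multiplies and adds.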
Drivers need to stop using the incorrect idiv lowering so we can delete it and only have correct code in the core.
- panfrost/midgard !17860 (merged)
- panfrost/bifrost !17266 (merged)
- asahi !17861 (closed)
- freedreno/ir3 !18085 (merged)
- v3d !17871 (merged)
- vc4 !18019 (merged)
- etnaviv !18080
- r600/sfn !18154 (merged)
If your driver is on the list, you should stop setting imprecise_32bit_lowering when calling nir_lower_idiv, and possibly optimize the resulting code (see !17266 (merged) for example) to mitigate the shader-db regression. If you don't, I will, but won't be much help with the optimizations.
- The accurate idiv path uses umul_high. If you have a native instruction to do 32x32 -> 64-bit multiplies, you really want to use it for this. Otherwise you need to set lower_mul_high in your compiler options and call nir_lower_alu after nir_lower_idiv. That will generate uadd_carry instructions in turn, so you should also set lower_uadd_carry (and lower_usub_borrow for good measure).
- The division lowering generates a b2i32(comparison) sequence. If your platform has an efficient way to implement this, you can save some instructions using it. Mali has a "0/1 boolean" mode for its comparison instructions (instead of the usual "0/~0" mode). AGX has a four-source comparison-and-select instruction which can emulate the same.
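For backends without a native 32x32 -> 64-bit multiply, the following C sketch shows roughly what a lowered umul_high boils down to: 16-bit partial products recombined with explicit carries. This is an illustration, not the code nir_lower_alu actually emits; the function names are hypothetical, and uadd_carry here is just a C stand-in for the NIR opcode of the same name.

```c
#include <stdint.h>

/* C stand-in for uadd_carry: 1 if a + b overflows 32 bits, else 0. */
static uint32_t uadd_carry(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint32_t)(a + b) < a);
}

/* Hypothetical sketch: high 32 bits of a * b using only 32-bit operations. */
static uint32_t umul_high_lowered(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xffffu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xffffu, b_hi = b >> 16;

    /* Four 16x16 -> 32-bit partial products. */
    uint32_t lo_lo = a_lo * b_lo; /* weight 2^0  */
    uint32_t lo_hi = a_lo * b_hi; /* weight 2^16 */
    uint32_t hi_lo = a_hi * b_lo; /* weight 2^16 */
    uint32_t hi_hi = a_hi * b_hi; /* weight 2^32 */

    /* Sum the pieces that land in the low word, counting the carries
     * that spill into the high word. */
    uint32_t low = lo_lo;
    uint32_t carry = uadd_carry(low, lo_hi << 16);
    low += lo_hi << 16;
    carry += uadd_carry(low, hi_lo << 16);

    /* High word: top product, upper halves of the cross terms, carries. */
    return hi_hi + (lo_hi >> 16) + (hi_lo >> 16) + carry;
}
```

Four multiplies and two carry checks replace one instruction, which is where the extra cost of the accurate path comes from on such hardware. If uadd_carry isn't native either, each carry becomes an add plus an unsigned compare, as in the helper above.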
Original issue
panfrost, turnip, v3dv: possibly incorrect use of imprecise_32bit_lowering
All of these drivers apply imprecise_32bit_lowering for any division operation, but Vulkan only allows it for operations decorated with RelaxedPrecision, at least as far as I understand (I wasn't able to find appropriate wording in the SPIR-V or Vulkan specs). Gallium drivers that use the same compilers may also share this issue when dividing highp integers.