
microsoft/compiler: Lower fquantize2f16

Boris Brezillon requested to merge bbrezillon/mesa:ms-fquantize2f16 into main

As far as I can tell, there's no native operation equivalent to fquantize2f16. Let's lower this operation to

   if (val < MIN_FLOAT16)
      return -INFINITY;
   else if (val > MAX_FLOAT16)
      return INFINITY;
   else if (fabs(val) < SMALLER_NORMALIZED_FLOAT16)
      return 0;
   else
      return val;

which matches the definition of OpQuantizeToF16:

"If Value is an infinity, the result is the same infinity. If Value is a NaN, the result is a NaN, but not necessarily the same NaN. If Value is positive with a magnitude too large to represent as a 16-bit floating-point value, the result is positive infinity. If Value is negative with a magnitude too large to represent as a 16-bit floating-point value, the result is negative infinity. If the magnitude of Value is too small to represent as a normalized 16-bit floating-point value, the result may be either +0 or -0."
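
For illustration only, here is a minimal standalone C sketch of the range checks described above, following the quoted OpQuantizeToF16 wording (positive overflow gives positive infinity, negative overflow gives negative infinity). It is not the NIR lowering from this MR: the function name quantize_to_f16_sketch and the F16_* constants are hypothetical, and, like the pseudocode, it passes in-range values through without rounding the mantissa to 16-bit precision.

```c
#include <math.h>

/* Hypothetical constants for this sketch (not names from the Mesa tree). */
#define F16_MAX        65504.0f   /* largest finite binary16 value        */
#define F16_MIN_NORMAL 0x1p-14f   /* smallest normalized binary16 value   */

/* Range checks only: in-range values are returned unchanged,
 * i.e. the mantissa is NOT rounded to half precision here. */
static float quantize_to_f16_sketch(float val)
{
    if (val < -F16_MAX)
        return -INFINITY;         /* negative overflow, includes -inf     */
    else if (val > F16_MAX)
        return INFINITY;          /* positive overflow, includes +inf     */
    else if (fabsf(val) < F16_MIN_NORMAL)
        return 0.0f;              /* too small for a normalized half      */
    else
        return val;               /* NaN lands here: all compares fail    */
}
```

Because every comparison against a NaN is false, a NaN input falls through to the final branch and comes back as a NaN, which is what the quoted definition asks for.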

Reviewed-by: Jesse Natalie <jenatali@microsoft.com>

(extracted from !15024 (closed))
