microsoft/compiler: Lower fquantize2f16
As far as I can tell, there's no native operation doing the equivalent of fquantize2f16. Let's lower this operation to
   if (val < MIN_FLOAT16)
      return -INFINITY;
   else if (val > MAX_FLOAT16)
      return INFINITY;
   else if (fabs(val) < SMALLEST_NORMALIZED_FLOAT16)
      return 0;
   else
      return val;
which matches the definition of OpQuantizeToF16:
" If Value is an infinity, the result is the same infinity. If Value is a NaN, the result is a NaN, but not necessarily the same NaN. If Value is positive with a magnitude too large to represent as a 16-bit floating-point value, the result is positive infinity. If Value is negative with a magnitude too large to represent as a 16-bit floating-point value, the result is negative infinity. If the magnitude of Value is too small to represent as a normalized 16-bit floating-point value, the result may be either +0 or -0. "
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
(extracted from !15024 (closed))