Alyssa Rosenzweig authored
There are 4 distinct cases of fsqrt:

1. FP16 on Bifrost

   Here we may lower to `x * rsqrt(x)` with a .left modifier on the
   FMA_RSCALE.v2f16 used to carry out the multiplication, ensuring
   correct handling of NaN and Inf.

2. FP32 on G71

   Missing FRSQ.f32 instruction, do something simple since we don't
   even probe the driver on G71...

3. FP32 on G72 and newer

   We can do the same lowering as FP16 in theory. However, this may
   have precision issues. The DDK uses extra FREXPM/FREXPE instructions
   in a .sqrt mode for a range reduction. It's unknown if this is
   necessary for OpenGL (ES), Vulkan, OpenCL, or some combination
   thereof.

4. FP16 on Valhall

   We want to use the same strategy as on Bifrost, but Valhall removed
   the FMA_RSCALE.v2f16 instruction. Instead we use an ordinary
   FMA.v2f16 for the multiply and check the special case explicitly
   with CSEL.v2f32 ... I'm not sure if this is right for infinity but
   the DDK does it so ¯\_(ツ)_/¯

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
26e69b73