NEON fp16 conversions
With the mediump uniforms support coming in with !9050 (merged), it would be good to do accelerated f32-to-f16 conversions on ARM. I think that looks like:
- A new NEON half-float conversion file compiled with `-mfpu=neon-fp16 -mfloat-abi=softfp` (assuming the C compiler probes as able to use those args).
- Implementation uses (for example) https://gcc.gnu.org/onlinedocs/gcc-7.1.0/gcc/Half-Precision.html to convert the float to a `__fp16` and store to memory, then `memcpy` that to `u16` and return it.
- `_mesa_float_to_half()` grows a `util_cpu_caps` check to dispatch to a call to the accelerated implementation.
- Add the new cap to the `has_f16c` check in `u_half_test.c`.
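
A rough sketch of the shape I have in mind, not the actual Mesa code. Every `example_*` name here is hypothetical (the caps struct just stands in for `util_cpu_caps`), and the bit-twiddling fallback is a generic round-to-nearest-even conversion written for this sketch, not what `half_float.c` does. In the real thing the `__fp16` function would live in its own file so that only that file gets the special compiler args; here it is guarded by the ACLE `__ARM_FP16_FORMAT_IEEE` macro so the sketch still builds on other targets.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Portable fallback: round-to-nearest-even f32 -> f16 on the raw bits. */
static uint16_t example_float_to_half_slow(float f)
{
   uint32_t bits;
   memcpy(&bits, &f, sizeof(bits));

   uint16_t sign = (uint16_t)((bits >> 16) & 0x8000u);
   uint32_t mag = bits & 0x7fffffffu;
   uint32_t exp = mag >> 23;
   uint32_t half, rem, halfway;

   if (mag > 0x7f800000u)            /* NaN: return a quiet NaN */
      return (uint16_t)(sign | 0x7e00u);
   if (mag >= 0x477ff000u)           /* >= 65520 or inf: rounds to inf */
      return (uint16_t)(sign | 0x7c00u);
   if (exp < 102)                    /* below 2^-25: rounds to zero */
      return sign;

   if (exp >= 113) {                 /* result is a normal half */
      uint32_t v = mag - 0x38000000u;   /* rebias exponent 127 -> 15 */
      half = v >> 13;
      rem = v & 0x1fffu;
      halfway = 0x1000u;
   } else {                          /* result is a subnormal half */
      uint32_t mant = (mag & 0x7fffffu) | 0x800000u;
      uint32_t shift = 126 - exp;    /* 14..24 */
      half = mant >> shift;
      rem = mant & ((1u << shift) - 1u);
      halfway = 1u << (shift - 1);
   }
   if (rem > halfway || (rem == halfway && (half & 1u)))
      half++;                        /* a carry into the exponent is fine */
   return (uint16_t)(sign | half);
}

#if defined(__ARM_FP16_FORMAT_IEEE)
#define EXAMPLE_HAVE_FP16 1
/* Accelerated path: let the compiler emit the hardware conversion, store
 * the __fp16 to memory, then memcpy the bits out as a u16. */
static uint16_t example_float_to_half_fp16(float f)
{
   __fp16 h = (__fp16)f;
   uint16_t bits;
   memcpy(&bits, &h, sizeof(bits));
   return bits;
}
#else
#define EXAMPLE_HAVE_FP16 0
#endif

/* Hypothetical stand-in for util_cpu_caps. A real implementation would
 * fill this in from runtime CPU detection, not at compile time. */
struct example_caps { bool has_neon_fp16; };
struct example_caps example_cpu_caps = { EXAMPLE_HAVE_FP16 };

/* Dispatch: use the accelerated path only when the cap is set. */
static uint16_t example_float_to_half(float f)
{
#if EXAMPLE_HAVE_FP16
   if (example_cpu_caps.has_neon_fp16)
      return example_float_to_half_fp16(f);
#endif
   return example_float_to_half_slow(f);
}
```

The test side would then compare the two paths against each other whenever the cap is set, the same way the existing check does for f16c.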
I don't think we can fold it into the `half_float.h` file like x86_64 does, because I'm pretty sure that, regardless of the implementation (intrinsics or raw asm), we're going to need special compiler args to enable it (that was my experience with NEON on vc4).
Long term, we probably would want a vec4-to-f16vec4 path as well so we can do a pile of them at once.
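
For the vec4 case, something along these lines might work; treat it as a sketch under assumptions, not a tested Mesa patch. `example_float4_to_half4` and `example_f2h_slow` are hypothetical names, the NEON path assumes the `vcvt_f16_f32` intrinsic from `arm_neon.h` is available (guarded here by the ACLE feature macros), and the scalar fallback is only there so the sketch builds on targets without it.

```c
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) && defined(__ARM_FP16_FORMAT_IEEE) && (__ARM_FP & 2)
#include <arm_neon.h>

/* Convert four floats to four halfs with a single NEON conversion. */
static void example_float4_to_half4(uint16_t dst[4], const float src[4])
{
   float32x4_t v = vld1q_f32(src);          /* load 4 x f32 */
   float16x4_t h = vcvt_f16_f32(v);         /* 4 x f32 -> 4 x f16 */
   vst1_u16(dst, vreinterpret_u16_f16(h));  /* store the raw f16 bits */
}
#else
/* Scalar round-to-nearest-even fallback, written for this sketch only. */
static uint16_t example_f2h_slow(float f)
{
   uint32_t bits;
   memcpy(&bits, &f, sizeof(bits));

   uint16_t sign = (uint16_t)((bits >> 16) & 0x8000u);
   uint32_t mag = bits & 0x7fffffffu;
   uint32_t exp = mag >> 23;
   uint32_t half, rem, halfway;

   if (mag > 0x7f800000u)            /* NaN */
      return (uint16_t)(sign | 0x7e00u);
   if (mag >= 0x477ff000u)           /* rounds to inf */
      return (uint16_t)(sign | 0x7c00u);
   if (exp < 102)                    /* rounds to zero */
      return sign;

   if (exp >= 113) {                 /* normal half */
      uint32_t v = mag - 0x38000000u;
      half = v >> 13;
      rem = v & 0x1fffu;
      halfway = 0x1000u;
   } else {                          /* subnormal half */
      uint32_t mant = (mag & 0x7fffffu) | 0x800000u;
      uint32_t shift = 126 - exp;
      half = mant >> shift;
      rem = mant & ((1u << shift) - 1u);
      halfway = 1u << (shift - 1);
   }
   if (rem > halfway || (rem == halfway && (half & 1u)))
      half++;
   return (uint16_t)(sign | half);
}

static void example_float4_to_half4(uint16_t dst[4], const float src[4])
{
   for (int i = 0; i < 4; i++)
      dst[i] = example_f2h_slow(src[i]);
}
#endif
```

A uniform upload loop could then walk the source array four floats at a time and fall back to the scalar call for any tail.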
Tagging freedreno because I suspect we want to use mediump uniforms on a6xx so we can turn off CONSTANT_DEMOTION_ENABLE and save a bunch of constbuf space, but I bet we don't want to do it until we have faster uniform updates.