NEON fp16 conversions

With mediump uniform support coming in !9050 (merged), it would be good to do accelerated f32-to-f16 conversions on ARM. I think that looks like:

A new NEON half-float conversion file compiled with -mfpu=neon-fp16 -mfloat-abi=softfp (assuming the C compiler probes as able to use those args).
The implementation uses (for example) the __fp16 type described at https://gcc.gnu.org/onlinedocs/gcc-7.1.0/gcc/Half-Precision.html to convert the float to a __fp16 and store it to memory, then memcpys that to a u16 and returns it.
_mesa_float_to_half() grows a util_cpu_caps check to dispatch to a call to the accelerated implementation.
Add the new cap to the has_f16c check in u_half_test.c
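
Roughly, the steps above could be sketched as follows. This is only an illustration, not the Mesa code: every sketch_* name is hypothetical, the __ARM_FP16_FORMAT_IEEE guard is my assumption for detecting compiler support for __fp16, and sketch_cpu_has_fp16() stands in for the real util_cpu_caps check. On 32-bit ARM, GCC may additionally want -mfp16-format=ieee before __fp16 is usable. The software path is a plain round-to-nearest-even reference so the file still builds and behaves the same on other targets.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Portable reference path (round-to-nearest-even), used when the
 * hardware conversion is not available. */
static uint16_t sketch_float_to_half_sw(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof(bits));

    uint16_t sign = (uint16_t)((bits >> 16) & 0x8000u);
    uint32_t mag = bits & 0x7fffffffu;

    if (mag > 0x7f800000u)               /* NaN -> quiet NaN */
        return (uint16_t)(sign | 0x7e00u);
    if (mag >= 0x47800000u)              /* >= 65536.0 or Inf -> Inf */
        return (uint16_t)(sign | 0x7c00u);

    if (mag >= 0x38800000u) {            /* normal half range (>= 2^-14) */
        uint32_t v = mag - 0x38000000u;  /* rebias exponent from 127 to 15 */
        v += 0x0fffu + ((v >> 13) & 1u); /* round to nearest, ties to even */
        return (uint16_t)(sign | (v >> 13));
    }

    /* Result is a subnormal half or zero. */
    uint32_t shift = 126u - (mag >> 23);
    if (shift > 24u)                     /* below 2^-25: rounds to zero */
        return sign;
    uint32_t mant = (mag & 0x007fffffu) | 0x00800000u;
    uint32_t n = mant >> shift;
    uint32_t rem = mant & ((1u << shift) - 1u);
    uint32_t half = 1u << (shift - 1u);
    if (rem > half || (rem == half && (n & 1u)))
        n++;
    return (uint16_t)(sign | n);
}

#if defined(__ARM_FP16_FORMAT_IEEE)
#define SKETCH_HAVE_HW_F16 1
/* Accelerated path: let the compiler emit the f32->f16 convert, store
 * the __fp16 to memory, then memcpy the bits out as a u16. */
static uint16_t sketch_float_to_half_hw(float f)
{
    __fp16 h = (__fp16)f;
    uint16_t bits;
    memcpy(&bits, &h, sizeof(bits));
    return bits;
}
#endif

/* Hypothetical stand-in for a util_cpu_caps query. Real code would
 * check the CPU at runtime, not only at compile time. */
static bool sketch_cpu_has_fp16(void)
{
#ifdef SKETCH_HAVE_HW_F16
    return true;
#else
    return false;
#endif
}

/* Dispatcher in the spirit of what _mesa_float_to_half() would grow. */
uint16_t sketch_float_to_half(float f)
{
#ifdef SKETCH_HAVE_HW_F16
    if (sketch_cpu_has_fp16())
        return sketch_float_to_half_hw(f);
#endif
    return sketch_float_to_half_sw(f);
}
```

Calling sketch_float_to_half(1.0f) gives 0x3C00 on either path. Keeping the __fp16 function in its own translation unit is what lets the special compiler args apply only to that file.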

I don't think we can fold it into the half_float.h file the way x86_64 does, because I'm pretty sure that regardless of the implementation (intrinsics or raw asm) we're going to need special compiler args to enable it (that was my experience with NEON on vc4).

Long term, we probably would want a vec4-to-f16vec4 path as well so we can do a pile of them at once.

Tagging freedreno because I suspect we want to use mediump uniforms on a6xx so we can turn off CONSTANT_DEMOTION_ENABLE and save a bunch of constbuf space, but I bet we don't want to do it until we have faster uniform updates.
