vtn: optimize opencl mad

Per the OpenCL spec:

  mad approximates a * b + c. Whether or how the product of a * b is
  rounded and how supernormal or subnormal intermediate products are
  handled is not defined. mad is intended to be used where speed is
  preferred over accuracy.

So emit a fused multiply-add when the backend prefers fusing.

This takes the clpeak "float" case from 1112 to 1978 GFLOPS with
rusticl on M1.