intel/compiler: Move pln emul to the fs_visitor.
Move the pln emulation code into the fs_visitor so that it benefits from optimizations that don't happen at the fs_generator level, mainly better instruction scheduling.
One big caveat of this change is that the emulation no longer uses NF types and the accumulator, but apparently the extra precision they provided is not needed.
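
For illustration only (this is not the code in this patch): assuming pln evaluates a per-channel plane equation of the form a*x + b*y + d, the emulated sequence can be modelled as two ordinary float steps. The function name and the coefficient layout below are hypothetical. The point of the sketch is that the intermediate value is rounded to a regular float instead of being kept in a wider accumulator format.

```c
/* Hypothetical scalar model of a two-step plane-equation evaluation.
 * Assumption: the result is a*x + b*y + d, with (a, b, d) as the plane
 * coefficients and (x, y) as the per-channel deltas. */
static float pln_emul_sketch(float a, float b, float d, float x, float y)
{
    /* Step 1: x term plus the constant, rounded to float. */
    float t = a * x + d;

    /* Step 2: add the y term to the intermediate result. */
    return t + b * y;
}
```

With a = 2, b = 3, d = 1, x = 0.5 and y = 0.25, every intermediate value is exactly representable, so the two-step form gives 2.75 with no rounding error; any difference from a wider accumulator would only show up for inputs whose intermediates are not exactly representable.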