AArch64 NEON support (cont'd)
This extends MR !37 (closed) and implements most of the remaining AArch64 instructions. This is enough to accelerate e.g. videotestsrc. Accumulator and flags2d support is also implemented.
Acceleration of videoconvert from I420 to RGB (useful e.g. for jpegdec and openh264dec) is not implemented yet, because the loadupdb opcode still lacks an AArch64 implementation. It will be added in a separate MR.
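
For context, a minimal scalar sketch of what loadupdb ("load upsampled duplicate") is expected to do, as I understand the ORC opcode: every source byte is loaded twice in a row. This is presumably why the I420 path needs it, since the chroma planes have half the horizontal resolution of luma. The name loadupdb_ref is hypothetical and only used for illustration; it is not part of ORC and not the NEON code that the follow-up MR will add.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference, not ORC API: write n destination bytes,
 * where output byte i is source byte i / 2 (each source byte duplicated).
 * src must hold at least (n + 1) / 2 bytes. */
static void loadupdb_ref(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i >> 1];
}
```

A vectorised version would have to produce the same byte order, i.e. interleave a source vector with itself rather than perform a plain load.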