bayer2rgb: vectorized horiz_upsample cannot be compiled on armv7a
Submitted by Robin Haberkorn
Created attachment 341096
Log of running a pipeline with bayer2rgb and ORC_DEBUG=10
I've filed this against v1.4.5 (the version I'm currently using), but I checked upstream and the relevant code is unchanged.
The liborc version used is 0.4.23; again, there does not seem to be any upstream commit that could affect this.
When running a pipeline with bayer2rgb and ORC_DEBUG=10 on an embedded platform (armv7a), bayer_orc_horiz_upsample() cannot be compiled, so orc will use the fallback C version (see orc-log.txt).
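For readers unfamiliar with the step that fails to compile: in a bilinear debayer, a horizontal upsample splits a row of alternating colour samples into two full-width planes and fills each gap with the average of its horizontal neighbours. The scalar sketch below illustrates that computation. It is not the ORC program shipped with bayer2rgb; the function name `horiz_upsample_ref`, the row layout, and the edge handling (replicating the nearest sample) are assumptions for illustration. Averaging uses round-half-up, `(a + b + 1) >> 1`, which is how ORC's `avgub` opcode rounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounded unsigned byte average, same rounding as ORC's avgub. */
static uint8_t avg_u8(uint8_t a, uint8_t b)
{
  return (uint8_t) (((unsigned) a + b + 1) >> 1);
}

/* Hypothetical scalar reference (not the bayer2rgb ORC code).
 * Splits one Bayer row `src` of `n` samples (n even, n >= 2) into two
 * full-width planes: `d0` keeps the samples at even positions, `d1` those
 * at odd positions. The missing sample in each plane is the rounded average
 * of its two horizontal neighbours; at the row edges the nearest sample is
 * replicated instead. */
static void horiz_upsample_ref(uint8_t *d0, uint8_t *d1,
                               const uint8_t *src, size_t n)
{
  assert(n >= 2 && n % 2 == 0);
  for (size_t i = 0; i < n; i += 2) {
    /* Even-position channel: known at i, interpolated at i + 1. */
    d0[i] = src[i];
    d0[i + 1] = (i + 2 < n) ? avg_u8(src[i], src[i + 2]) : src[i];
    /* Odd-position channel: known at i + 1, interpolated at i. */
    d1[i + 1] = src[i + 1];
    d1[i] = (i > 0) ? avg_u8(src[i - 1], src[i + 1]) : src[i + 1];
  }
}
```

A scalar loop like this is the kind of code the C fallback runs per row, so it also gives a rough idea of the work the vectorized variants are meant to speed up.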
I've patched gstbayer2rgb.c so that the alternative bayer_orc_horiz_upsample_unaligned() is used on ARMv7a (see attached patch). This variant JIT-compiles fine. But frankly, I do not understand the difference between the two variants or why this did the trick: the code has no comments explaining it. Perhaps one of the maintainers could elaborate on this.
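The workaround amounts to a compile-time choice of which ORC function to call. The sketch below shows one way such a guard could look; the enum and helper names are hypothetical, and the exact condition in the attached patch and in gstbayer2rgb.c may differ. The only fact it relies on is that GCC and Clang predefine `__ARM_ARCH_7A__` when targeting ARMv7-A.

```c
/* Hypothetical identifiers for the two ORC variants discussed above. */
enum upsample_variant {
  UPSAMPLE_ALIGNED,   /* bayer_orc_horiz_upsample() */
  UPSAMPLE_UNALIGNED  /* bayer_orc_horiz_upsample_unaligned() */
};

/* Compile-time selection: __ARM_ARCH_7A__ is predefined by GCC and Clang
 * for ARMv7-A targets, so only such builds pick the unaligned variant. */
static enum upsample_variant horiz_upsample_variant(void)
{
#if defined(__ARM_ARCH_7A__)
  return UPSAMPLE_UNALIGNED;
#else
  return UPSAMPLE_ALIGNED;
#endif
}

/* Name of the ORC function a build with this guard would call. */
static const char *horiz_upsample_name(enum upsample_variant v)
{
  return v == UPSAMPLE_UNALIGNED ? "bayer_orc_horiz_upsample_unaligned"
                                 : "bayer_orc_horiz_upsample";
}
```

Because the choice is made by the preprocessor, there is no runtime cost; the trade-off is that the decision is fixed per build rather than per CPU.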
To my even greater surprise, the vectorized bayer_orc_horiz_upsample_unaligned() seems to bring no significant speedup over the C fallback of bayer_orc_horiz_upsample(). With about 90 MB/s of debayered buffers being produced by my pipeline, I may have run into a memory bandwidth bottleneck (although that would be an unexpectedly low limit) or, of course, into some other bottleneck.
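One way to sanity-check the memory bandwidth hypothesis is to measure plain copy throughput on the target board. The sketch below is a rough measurement under stated assumptions: the helper name `copy_throughput_mb_s` is hypothetical, it uses POSIX `clock_gettime(CLOCK_MONOTONIC, ...)`, and it reports MB/s with 1 MB = 10^6 bytes. Debayering reads the Bayer input and writes larger RGB output, so actual memory traffic exceeds the 90 MB/s output rate; still, a copy figure far above that would make raw bandwidth an unlikely limit.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Rough memcpy throughput in MB/s (1 MB = 1e6 bytes).
 * Returns -1.0 on bad arguments, allocation failure, or a zero interval. */
static double copy_throughput_mb_s(size_t bytes, int reps)
{
  if (bytes == 0 || reps <= 0)
    return -1.0;

  unsigned char *src = malloc(bytes);
  unsigned char *dst = malloc(bytes);
  if (src == NULL || dst == NULL) {
    free(src);
    free(dst);
    return -1.0;
  }
  memset(src, 0x5a, bytes);

  struct timespec t0, t1;
  clock_gettime(CLOCK_MONOTONIC, &t0);
  for (int r = 0; r < reps; r++) {
    src[(size_t) r % bytes] = (unsigned char) r; /* vary the source per pass */
    memcpy(dst, src, bytes);
  }
  clock_gettime(CLOCK_MONOTONIC, &t1);

  /* Read the destination so the copies remain observable. */
  volatile unsigned char sink = dst[bytes / 2];
  (void) sink;
  free(src);
  free(dst);

  double secs = (double) (t1.tv_sec - t0.tv_sec)
      + (double) (t1.tv_nsec - t0.tv_nsec) / 1e9;
  return secs > 0.0 ? (double) bytes * reps / secs / 1e6 : -1.0;
}
```

Calling it with buffers larger than the cache, for example `copy_throughput_mb_s(8u << 20, 16)`, gives a figure closer to DRAM bandwidth than small buffers would.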
In general, the liborc vectorizations do speed things up by a factor of at least 1.7.
Attachment 341096, "Log of running a pipeline with bayer2rgb and ORC_DEBUG=10":