-
Icecream95 authored
With this patch, GCC generates vectorized code that does the comparisons without converting the indices to 32-bit first. This optimization makes the aforementioned function almost twice as fast for ARM NEON, and should speed up vectorised code on other platforms. Without vectorisation, the function is still a percent or two faster, but slightly larger. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <mesa/mesa!3050>
80aca968