Commit e13c9969 authored by Link Mauve, committed by Derek Foreman

libweston: Optimise matrix multiplication



The previous version used div() to separate the column and row of the
current element, but that function is implemented as a libc call, which
prevented the compiler from vectorising the loop and made matrix
multiplication appear quite high in profiles.

With div() removed, we are down from 64 vfmadd132ss instructions, each
acting on one float at a time, to just 8 vfmadd132ps instructions when
compiled with AVX2 support (or 16 mulps and 16 addps with SSE2 support
only), and the function isn’t a hot spot any more.
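
For illustration only, here is a minimal sketch of the kind of change described above. It is not the code from this commit: `struct mat4`, `mat4_multiply_div()` and `mat4_multiply_loops()` are hypothetical names, and the nested-loop form is just one assumed way of getting rid of div(). The first function splits a flat index into row and column with div(); the second does the same arithmetic with explicit row and column loops, leaving a call-free loop body that a compiler may be able to vectorise.

```c
#include <stdlib.h>  /* div() */
#include <string.h>  /* memcpy() */

/* Hypothetical stand-in for a 4x4 float matrix stored as 16 contiguous
 * floats; not the actual libweston type. */
struct mat4 {
	float d[16];
};

/* Variant using div(): one libc call per output element to split the
 * flat index i into a row (quot) and a column (rem). */
static void
mat4_multiply_div(struct mat4 *m, const struct mat4 *n)
{
	struct mat4 tmp;
	int i, j;

	for (i = 0; i < 16; i++) {
		div_t q = div(i, 4);
		const float *row = m->d + q.quot * 4;
		const float *column = n->d + q.rem;

		tmp.d[i] = 0;
		for (j = 0; j < 4; j++)
			tmp.d[i] += row[j] * column[j * 4];
	}
	memcpy(m, &tmp, sizeof tmp);
}

/* Assumed alternative: the row and column come straight from the loop
 * counters, so there is no function call inside the loop nest. */
static void
mat4_multiply_loops(struct mat4 *m, const struct mat4 *n)
{
	struct mat4 tmp;
	int i, j, k;

	for (i = 0; i < 4; i++) {
		for (j = 0; j < 4; j++) {
			float sum = 0;

			for (k = 0; k < 4; k++)
				sum += m->d[i * 4 + k] * n->d[k * 4 + j];
			tmp.d[i * 4 + j] = sum;
		}
	}
	memcpy(m, &tmp, sizeof tmp);
}
```

Both functions store the product in m and accumulate the four terms in the same order. Whether the second form is actually vectorised depends on the compiler and flags, so the generated assembly is worth checking (for example, building with -O2 -mavx2 -mfma and looking for vfmadd132ps).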

Signed-off-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
parent 102acac6
Pipeline #781816 passed with stages
in 1 minute and 59 seconds