S32NE support for remap operations
So far PulseAudio only supported two different work formats: S16NE if
it's sufficient to represent the input and output formats without loss
of precision and FLOAT32NE in all other cases. For systems that use
S32NE exclusively, this results in unnecessary conversions from S32NE
to FLOAT32NE and back again.
Add S32NE remap operations and make use of them (for the COPY and
TRIVIAL resamplers) if both input and output format are S32NE. This
avoids the back and forth conversions between S32NE and FLOAT32NE,
significantly improving performance for those cases.
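The selection rule described above can be sketched roughly as follows. This is a simplified illustration, not the code from this series: the enums, fits_in_s16() and choose_work_format() are hypothetical names, and the set of formats that count as "fits in S16NE" is an assumption made for the example.

```c
#include <stdbool.h>

/* Hypothetical, reduced stand-ins for PulseAudio's sample formats. */
typedef enum { FMT_U8, FMT_S16NE, FMT_S32NE, FMT_FLOAT32NE } sample_format;

/* Hypothetical stand-ins for the resampler methods mentioned above. */
typedef enum { METHOD_COPY, METHOD_TRIVIAL, METHOD_OTHER } resample_method;

/* Assumption: only these formats are representable in S16NE without loss. */
static bool fits_in_s16(sample_format f) {
    return f == FMT_U8 || f == FMT_S16NE;
}

static sample_format choose_work_format(resample_method method,
                                        sample_format in,
                                        sample_format out) {
    /* Previous behaviour: S16NE when it loses nothing. */
    if (fits_in_s16(in) && fits_in_s16(out))
        return FMT_S16NE;

    /* New case: stay in S32NE when both ends are S32NE and the
     * resampler is COPY or TRIVIAL. */
    if ((method == METHOD_COPY || method == METHOD_TRIVIAL) &&
        in == FMT_S32NE && out == FMT_S32NE)
        return FMT_S32NE;

    /* Previous behaviour: FLOAT32NE in all other cases. */
    return FMT_FLOAT32NE;
}
```

With this sketch, an S32NE to S32NE stream through the COPY resampler stays in S32NE, while any mixed-format stream still falls back to FLOAT32NE.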
CPU usage was reduced to ~60% of the previous value after applying this series. Tested on a Wandboard Quad using the following script (for use with test-daemon.sh):
#!/bin/sh
# Find the running PulseAudio server; bail out unless there is exactly one.
SERVER_PID="$(pgrep pulseaudio)"
if [ "$(pgrep -c pulseaudio)" -ne 1 ] || [ -z "${SERVER_PID}" ]; then
    echo "Cannot determine PID of pulseaudio server" >&2
    exit 1
fi
# Create an S32NE null source and two remapped sources on top of it.
pactl load-module module-null-source format=s32ne rate=48000 source_name=s32ne-source > /dev/null || exit
pactl load-module module-remap-source master=s32ne-source channel_map=front-left,front-right master_channel_map=front-right,front-right source_name=s32ne-source-right > /dev/null || exit
pactl load-module module-remap-source master=s32ne-source channel_map=front-left,front-right master_channel_map=front-left,front-left source_name=s32ne-source-left > /dev/null || exit
# Field 14 of /proc/PID/stat is utime (user-mode CPU time in clock ticks).
pre_pacat_cpu="$(cut -d ' ' -f 14 < /proc/"${SERVER_PID}"/stat)"
# Record from the remapped source for 60 seconds.
timeout --preserve-status 60 pacat -r -d s32ne-source-right --fix-format --fix-rate --fix-channels > /dev/null || exit
post_pacat_cpu="$(cut -d ' ' -f 14 < /proc/"${SERVER_PID}"/stat)"
echo "CPU usage: $((post_pacat_cpu - pre_pacat_cpu))"
I don't envision other work formats for the built-in remappers for now,
so I've simply changed pa_set_remap_func() to take three arguments.
S32NE has about the same precision as FLOAT32NE and is thus sufficient
for all normal operations. The only non-integer work formats besides
FLOAT32NE are ALAW and ULAW, which are unlikely to be used by the
built-in remappers as they are compressed and would require special
algorithms. If anyone wanted to do that, a plug-in would be more
appropriate.
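To illustrate the idea of passing one implementation per work format, here is a minimal self-contained sketch. It is not the actual PulseAudio declaration: the remap struct, the do_remap_func type, set_remap_func() and the copy_* helpers are all hypothetical, and the assumption is that the three arguments are the S16NE, S32NE and FLOAT32NE variants of the same operation.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical work format tag and remap state. */
typedef enum { WORK_S16NE, WORK_S32NE, WORK_FLOAT32NE } work_format;

typedef struct remap remap;
typedef void (*do_remap_func)(remap *m, void *dst, const void *src, unsigned n);

struct remap {
    work_format format;
    do_remap_func do_remap;
};

/* Trivial per-format operations used only for this example:
 * each copies n mono samples of its own sample type. */
static void copy_s16(remap *m, void *dst, const void *src, unsigned n) {
    (void) m;
    memcpy(dst, src, n * sizeof(int16_t));
}

static void copy_s32(remap *m, void *dst, const void *src, unsigned n) {
    (void) m;
    memcpy(dst, src, n * sizeof(int32_t));
}

static void copy_float(remap *m, void *dst, const void *src, unsigned n) {
    (void) m;
    memcpy(dst, src, n * sizeof(float));
}

/* Pick the implementation that matches the configured work format. */
static void set_remap_func(remap *m, do_remap_func f_s16,
                           do_remap_func f_s32, do_remap_func f_float) {
    switch (m->format) {
        case WORK_S16NE:     m->do_remap = f_s16;   break;
        case WORK_S32NE:     m->do_remap = f_s32;   break;
        case WORK_FLOAT32NE: m->do_remap = f_float; break;
    }
}
```

A caller would set the format first and then call set_remap_func(&m, copy_s16, copy_s32, copy_float); adding a fourth work format later would mean widening this call, which is the trade-off accepted above.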
The existing S16NE code got by with some operations that would
temporarily overflow int16_t. This isn't a problem because PulseAudio
only runs on architectures that have at least 32-bit integers, so the
operands are implicitly promoted to 32 bits. For S32NE we have to make
sure we don't overflow int32_t. We can do that either by using 64-bit
arithmetic or by rearranging the calculations to avoid intermediate
results larger than int32_t, at the cost of some minor loss of precision
(up to 2 LSB). For the sake of performance I've opted for the latter
approach.
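The two options can be compared on a simple stereo-to-mono average. This is only a sketch of the trade-off, not code taken from the series; the function names are hypothetical. A naive (l + r) / 2 in int32_t is left out because the sum can overflow, which is undefined behaviour for signed integers in C.

```c
#include <stdint.h>

/* Option 1: widen to 64 bits so the intermediate sum cannot overflow. */
static int32_t average_s32_wide(int32_t l, int32_t r) {
    return (int32_t) (((int64_t) l + (int64_t) r) / 2);
}

/* Option 2: divide first so every intermediate value fits in int32_t.
 * Each division truncates toward zero, so the result can differ from
 * the widened version by 1 in this two-input case. */
static int32_t average_s32_div_first(int32_t l, int32_t r) {
    return l / 2 + r / 2;
}
```

For example, with l = 3 and r = 5 the widened version yields 4 while the divide-first version yields 3. Operations that combine more inputs accumulate more truncation steps, so their worst-case error is larger than in this two-input example.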
On the NEON-capable system I've tested this on (Wandboard Quad, i.MX6Q,
armhf), most hand-crafted NEON assembly operations for S32LE would have
been slower than the corresponding generic C operations (GCC 6.3.0 on
Debian Stretch), so I've left them out deliberately.