
S32NE support for remap operations

Until now, PulseAudio has supported only two work formats: S16NE, if it is sufficient to represent the input and output formats without loss of precision, and FLOAT32NE in all other cases. For systems that use S32NE exclusively, this results in unnecessary conversions from S32NE to FLOAT32NE and back again.

Add S32NE remap operations and make use of them (for the COPY and TRIVIAL resamplers) if both input and output format are S32NE. This avoids the back and forth conversions between S32NE and FLOAT32NE, significantly improving performance for those cases.
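The selection rule described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the `sample_format` enum, `fits_in_s16()` and `choose_work_format()` are hypothetical names invented for this sketch and are not PulseAudio's actual types or functions.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for PulseAudio's sample formats. */
typedef enum {
    FMT_U8,
    FMT_S16NE,
    FMT_S32NE,
    FMT_FLOAT32NE
} sample_format;

/* True if every sample of format f can be represented in S16NE without loss. */
static bool fits_in_s16(sample_format f) {
    return f == FMT_U8 || f == FMT_S16NE;
}

/* Pick the work format for the remapper (hypothetical helper).
 * copy_or_trivial is true when the COPY or TRIVIAL resampler is in use. */
static sample_format choose_work_format(sample_format in, sample_format out,
                                        bool copy_or_trivial) {
    /* Old rule: S16NE when it loses nothing on either side. */
    if (fits_in_s16(in) && fits_in_s16(out))
        return FMT_S16NE;

    /* New rule: stay in S32NE when both ends are already S32NE. */
    if (copy_or_trivial && in == FMT_S32NE && out == FMT_S32NE)
        return FMT_S32NE;

    /* Everything else still goes through FLOAT32NE. */
    return FMT_FLOAT32NE;
}
```

With this rule, an S32NE source feeding an S32NE client through the TRIVIAL resampler never leaves the integer domain, while any mixed-format pair behaves as before.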

CPU usage was reduced to ~60% of the previous value after applying this series. Tested on a Wandboard Quad using the following script (for use with test-daemon.sh):

#!/bin/sh

SERVER_PID="$(pgrep pulseaudio)"
if [ "$(pgrep -c pulseaudio)" -ne 1 ] || [ -z "${SERVER_PID}" ]; then
    echo "Cannot determine PID of pulseaudio server" >&2
    exit 1
fi

pactl load-module module-null-source format=s32ne rate=48000 source_name=s32ne-source > /dev/null || exit
pactl load-module module-remap-source master=s32ne-source channel_map=front-left,front-right master_channel_map=front-right,front-right source_name=s32ne-source-right > /dev/null || exit
pactl load-module module-remap-source master=s32ne-source channel_map=front-left,front-right master_channel_map=front-left,front-left source_name=s32ne-source-left > /dev/null || exit
pre_pacat_cpu="$(cut -d ' ' -f 14 < /proc/"${SERVER_PID}"/stat)"
timeout --preserve-status 60 pacat -r -d s32ne-source-right --fix-format --fix-rate --fix-channels > /dev/null || exit
post_pacat_cpu="$(cut -d ' ' -f 14 < /proc/"${SERVER_PID}"/stat)"
# Field 14 of /proc/PID/stat is utime (user-mode CPU time in clock ticks).
echo "CPU usage: $((post_pacat_cpu - pre_pacat_cpu))"

I don't envision other work formats for the built-in remappers at this point, so I've simply changed pa_set_remap_func() to take three arguments. S32NE has about the same precision as FLOAT32NE and is thus sufficient for all normal operations. The only non-integer formats besides FLOAT32NE are ALAW and ULAW, which are unlikely to be used by the built-in remappers because they are compressed and would require special algorithms. If anyone wanted to do that, a plug-in would be more appropriate.

The existing S16NE code got by with some operations that temporarily exceed the range of int16_t. This isn't a problem because PulseAudio only runs on architectures where int is at least 32 bits wide, so the operands are implicitly promoted to 32 bits. For S32NE we have to make sure we don't overflow int32_t. We can do that either by using 64-bit arithmetic or by rearranging the calculations to avoid intermediate results larger than int32_t, at the cost of some minor loss of precision (up to 2 LSB). For the sake of performance I've opted for the latter approach.
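To illustrate the trade-off, here is a minimal sketch using a stereo-to-mono average as the example operation. The function names are hypothetical and this is not the code from the series; it only shows the two general strategies next to the S16 case that relies on integer promotion.

```c
#include <stdint.h>

/* S16: l + r is computed as int (at least 32 bits on supported targets),
 * so the intermediate sum cannot overflow. */
static int16_t mix_s16(int16_t l, int16_t r) {
    return (int16_t)((l + r) / 2);
}

/* S32, option 1: widen to 64 bits. Exact, but needs 64-bit arithmetic,
 * which can be slow on 32-bit CPUs. */
static int32_t mix_s32_wide(int32_t l, int32_t r) {
    return (int32_t)(((int64_t)l + r) / 2);
}

/* S32, option 2: scale each operand before adding, so every intermediate
 * result fits in int32_t. Each division truncates, so the result can
 * differ slightly from the exact one in the lowest bit. */
static int32_t mix_s32_split(int32_t l, int32_t r) {
    return l / 2 + r / 2;
}
```

For example, `mix_s32_wide(1, 1)` yields 1 while `mix_s32_split(1, 1)` yields 0: the rearranged form drops the low-order bit of each input before the sum. Operations that combine more inputs would accumulate a correspondingly larger rounding error.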

On the NEON-capable system I tested this on (Wandboard Quad, i.MX6Q, armhf), most hand-crafted NEON assembly operations for S32LE would have been slower than the corresponding generic C operations (GCC 6.3.0 on Debian Stretch), so I've deliberately left them out.
