WebRTC echo cancellation sample rate seems limited to 32000Hz
Summary
I am developing a user space daemon that maintains echo cancellation between a master source and sink, and that among other things dynamically unloads and re-applies the echo cancellation when the available sources or sinks change.
As far as I am concerned, echo cancellation can kick in as soon as a source device becomes available, for example when I plug in a headset.
Or, even more boldly, if there is no source device, the daemon supports creating a null source and using that as echo cancellation source master, which has the benefit that the fallback sink and source always have the same name (the daemon sets the virtual source and sink created by the echo cancellation as defaults).
What I most decidedly do not want anymore is to, e.g., start a game that has voice chat and then discover that it is not using the virtual echo cancellation devices because it "remembered" the physical devices, so that I have to tab out of the game to move its streams to the echo cancellation source and sink with pavucontrol. Which is not even possible if the game has hold-key-to-speak voice chat, because the recording stream only exists while the key is held down, and pavucontrol drops streams from its UI immediately when they close. (Might warrant a feature request for pavucontrol to make closed streams linger in the UI for a few seconds.)
Point being, my filthy smartphone has reliable automagic echo cancellation, and I am disinclined to settle for anything less with my FOSS desktop.
The problem with the "echo-cancellation-always-on" approach is that it degrades sink quality no matter whether I use webrtc or speex or adrian. In particular, speex and adrian create a virtual single-channel sink, so everything is downmixed to mono, which won't do. The most promising one is webrtc. With use_master_format=1 it keeps all channels, but the best sample rate I can get is 32000 Hz.
Here is an example.
Before applying echo cancellation:
[myuser@mysystem ~]$ pactl list short sinks
9 alsa_output.pci-0000_00_1b.0.analog-surround-50 module-alsa-card.c s16le 5ch 44100Hz RUNNING
[myuser@mysystem ~]$ pactl list short sources | grep -v ".monitor"
(no output)
Then a null source src_dummy is created and the echo cancellation is applied as such (device descriptions omitted):
pactl load-module module-null-source source_name="src_dummy"
pactl load-module module-echo-cancel aec_method="webrtc" use_master_format=1 \
aec_args="analog_gain_control=0\\ digital_gain_control=1\\ experimental_agc=1\\ noise_suppression=1\\ voice_detection=1\\ extended_filter=1" \
source_master="src_dummy" source_name="src_main" \
sink_master="alsa_output.pci..." sink_name="sink_main"
Result:
[myuser@mysystem ~]$ pactl list short sinks
9 alsa_output.pci-0000_00_1b.0.analog-surround-50 module-alsa-card.c s16le 5ch 48000Hz RUNNING
19 sink_main module-echo-cancel.c float32le 5ch 32000Hz RUNNING
[myuser@mysystem ~]$ pactl list short sources | grep -v ".monitor"
39 src_dummy module-null-source.c s16le 2ch 44100Hz IDLE
40 src_main module-echo-cancel.c float32le 2ch 32000Hz SUSPENDED
Notice that the virtual echo cancellation sink sink_main only has a sample rate of 32000 Hz, whereas the echo cancellation sink master above it has 48000 Hz. The difference is also audible.
(Looking at this I just now notice that the physical sink's sample rate has increased from 44100 Hz to 48000 Hz. What's up with that?)
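For anyone who wants to check this on their own setup without eyeballing the listings, here is a minimal sketch of a helper that pulls the sample rate of one sink out of the pactl list short sinks output. The function name sink_rate is my own invention for illustration, and it assumes the short listing keeps the column layout shown above (index, name, driver, sample spec, state).

```shell
#!/bin/sh
# Hypothetical helper: print the sample rate (in Hz) of the sink named $1.
# Reads `pactl list short sinks` output from stdin. Assumes the sample spec
# contains a field like "48000Hz", as in the listings above.
sink_rate() {
    awk -v name="$1" '$2 == name {
        for (i = 3; i <= NF; i++)
            if ($i ~ /^[0-9]+Hz$/) { sub(/Hz$/, "", $i); print $i }
    }'
}

# Usage (needs a running PulseAudio server):
#   pactl list short sinks | sink_rate sink_main
```

Running it once for sink_main and once for the master sink and comparing the two numbers shows whether the echo cancellation sink follows the master's rate.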
Environment
PulseAudio 13.0, pulseaudio 13.0-3, libpulse 13.0-3
Arch Linux, Linux mysystem 5.7.11-arch1-1 #1 SMP PREEMPT Wed, 29 Jul 2020 21:38:21 +0000 x86_64 GNU/Linux
GNOME Shell 3.36.4 (Wayland), gnome-desktop 1:3.36.4-1, gnome-shell 1:3.36.4-1, mutter 3.36.4-1
Steps to reproduce
Enable echo cancellation as described using a sink master having a sample rate greater than 32000 Hz.
What is the current bug behavior?
The echo cancellation module uses a lower sample rate (32000 Hz) than the master sink, even though use_master_format=1 is specified as an argument for module-echo-cancel.
What is the expected correct behavior?
The echo cancellation uses the same number of channels and the same sample rate as the master sink.