echo-cancel: not friendly to dynamic environments, UX issues

First of all, this is not really a technical ticket but an invitation to rethink the UX, to avoid repeating the mistake of baking UX problems into the API.

I have played with pipewire-module-echo-cancel on my desktop by adding this to /etc/pipewire/media-session.d/media-session.conf:

context.modules = [
    ...
    {   name = libpipewire-module-echo-cancel
        args = {
            source.props = {
                node.name = "echo-cancel.source"
                node.description = "Echo Cancelled Mic"
            }
            sink.props = {
                node.name = "echo-cancel.sink"
                node.description = "Echo Cancelled Output"
            }
        }
    }
]

This works OK in a static setup with one microphone and one set of speakers, but that is only the simplest case. Here are my objections to the current design:

1. Suppose that someone occasionally connects a second USB microphone (supposedly of better quality). Because the number of echo-cancel devices is fixed in the configuration file, echo cancellation can be applied to only one of the microphones. Usually this is not an issue, because one can switch the echo-cancel input to the correct microphone, and so far I have not needed two echo-cancelled microphones at once, but it still feels suboptimal. Echo cancelers should be created dynamically for all microphones.
2. The whole notion of "THE sink that this microphone is echo-cancelled with" looks like a technical limitation. Currently it doubles as a routing mechanism: only apps that play to the echo-cancelled sink are cancelled, and the microphone still picks up sounds made by other apps. There is no such notion in Windows, just a checkbox in the microphone settings. And I definitely don't want notifications that someone in my contact list has logged into Skype to be heard by the participants of an active Google Meet session. From the physical perspective, all microphones pick up sounds made by all speakers, so there is no "the", and speakers can appear and disappear at any time as well. It makes more sense to me to echo-cancel all microphones with all currently active speakers, with the extra appeal that this would be a zero-settings solution.
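
To make the two objections concrete, here is a minimal Python sketch of the model being asked for. It is a toy, not PipeWire code: the class `EchoCancelGraph`, the device names, and the `.monitor` suffix (borrowed from the PulseAudio naming habit) are all hypothetical and used only for illustration. It shows one canceler per microphone, each referenced against every currently present sink, with both sets free to change at runtime.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EchoCancelGraph:
    """Toy model (hypothetical): one canceler per microphone, all sinks as reference."""

    microphones: set[str] = field(default_factory=set)
    sinks: set[str] = field(default_factory=set)

    def add_microphone(self, name: str) -> None:
        self.microphones.add(name)

    def remove_microphone(self, name: str) -> None:
        self.microphones.discard(name)

    def add_sink(self, name: str) -> None:
        self.sinks.add(name)

    def remove_sink(self, name: str) -> None:
        self.sinks.discard(name)

    def cancelers(self) -> dict[str, list[str]]:
        """Map every microphone to the monitor sources it is cancelled against."""
        # The ".monitor" suffix is an illustrative naming assumption.
        refs = sorted(f"{sink}.monitor" for sink in self.sinks)
        return {mic: list(refs) for mic in sorted(self.microphones)}


graph = EchoCancelGraph()
graph.add_microphone("builtin-mic")
graph.add_sink("speakers")
graph.add_microphone("usb-mic")  # second microphone hot-plugged later
graph.add_sink("hdmi")           # second output appears
print(graph.cancelers())
```

Nothing here is configured per device: the mapping is recomputed from whatever microphones and sinks exist at the moment, which is the "zero-settings" property described above.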

The following questions should be considered:

1. What is the use case for allowing apps to opt out their playback streams from being echo-cancelled?
2. If the answer to question 1 is "none", why does the echo canceler present itself as a sink at all? Why can't it use the monitor source of each speaker? The fact that the "intelligibility enhancer" (which can modify the playback data) is part of WebRTC is not an answer, because this kind of processing can be separated.
3. If the answer to question 1 is not "none", what would be a useful way to decouple routing the echo-cancelled app to sink X from echo-cancelling it with sinks X, Y, and Z (currently impossible)?
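
One possible shape for the decoupling asked about in question 3, as a Python sketch. Every name here (`PlaybackStream`, `MicCanceler`, the `echo_cancel` flag, the sink labels) is hypothetical and does not correspond to any existing PipeWire or PulseAudio API. The point is only that a stream's routing target and a microphone's set of reference sinks are stored separately, and that an opt-out is a per-stream property rather than a consequence of which sink the stream plays to.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PlaybackStream:
    app: str
    target_sink: str          # where the audio is actually played
    echo_cancel: bool = True  # hypothetical per-stream opt-out flag


@dataclass(frozen=True)
class MicCanceler:
    microphone: str
    reference_sinks: frozenset[str]  # sinks this microphone is cancelled against

    def reference_apps(self, streams: list[PlaybackStream]) -> list[str]:
        """Apps whose playback is used as an echo reference for this microphone."""
        return [
            s.app
            for s in streams
            if s.echo_cancel and s.target_sink in self.reference_sinks
        ]


streams = [
    PlaybackStream("meet", "X"),
    PlaybackStream("music", "Y"),
    PlaybackStream("test-tone", "Z", echo_cancel=False),  # opted out
]
canceler = MicCanceler("usb-mic", frozenset({"X", "Y", "Z"}))
print(canceler.reference_apps(streams))
```

In this model no stream has to be routed to a special echo-cancel sink: "meet" still plays to X and "music" to Y, yet both are cancelled from the same microphone.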

I understand that I am combining two enhancement requests into one issue, and that each of them can be valid or invalid on its own. Feel free to split. I also understand that it is not really possible to implement this without breaking PulseAudio compatibility in terms of module-echo-cancel arguments (the "THE master sink" issue is baked into the API).

Edited Jun 30, 2021 by Alexander Patrakov
