RFC/RFH: Implementing missing bits to improve phone call routing
TL;DR: I'd like to improve phone call management in PipeWire+WirePlumber, but would appreciate comments about my plans and some help getting started on that...
Introductory note
I've been thinking about improving the management of voice call routing (e.g. for Linux-based phones) for a while, and am facing a kind of "blank page syndrome": I have a rather complete high-level view and some ideas about where and how to implement it, but I am unable to determine how to get started.
This issue is as much a request for comments about how I envision it as it is a cry for help, or at least basic guidance so I can finally get the ball rolling. It focuses on the PipeWire side of things, but WirePlumber will also be involved in a follow-up phase.
Brief summary of the current situation
A few years ago, when we started working on/with the PinePhone, we needed a way to automate audio routing changes for phone calls; unfortunately, this couldn't be done within PulseAudio for $reasons and I came up with callaudiod.
It's a simple PulseAudio client essentially handling 2 things:
- trigger profile changes for calls (default -> VoiceCall and vice-versa)
- select the appropriate output port (highest priority by default, Earpiece or Handset in calls if such ports exist in the corresponding profile)
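The port selection rule described above can be sketched as follows. This is only an illustration of the logic, not callaudiod's actual code: the `Port` type, the `pick_output_port` helper and the example port names and priorities are all made up for this sketch.

```python
from dataclasses import dataclass

# Port name hints preferred during a call (taken from the description above).
CALL_PORT_HINTS = ("earpiece", "handset")


@dataclass(frozen=True)
class Port:
    name: str      # hypothetical names; real port names are device-specific
    priority: int  # higher means preferred by default


def pick_output_port(ports, in_call):
    """Return the port to select, or None if the profile has no output port."""
    if not ports:
        return None
    if in_call:
        # Prefer an earpiece/handset port when the call profile offers one.
        call_ports = [p for p in ports
                      if any(hint in p.name.lower() for hint in CALL_PORT_HINTS)]
        if call_ports:
            return max(call_ports, key=lambda p: p.priority)
    # Default behaviour: the highest-priority port wins.
    return max(ports, key=lambda p: p.priority)


ports = [Port("Speaker", 100), Port("Earpiece", 80), Port("Headphones", 90)]
print(pick_output_port(ports, in_call=False).name)  # Speaker
print(pick_output_port(ports, in_call=True).name)   # Earpiece
```

If the call profile exposes neither an earpiece nor a handset port, the sketch falls back to the highest-priority port, which matches the "if such ports exist" condition above.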
It worked fine (or rather, well enough) when using PulseAudio as the sound server, but since most of the world switched to PipeWire, callaudiod is often racing with WirePlumber, causing all sorts of problems (mostly of the "wrong port selected" class) depending on which software ends up winning the race.
In order to solve this situation, my plan is to implement the main callaudiod functionality into WirePlumber itself, so we no longer have 2 competing pieces of software trying to select the most appropriate profile and output port.
What is present and missing in PipeWire
My main idea is that PipeWire could provide a set of global metadata providing the following information:
- is a voice-capable modem present on the system
- is a voice call incoming or in progress
- if a modem is present, is it available as a separate audio interface, and if so, what is that interface's name
WirePlumber would then use this data to trigger routing change events when a voice call starts/ends, and possibly (let's call that a "stretch goal") create loopback connections between the main audio device and the modem.
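To make the idea more concrete, here is a rough model of how such metadata could drive routing decisions. Everything in it is an assumption: the key names (`modem.present`, `modem.call.active`, `modem.audio.device`), the string-valued entries, the action strings and the `hw:Modem` device name are invented for illustration and do not exist in PipeWire or WirePlumber today.

```python
# Hypothetical metadata keys; none of these exist in PipeWire today.
KEY_PRESENT = "modem.present"     # "true" if a voice-capable modem exists
KEY_CALL = "modem.call.active"    # "true" while a call is incoming/in progress
KEY_AUDIO = "modem.audio.device"  # name of the modem audio interface, if any


def in_call(metadata):
    """A call only counts if a modem is actually present."""
    return (metadata.get(KEY_PRESENT) == "true"
            and metadata.get(KEY_CALL) == "true")


def routing_actions(old, new):
    """Compare two metadata snapshots and list the routing changes to apply."""
    actions = []
    if in_call(new) and not in_call(old):
        actions.append("switch-profile:VoiceCall")
        if new.get(KEY_AUDIO):
            # Stretch goal: loop audio between the main device and the modem.
            actions.append("create-loopback:" + new[KEY_AUDIO])
    elif in_call(old) and not in_call(new):
        if old.get(KEY_AUDIO):
            actions.append("destroy-loopback:" + old[KEY_AUDIO])
        actions.append("switch-profile:default")
    return actions


idle = {KEY_PRESENT: "true", KEY_CALL: "false", KEY_AUDIO: "hw:Modem"}
call = {KEY_PRESENT: "true", KEY_CALL: "true", KEY_AUDIO: "hw:Modem"}
print(routing_actions(idle, call))
print(routing_actions(call, idle))
```

The point of the sketch is that the policy side only needs to watch for transitions of a single "call active" value; whether the real implementation would use a metadata object or something else is still an open question.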
PipeWire, through the Bluetooth SPA plugin, already monitors the state of phone calls; therefore, it would make sense to split this functionality into a separate plugin/module and extend it to create a global metadata object. And that's where I'm entering an unknown (and a bit frightening) territory...
Implementation strategy
Interaction with the Bluetooth plugin
As already mentioned, I'd like to avoid duplicating the functionality of monitoring the state of the modem as exposed by ModemManager; this should therefore be provided by an independent SPA plugin.
I feel none of the existing SPA interfaces would be a good fit for that, so my intention would be to create a new "Modem" interface type, which would provide the methods needed by the BT plugin:
- start/answer/hangup call
- send DTMF tone
It would also provide events for new/ended calls, for example.
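As a starting point for discussion, the shape of such an interface could look roughly like the model below. It is written in Python purely to show the methods and events; a real SPA interface would be C, and every name here (`Modem`, `FakeModem`, the `call-added`/`call-ended` event names, the example phone number) is hypothetical.

```python
from abc import ABC, abstractmethod

# The 16 standard DTMF symbols.
DTMF_TONES = set("0123456789*#ABCD")


class Modem(ABC):
    """Hypothetical modem interface: call control methods plus call events."""

    def __init__(self):
        self._listeners = []

    def add_listener(self, callback):
        # callback(event, call_id); event is "call-added" or "call-ended"
        self._listeners.append(callback)

    def _emit(self, event, call_id):
        for callback in self._listeners:
            callback(event, call_id)

    @abstractmethod
    def start_call(self, number): ...

    @abstractmethod
    def answer_call(self, call_id): ...

    @abstractmethod
    def hangup_call(self, call_id): ...

    @abstractmethod
    def send_dtmf(self, call_id, tone): ...


class FakeModem(Modem):
    """In-memory stand-in for a real backend such as ModemManager."""

    def __init__(self):
        super().__init__()
        self.calls = {}       # call_id -> state
        self.sent_tones = []  # (call_id, tone) pairs, for inspection
        self._next_id = 0

    def start_call(self, number):
        call_id = self._next_id
        self._next_id += 1
        self.calls[call_id] = "dialing"
        self._emit("call-added", call_id)
        return call_id

    def answer_call(self, call_id):
        self.calls[call_id] = "active"

    def hangup_call(self, call_id):
        del self.calls[call_id]
        self._emit("call-ended", call_id)

    def send_dtmf(self, call_id, tone):
        if tone not in DTMF_TONES:
            raise ValueError("invalid DTMF tone: " + repr(tone))
        if call_id not in self.calls:
            raise KeyError(call_id)
        self.sent_tones.append((call_id, tone))


events = []
modem = FakeModem()
modem.add_listener(lambda event, call_id: events.append((event, call_id)))
call = modem.start_call("+15550100")
modem.send_dtmf(call, "5")
modem.hangup_call(call)
print(events)  # [('call-added', 0), ('call-ended', 0)]
```

Consumers such as the BT plugin would only depend on the abstract methods and events, so each backend would be a separate implementation of the same interface.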
Having such an interface would also allow implementing separate plugins for each modem backend (ModemManager, oFono), although I'll stick to ModemManager only in my implementation.
Modem metadata
I admit not having done much research on that part regarding implementation. I (naively?) assume a plugin could easily create/export a global metadata object. From a brief look, though, it seems creating such an object is quite complex, making me unsure this would be the most appropriate "format".
Further possible improvements
The most obvious next step would be to implement auto-loopback when the device actually exposes the modem as a separate audio interface, as is the case with the Librem 5.
This might also be applied to Qualcomm-based phones, in which the modem is an ALSA sub-device of the main sound card: the actual routing is done by the DSP, but no audio can be heard on either side of the call unless a process constantly reads from and writes to this subdevice during phone calls.
It might also be interesting to expose a "virtual" modem audio device on other systems, so userspace can rely on an audio device representing the modem to always be present (if a modem is present, of course).
Bootstrapping
This whole plan makes sense to me, but I'd welcome other (or similar) opinions, the end goal being to address the problem of phone calls in a sane and hopefully future-proof way.
Also, every time I try to get started on that, I start by browsing through the existing code for "inspiration" and end up feeling completely lost, with no clue what to do and how/where to start. The SPA modem interface feels like a good candidate, but so does creating a basic plugin monitoring ModemManager (although it seems to me that even "a basic plugin" might not be so simple). Or maybe I'm missing a more obvious and/or easier starting point?