USB audio interfaces very commonly have channel mappings that do not match the physical inputs/outputs

Problem in a few words

Many USB audio interfaces combine physical inputs/outputs into stereo/surround pairs when they should be completely separate channels. On inputs, this makes mono sources quieter than they should be. For example, a cardioid microphone connected to a single port puts audio on only one channel of the stereo input; because userspace applications typically take microphone input as mono, the stereo input is downmixed to mono, and that downmix involves the pan law. On outputs, devices that have two physical stereo outputs (e.g. headphones and speakers) show up as a single 4.0 surround output.

Problem in many words

I have a Behringer U-Phoria UMC204HD USB interface. It has two XLR/3.5mm inputs for microphones or instruments, two stereo outputs on the back, and a headphone monitor output with a button to switch it between the two stereo outputs. The device looks like this:

In userspace applications the device is reported with (1) a 4.0 analog surround output with the first physical stereo output on the front-left and front-right channels, and the second output on rear-left and rear-right channels; and (2) a stereo analog input with the first physical mono input on the left channel and the second physical mono input on the right channel.

This is problematic, because it makes it difficult to link userspace applications to the correct physical inputs/outputs on the device. The user cannot easily, for example, route audio from one application to the 2nd stereo output linked to their speakers, and audio from another application to the 1st stereo output linked to their headphones.

Another major issue is that it results in low input volume. If the user has a microphone connected to input 1, it appears as a stereo device with audio on only the left channel. Because most userspace applications that take microphone input (e.g. VoIP and video conferencing applications) expect mono, this stereo input gets downmixed to mono. Since the second channel is completely empty, the downmix effectively reduces the volume of the microphone by up to about 6 dB due to pan law.
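The arithmetic behind the 6 dB figure can be sketched as follows. This is only an illustration: it assumes the downmix averages the two channels, which may not be the exact rule PipeWire applies, and the function names are made up for this example.

```python
import math

def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def downmix_average(left, right):
    """Stereo -> mono by averaging both channels (assumed downmix rule)."""
    return [(l + r) / 2 for l, r in zip(left, right)]

def level_change_db(before, after):
    """Level difference between two signals in decibels."""
    return 20 * math.log10(rms(after) / rms(before))

# One cycle of a sine wave on the left channel; the right channel is silent,
# as with a single microphone plugged into input 1.
left = [math.sin(2 * math.pi * n / 48) for n in range(48)]
right = [0.0] * len(left)

mono = downmix_average(left, right)
loss_db = level_change_db(left, mono)
print(f"{loss_db:.2f} dB")  # prints "-6.02 dB"
```

Taking the left channel directly instead of averaging it with silence would leave the level unchanged, which is what splitting the channels achieves.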

I suspect there may also be a bug hiding somewhere that causes the pan law to be applied more than once, because with some configurations I have seen only a ~3 dB decrease in volume. However, I have not done much testing, and it could simply be an audio mixing convention I am not aware of (I am out of my depth in this field).
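To make the 3 dB versus 6 dB observation concrete, the sketch below compares the gain of a constant-power pan coefficient (1/sqrt(2)) applied once and twice with a plain 0.5 average. It does not establish which rule PipeWire actually uses; it only shows that two stacked 3 dB reductions are numerically indistinguishable from a single 6 dB one.

```python
import math

def to_db(gain):
    """Convert a linear amplitude gain to decibels."""
    return 20 * math.log10(gain)

constant_power = 1 / math.sqrt(2)        # common "-3 dB" pan law coefficient

once_db = to_db(constant_power)          # coefficient applied once
twice_db = to_db(constant_power ** 2)    # coefficient applied twice
average_db = to_db(0.5)                  # plain (L + R) / 2 with R silent

print(f"once:    {once_db:.2f} dB")      # -3.01 dB
print(f"twice:   {twice_db:.2f} dB")     # -6.02 dB
print(f"average: {average_db:.2f} dB")   # -6.02 dB
```

So a measured 6 dB loss alone cannot distinguish a doubled 3 dB pan law from a simple averaging downmix; comparing configurations that produce the 3 dB result would be needed to tell them apart.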

Why is this a problem?

USB audio interfaces like this are quite common these days, even among consumers who do not use them for pro audio purposes, because integrated sound cards are often of woefully low quality and prone to interference from internal PC components. Popular manufacturers include MOTU, FiiO, Behringer and many others. They are often marketed towards (and used by) hi-fi enthusiasts in addition to pro audio users. It appears that many, if not all, of them combine inputs/outputs in this fashion, which makes this problem quite widespread.

I have found a few related issues on the bug tracker, all of which seem to concern the channel mapping on these kinds of interfaces not matching the physical inputs/outputs:

#1119
#1136 (closed)
#627 (closed)
#355 (closed) (the fix to this may have introduced the -6dB pan law issue to microphones on interfaces like this; see @Ckath's comment)

Some may argue that, because these are pro audio devices, users should select the Pro Audio profile and route channels manually. I do not think this is a good-faith argument; having to route channels manually just to keep the microphone input from being 6 dB quieter than it should be is terrible UX, and you would need to do this separately for every application.

Optimal solution

In my opinion, the best solution from a user experience point of view would be for the inputs/outputs selectable in software to match the physical device. On my device, with two stereo outs and two mono ins, this would mean having:

A "front" stereo output for the first stereo out
A "rear" stereo output for the second stereo out
A "left" mono input for the left mono in
A "right" mono input for the right mono in

These should all be mixed in a way that does not incur the pan law effect.

These should be selectable in normal user applications, such as DE audio switchers, pavucontrol, web browsers, VoIP applications etc., not just in JACK style graph routing solutions.

I have tried implementing this using PulseAudio device profiles as suggested by @wtaymans in this comment, but the profile system does not seem to be sufficient for this; it was not designed to split channels and cannot do so properly. Apparently there are also UCM profiles, but I have no idea how to create those.

Workaround

Currently I work around this issue by using the loopback module to create virtual devices that split the channels (example configuration to split the left channel of the stereo input into a mono input without reducing the microphone volume due to pan law). This is a suboptimal user experience because the configuration is difficult and currently not particularly well-documented.
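For readers unfamiliar with the loopback module, a minimal sketch of such a configuration might look like the fragment below. It is not necessarily identical to the linked configuration. The node names and description are invented for illustration, and the `target.object` value is a placeholder that must be replaced with the real source node name of the device.

```
context.modules = [
    {   name = libpipewire-module-loopback
        args = {
            node.description = "UMC204HD Input 1 (mono)"
            capture.props = {
                node.name = "capture.umc204hd_in1"
                # Take only the front-left channel of the stereo source
                audio.position = [ FL ]
                # Keep the channel mixer from remixing (and attenuating) the signal
                stream.dont-remix = true
                # PLACEHOLDER: replace with your device's actual source node name
                target.object = "alsa_input.REPLACE_WITH_YOUR_DEVICE"
                node.passive = true
            }
            playback.props = {
                node.name = "umc204hd_in1"
                # Expose the result as a selectable mono source
                media.class = "Audio/Source"
                audio.position = [ MONO ]
            }
        }
    }
]
```

The real node name can be looked up with `pw-cli ls Node` or `pactl list short sources`. Older PipeWire releases used `node.target` where this sketch uses `target.object`. A second block with `audio.position = [ FR ]` in `capture.props` would expose the second input in the same way.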

Whose fault is this?

To my understanding these devices are grouping the physical channels together for simplicity or technical convenience. This has also been suggested by @saivert in this comment. Perhaps it could be argued that this is the manufacturers' fault, and perhaps that would be correct, but there's nothing we can do about this. Or perhaps it could be argued this is the user's fault for using pro audio equipment for purposes it wasn't intended for, but it's not like the user has much of a choice here; if you want multiple input or output channels on a single device, there's nothing on the market except for these devices.

It could also be ALSA's fault, but if ALSA is simply mapping the device in the way the device itself reports the channels, then I think ALSA is doing exactly what it should be doing. For what it's worth, here's what the Behringer UMC204HD looks like in alsamixer (I do not know of any better way to see how ALSA maps the device):

Regardless of whether it is PipeWire's fault, I think this is something that can be fixed in PipeWire if the device profile system is expanded to allow for splitting channels. However, if there is a way to split the channels "upstream", e.g. by using UCM profiles, then the splitting should probably be done there.

Edited Jul 18, 2021 by Peter Wedder