Audio indicator protocol

Why not just associate a pipewire stream with a wl_surface? Both on the video and audio side. That would be a benefit for gamescope and applications wishing to screen share with audio.

If a compositor wants to indicate a window is playing audio, it's as simple as checking to see if a stream is active.

How would you do that? And what if the application is not using pipewire?

Allowing the application to optionally provide a number of pipewire or pulseaudio identifiers such that the compositor can have the ability to mute a window would be useful.

How would you do that?

That's what we're here to discuss.

And what if the application is not using pipewire?

Then quite frankly, it doesn't matter. Wayland and PipeWire are so interlinked that you can't use Wayland without PipeWire being available. Your only problem then is getting some PipeWire stream or the like out of your PulseAudio or whatever app to connect to Wayland, which is probably fairly easy.

Wayland and PipeWire are so interlinked that you can't use Wayland without PipeWire being available.

Wayland works fine without pipewire.

Your only problem then is getting some PipeWire stream or the like out of your PulseAudio or whatever app to connect to Wayland, which is probably fairly easy.

How would an application that is written against the pulseaudio or even alsa api get a pipewire identifier?

How would an application that is written against the pulseaudio or even alsa api get a pipewire identifier?

I think this should simply be optional, any such interface cannot work in all scenarios. But attaching such an identifier (if available) could offer interesting integration of the desktop environment, e.g. shortcuts only affecting the audio of the focused window.

However pulseaudio or pipewire are both reasonable choices here, which could be expressed via an enum. (And are in the case of pipewire-pulse probably interchangeable, but managing that logic falls the compositor/DE in question.)

The bigger question is how would an application with multiple streams work here? Just announce all of them? (Assuming Desktop Environments likely want to use these for behaviour effecting all of a windows audio.) Should they have an identifier or human readable name for UI?

Wayland works fine without pipewire.

Not without missing screensharing in most apps.

How would an application that is written against the pulseaudio or even alsa api get a pipewire identifier?

pipewire-pulse, or pipewire-alsa. Those build on top of Pipewire. It seems reasonable you can get something for Pipewire from there.

I think the only sense in which Wayland currently "requires" Pipewire is that xdg-desktop-portal uses it for screen capture. So you need pipewire (or some desktop environment specific mechanism) for screen capture. But even that doesn't impose any requirement to use Pipewire for sound.

One notable issue with just using a Pipewire stream id: Pipewire currently recommends applications continue using pulseaudio/alsa/jack APIs instead of using pipewire directly for sound: https://gitlab.freedesktop.org/pipewire/pipewire/-/wikis/FAQ#what-audio-api-do-you-recommend-to-use. So I'm not sure clients would have a stream ID to provide, even if they're ultimately using Pipewire through one of these APIs.

And I guess along with Pipewire and PulseAudio, I'm not sure how many clients these days use libalsa. Or on BSD, I guess OSS or sndio. And there's Jack.

Just allow apps to give either Pulse or Pipewire identifiers, there's really no reason to make compromises here.

The bigger question is how would an application with multiple streams work here? Just announce all of them? (Assuming Desktop Environments likely want to use these for behaviour effecting all of a windows audio.)

If it could only pick one, the app would have to make a decision about which one... so, all of them, perhaps with some flags about foreground vs background media for distinguishing between background tabs vs the active one?

Should they have an identifier or human readable name for UI?

A human readable name can't hurt, as an optional thing

I think this should simply be optional, any such interface cannot work in all scenarios. But attaching such an identifier (if available) could offer interesting integration of the desktop environment, e.g. shortcuts only affecting the audio of the focused window.

I'm of the opinion the opposite should be the case. Having the compositor detect when an app is playing audio directly via the audio stack would be a lot more flexible than having apps say themselves.

One notable issue with just using a Pipewire stream id: Pipewire currently recommends applications continue using pulseaudio/alsa/jack APIs instead of using pipewire directly for sound:

likely not a problem, so long as those APIs are still built on top of Pipewire. The clients would just have to do some slight interfacing with PW to get the relevant IDs out of pipewire-pulse, pw-alsa, or whatever they're using.

Alternatively, for clients with several streams, make it a sink - video would ideally be provided as well, in gamescope's case.

Having the compositor detect when an app is playing audio directly via the audio stack

The audio stack knows about as much about Wayland surfaces as Wayland does about audio streams. Which is to say, nothing at all. The app has to connect the two, there's no way around that.

The audio stack knows about as much about Wayland surfaces as Wayland does about audio streams. Which is to say, nothing at all. The app has to connect the two, there's no way around that.

And when the app connects the two, the compositor can bug the audio stack to see if any audio is playing there, without having Wayland involved, saying "X surface is playing audio"

Yep, that makes sense because then the compositor doesn't have to trust the client. This is the way it is designed in Windows with the cooperative hwnd for an audio stream.

Associating a pw stream with a surface makes a lot of sense. That would be very useful for VR and doing spatial audio too.

And when the app connects the two, the compositor can bug the audio stack to see if any audio is playing there, without having Wayland involved, saying "X surface is playing audio"

You have wayland involved anyway to establish the connection in the first place. There is absolutely no harm to allow the protocol to either provide a pipewire-stream-id - if available - or not.

Forcing pipewire only means that applications not using pipewire (a lot of them - including many games - are still using ALSA through compatibility layers) will not be able to use the interface.

If you need those to work, put them into a sandbox and build something like the flatpak-dbus-proxy for audio to get the stream-ids.

I completely agree that this is useful and applications should provide it. But the more flexible protocol allows both and still wouldn't be super complex.

Also pipewire is quite linux specific as @ids1024 already pointed out.

Forcing pipewire only means that applications not using pipewire (a lot of them - including many games - are still using ALSA through compatibility layers) will not be able to use the interface.

In the end, they're most likely going to be using PipeWire. That's what matters. They don't need to port to it. If they support Wayland but not PipeWire, such as by way of SDL, then SDL can probably also be the one to hook up audio and video as well.

If you need those to work, put them into a sandbox and build something like the flatpak-dbus-proxy for audio to get the stream-ids.

Not mutually exclusive.

Also pipewire is quite linux specific as @ids1024 already pointed out.

If you have a better audio system that is sandbox-friendly, go for it. But I wouldn't have problems with making this protocol able to support other sound systems, should they exist, just not anything other than PipeWire right now.

In the end, they're most likely going to be using PipeWire. That's what matters.

No it doesn't. If they use pipewire-alsa they will have no way to receive a stream id.

If they support Wayland but not PipeWire, such as by way of SDL, then SDL can probably also be the one to hook up audio and video as well.

Even if SDL picks this up eventually, the vast majority of games won't be recompiled against a newer SDL version. Legacy software exists, whether you like it or not.

And that is also exactly the same reason there is no need to not be backwards compatible here. We are already asking developers to update their windowing code to wayland, we shouldn't try to enforce a specific audio stack. Another example would be DAW software being build against JACK, which works perfectly fine.

Of course they could just not add the protocol, but that would be a pretty unnecessary limitation.

should they exist

They do exist. I am already counting four different apis/solutions. And pulseaudio is just as sandboxing capable as is pipewire..

In the end, I'm thinking of a "media association" protocol that takes full advantage of modern libraries and the like, with audio and video stream association, not something that caters to old program audio that would be trivially manageable in said protocol too. And I think that has more of a chance of being utilized than a bare minimum, builds with older technologies, protocol.

And I think that has more of a chance of being utilized than a bare minimum

A protocol that is easy to implement for existing clients playing audio has a much higher chance of being utilized than a protocol that requires all of these applications to switch their audio stack from alsa or pulseaudio or some cross-platform library to pipewire.

I'm not sure why you think that creating more work for clients will make them more likely to pick this up.

A protocol that is easy to implement for existing clients playing audio has a much higher chance of being utilized than a protocol that requires all of these applications to switch their audio stack from alsa or pulseaudio or some cross-platform library to pipewire.

As I've stated two or three times now, ideally, they wouldn't be doing that - they would request pw-alsa or pw-pulseaudio to hand them access to the underlying pipewire resources from their current audio API and utilize those in hinting Wayland in what surface is paired with what audio streams.

they would request pw-alsa or pw-pulseaudio to hand them access to the underlying pipewire resources from their current audio API

These applications are linked against the native pulseaudio libraries and are not aware that the server they are connecting to has anything to do with pipewire. It just so happens that pipewire exposes sockets that speak a compatible wire protocol.

Browsers have the ability to show an icon in tab to indicate that the tab is playing sound. Wayland should expose a protocol to allow an application to indicate that a window is playing sound.

I like that idea quite a lot. This would enable Cosmic's stacks to show such an indicator similar to browsers.

I'll also note that desktops can see if an app is playing audio thanks to the Flatpak sandbox, maybe if some stuff is hooked up in Flatpak for that. But being able to associate surfaces with audio streams would help a lot.

How does the flatpak sandbox help if the app is punching thru to get to the pipewire socket?

This is assuming it's something more limited, akin to how security-context works with Flatpak, where the Wayland socket is duplicated and filtered. Though I'm not sure if filtering a PipeWire socket in Flatpak would allow the addition of input devices as well via a portal. In any case, it would be a given that it would be a portal or similar interface, not going right through to the native pipewire socket.

mentioned in merge request !283

Created !283 for this.

I would expect that an app not showing an indicator cannot play audio (like in web browsers).

As such, either pipewire (or whatever audio demon is in use) needs to be made aware that the app is playing audio, or - even better - pipewire should inform the compositor that an application is playing audio.

I don't like this protocol because it's just a flag that has no relation to reality and it's some best effort thing that relies on everybody cooperating and no bugs being around in the at least 3 processes that are in use (audio deamon, compositor and application).

I'd also expect microphone usage/camera usage to work in a similar way. No idea if that should go in the same protocol, but it feels to me like it should.

I would expect that an app not showing an indicator cannot play audio (like in web browsers).

Many people are using mpd or cmus to play music. These application don't have a GUI and know nothing about window systems and therefore there is no place to display such an indicator. If you want to limit access to audio devices, then pipewire or a portal might be the place to do this but this is out of scope for wayland.

Priot art on X11 seems to be the PA_PROP_WINDOW_ID on pulseaudio streams

FWIW, pipewire will most likely soon use a similar system to wayland's security-context protocol. That is, there will be a socket mounted in the flatpak sandbox that is associated with the app instance. Resources that the app instance can access will be visible on connections via the socket.

This is about associating them with a toplevel though. It's simply additional, untrusted information from apps. Would be nice to have. I just wonder which direction makes more sense. We could also use xdg-foreign to export a window handle and set that on the pw node.

Is there a link to any wip work or past discussions there? We have been pushing for pipewire namespacing for a while, it is super useful conceptually for VR, game streaming, where you can have virtual nodes so they can only see one mic and speakers from the headset or one virtual mic/speakers.

So far it seems none of those efforts have resulted in anything usable for us, so it would be nice to actually have this.

There is https://gitlab.freedesktop.org/wtaymans/pipewire/-/commits/security-context/?ref_type=heads but you should just talk to Wim. I'm also pushing for the dbus Containers1 interface to become a more generic namespacing thing (dbus/dbus!449 (comment 2275074)) but the people working on dbus have basically no time for feature work.

I also agree with @orowith2os that associating pw nodes with toplevels is the right thing to do. This would allow us to not only associate audio output, but also microphone, webcams, and streamcasts with specific toplevels instead of only having app-instance information.

For microphone, webcams and steamcast we already have portals which could carry a window id. For audio output we could add some API to pulseaudio and pw to do the same.

The nice thing here would be that once the association has been made, a compositor can ask pw for all objects and their state, no matter what client API was used to get the info there.

Audio indicator protocol

Child items 0

Activity

Admin message

Admin message

Audio indicator protocol

Activity