Is PipeWire ready yet?
It is getting ready for broader testing.
The API in master is now declared stable and is not expected to change for the 0.3 release.
The protocol can support older 0.2 version clients transparently. This means that flatpaks with older PipeWire libraries can connect to a newer daemon.
Where is PipeWire in the stack?
PipeWire sits right on top of the kernel drivers (or as close as possible). You can think of it as a multimedia routing layer on top of the drivers that applications and libraries can use.
Don't Pro-audio and Consumer audio have conflicting requirements?
Pro-audio needs low and reliable latency with minimal audio over/underruns. Power usage is of little concern. Pro-audio requires flexible user-configurable routing of the signals.
Consumer Audio focuses on low power usage, latency (in the case of playback) is of no concern. Consumer audio wants things to just work with minimal configuration.
Where JACK and PulseAudio were each tuned exclusively for their respective use cases, PipeWire takes a hybrid approach. It uses the scheduling and graph model of JACK but mainly uses timer-based wakeups like PulseAudio. This makes it possible to dynamically switch between small buffers with low latency and high power usage and large buffers with high latency and low power usage. It adapts based on the latency requirements of the application in a glitch-free way. There are limits to this: PipeWire only increases buffer sizes up to 8192 samples (roughly 170-185 ms, depending on the sample rate), but coupled with much simpler code paths this should be good enough for consumer use.
PipeWire also supports dynamically adding and removing devices, with automatic clock slaving. It handles Bluetooth devices, or any other node that can be written as a plugin.
PipeWire mainly uses pro-audio formats (floating-point samples) as the canonical data format between nodes in the graph. It is also possible to negotiate other formats in order to support compressed formats.
The policy management that PulseAudio performs internally is handled in PipeWire by a separate session manager that can be adapted and configured according to the consumer use case.
Is PipeWire just another GStreamer?
PipeWire is architecturally significantly different from GStreamer and is designed more like JACK. Differences include:
- The processing graph runs in a much more controlled fashion, which allows us to achieve much lower and more predictable latencies. All nodes in the graph are woken up from source to sink when the device wakes up for input/output, and data is processed in fixed-size chunks.
- Lock-free processing.
- More localized and lighter format negotiation. The negotiation and format description are borrowed from GStreamer.
- No dynamic buffer allocation while processing; all buffers and metadata are allocated before processing begins.
GStreamer is intended to be a Swiss army knife of multimedia; PipeWire is meant to be much lower level, more like what alsa-lib, JACK or libv4l2 provide.
Is PipeWire another JACK implementation?
PipeWire has a processing model very similar to JACK's, but adds the following features:
- Extensible communication protocol that allows new interfaces on objects to be added in the future.
- Arbitrary formats can be negotiated between nodes. This allows us to handle video as well as compressed formats. This is important for sending compressed formats to the device (AC3 over HDMI or AAC over Bluetooth, for example).
- Negotiation of buffers. A pool of buffers can be negotiated between instances and the memory is exchanged with fd passing. This makes it possible to share hardware surfaces and make video possible.
- Dynamic sinks and sources. Devices can be hotplugged. There is automatic slaving between devices similar to what a2j does when graphs are joined.
- Dynamic latency: the buffer period adapts to the lowest latency requested. Smaller buffer sizes use more CPU; larger buffer sizes add more latency.
- Synchronous clients provide data for the current processing cycle of the device(s), so there is no extra period of latency.
- Dynamic device suspend and resume. Unused devices are closed to save CPU.
- Implemented with sandboxing in mind.
- Some of the limitations of JACK are fixed. PipeWire has something similar to the JACK transport that also supports looping, trick modes and lookahead of the scheduled timeline.
- PipeWire has a more generic control type that can be used to implement MIDI and OSC natively. MIDI support similar to a2jmidid is built in.
Does PipeWire replace ALSA?
No. ALSA is an essential part of the Linux audio stack; it provides the interface to the kernel audio drivers.
That said, the ALSA user-space library contains a lot of functionality that is probably not desirable anymore these days, like effects plugins, mixing, routing, slaving, etc.
PipeWire uses a small subset of the core ALSA functionality to access the hardware (it should run with tinyalsa, for example). All of the other features should be handled by PipeWire.
Will PipeWire ever be as good as JACK?
Unlikely, for some definitions of good. There are some things that JACK can optimize for, like:
- It can configure the ALSA device with 2 periods of a fixed small size. With the current ALSA driver implementations this can result in lower latencies than can be achieved using a timer-based mechanism (according to my experiments). Theoretically there should be no difference, but we are not there yet.
- It does not need to care about security and can simply allocate all objects in one fixed piece of shared memory, which makes it much faster to get to the data you need and to introspect objects.
- It does not need to care about negotiation of data formats or buffers, which makes it faster to build the graph and start processing.
- It has a lot of support and history.
- We might not want to support freewheeling or other JACK features.
Are you using a push or pull model for scheduling?
PipeWire uses a pull model: the device wakes up at the last possible moment to pull in more data from all the nodes in the graph. This allows for the lowest possible latency between producing the data and consuming it.
This is in contrast to GStreamer, which mostly uses a push model. There, data is produced independently of the device and is then queued in the device, or in queues in front of the device (in the case of video playback).
Isn't format negotiation bad for pro audio?
Yes. Format conversions are not cheap and must be avoided. For audio processing in PipeWire we have the following rules:
- Filters and real-time clients must use float 32 mono audio. The audio processing graph is only operating in this format.
- Format conversions are done at the input/output nodes. This means that conversions are done to and from devices, and also to and from clients that use the stream API.
- This also means that the conversion code for clients runs in the context of the client, not the server. This avoids having complicated code such as decoders running in the server context.
What about pro video?
- Similar to audio, there is one common format: RGBA float32, premultiplied, linear. This should be easy to generate and manipulate on the GPU/CPU and allows for HDR and simple compositing operations.
- A splitter/converter for video devices. We need to convert v4l2 buffers to the common format so that filters can work on them. Likewise, we need converters inside the server-side stream API to send/receive video in other formats.
What kind of API will there be to interface with PipeWire?
A low-level API that allows you to create a node and add it to a local or remote processing graph. This API gives you full control over format and buffer negotiation, and supports multiple inputs and outputs as well as controls, commands, events and parameters. The node becomes part of the real-time processing graph and provides data for the current processing cycle of the graph.
There is a filter API that can be used to make audio and video filters. It can have multiple input and output ports as well as parameters. It is like the JACK client API but more powerful.
The most used API will be the stream API. The idea is to create a stream object that allows you to play or record one stream from the server. You then receive callbacks when a buffer needs to be provided for playback or when a buffer is available for capture. The stream API has a client-side component that does format and buffer-size conversions when requested. It has simple controls for audio volume and video color balance, and it can work synchronously or asynchronously. It is like the PulseAudio API but more powerful.
There are also replacement libraries that implement the JACK and PulseAudio APIs on top of PipeWire.
What audio API do you recommend using?
The situation is a bit like GUI toolkits. There are many, each with different use cases. Nobody uses the native display server protocols directly (X11, Wayland) but always through an abstraction layer (GTK, Qt, ...).
We recommend that you continue to use the PulseAudio, JACK and ALSA APIs for now.
What is wrong with JACK + PulseAudio?
PulseAudio has a JACK backend that sends all the mixed streams to JACK. However, it has some problems:
- Smaller JACK period sizes wake up PulseAudio a lot, causing it to use massive amounts of CPU.
- Suspending the JACK device is not implemented/possible.
- Passthrough on the JACK device is not possible.
- Individual PulseAudio streams are not managed inside JACK.
Why not just improve JACK instead?
- JACK has no support for negotiating formats or buffers. This makes it hard to implement anything like exclusive access to devices or more complicated buffer memory. PipeWire attempts to keep the same goals as JACK while adding format and buffer negotiation.
- The JACK API has no support for fd-backed memory. For video it is important to leave the pixels on the GPU instead of touching them with the CPU, and it's not clear how this could be added nicely. One option would be to embed more data into the port buffers: with an extension to the protocol, a data structure with the video fd could be placed in a local buffer.
- Current JACK implementations do not care about security of sandboxed clients.
Why not just improve PulseAudio instead?
- The PulseAudio design does not allow for video buffers.
- The PulseAudio design is not suited for the kind of low latency we target; there is too much logic and there are too many context switches between the client and the device.
How is PipeWire supposed to be a better PulseAudio?
- PipeWire can achieve lower latency with much less CPU usage and dropouts compared to PulseAudio. This would greatly improve video conferencing apps, like WebRTC in the browser.
- PipeWire's security model can stop applications from snooping on each other's audio.
- PipeWire allows more control over how applications are linked to devices and filters.
- PipeWire uses an external policy manager that can provide better integration with the rest of the desktop system and configuration.
How is PipeWire supposed to be a better JACK?
- PipeWire is more dynamic by design. It can expose all devices and provides functionality similar to zita-a2j/j2a, and its implementation of merging devices and resampling is also a lot more efficient than what zita-a2j can provide.
- Multiple devices don't need to be resampled to a common clock when they are not linked to each other in any way.
- It handles Bluetooth devices, or any device for which a plugin can be made.
- PipeWire can adapt the latency dynamically, which is important for power usage on a laptop. When low latency is required, the system can switch automatically and seamlessly to smaller buffer sizes.
- PipeWire allows arbitrary formats, which makes it possible to implement exclusive access to devices, passthrough and more. This is important if you want to send raw DTS to your amplifier or AAC to your Bluetooth headphones, potentially improving audio quality and saving power.
- PipeWire will implement full latency compensation. This is not available in JACK and it would be hard to implement efficiently.
How is PipeWire going to avoid xruns?
- All the regular system tuning you might do to avoid or reduce xruns still applies, for now.
- PipeWire uses a thread with real-time priority, eventfd and epoll in the data processing path. This does not by itself avoid underruns/overruns, but using such simple primitives allows the processing pipeline to run on a real-time kernel subsystem like EVL (https://evlproject.org). This might be the long-term solution to avoiding xruns.
How is PipeWire going to handle latency?
- The plan is to implement full latency compensation in PipeWire. This means that streams will be sample accurately aligned even when signals go through different paths with different latencies. Because of how PipeWire allocates memory, this can be done quite efficiently by changing offsets in the sample buffers.