Audio in PipeWire
Devices are the physical hardware cards. Cards can operate in different profiles. Setting a profile on the device creates a number of sink and source nodes to perform the data processing.
The session manager usually detects devices and creates Device objects in the PipeWire daemon for the ones it wants to expose.
Nodes are the physical playback and recording points for audio. They are woken up by a timer when data needs to be written or read. The timer is set dynamically based on the amount of data in the device and the desired latency. Some nodes are part of the same device and are woken up together.
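As a rough illustration of the timer-based wakeup described above, the following sketch computes how long a playback node could sleep before its device needs more data. The function name, its parameters, and the policy of waking once the queued data has drained down to the desired latency are assumptions made for illustration; they are not taken from PipeWire's implementation.

```python
def next_wakeup_seconds(queued_frames: int, target_latency_frames: int, rate: int) -> float:
    """Return how long to sleep before the device drains to the target fill level.

    Hypothetical policy: wake up when the amount of queued data has dropped
    to the desired latency. If the device is already at or below that level,
    wake up immediately.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    # Frames that can still be played before the fill level reaches the target.
    surplus = max(queued_frames - target_latency_frames, 0)
    return surplus / rate


# 2048 frames queued, 1024 frames desired latency, 48 kHz device (assumed values).
timeout = next_wakeup_seconds(2048, 1024, 48000)
```

With more data queued in the device the timeout grows, and with a lower desired latency the node is woken up later relative to the drain point, which is why the timer has to be recomputed on every wakeup.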
Nodes are usually created by the session manager based on the selected profile in the device.
Nodes are usually wrapped in an adapter object that can do conversions (sample format, sample rate, and channel conversion and mixing, as well as volume/mute control). An adapter can also dynamically adjust the sample rate to match a reference.
Nodes with an adapter are configured by setting an internal format and a port configuration. The internal format defines the source/target format of the internal node, and the port config defines how this format is converted on the outputs/inputs. You can, for example, configure the internal format as S16 stereo samples at 44.1 kHz and the port config as a 5.1 DSP configuration (one port per channel,
A node (and adapter) can live entirely in the server or on the client and can be scheduled independently without needing a server context switch.
When a client connects to PipeWire, it needs to provide session information on how it would like to interact with the system. This includes whether it wants to do playback or capture and the type of media (video playback, VOIP, pro audio…). It also needs to list its preferred latency and sample rate.
This information is used to filter and select the set of parameters (latency, samplerate, buffers…) as well as a list and order of devices to use.
Clients can ask for exclusive access to the devices. Depending on the requirements of other clients, this can be granted or not.
Each node has a graph. The node is the 'driver' of the graph. This means it starts the processing of all the other nodes in the graph.
When a node is linked to another node, their graphs are joined. The node is then scheduled by the driver of the joined graph (if any).
When a graph with a driver is joined with another graph with a driver, one of the driver devices is selected as the primary. That node becomes the driver for the combined graph.
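The joining rules above can be sketched with a small model. This is only an illustration: the `Node` and `Graph` classes and the `join` function are hypothetical, and the `priority` field used to pick the primary driver is an assumption, since the text does not say how the primary is chosen.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    is_driver: bool = False
    priority: int = 0  # assumed tie-breaker for choosing the primary driver


@dataclass
class Graph:
    nodes: list = field(default_factory=list)

    def driver(self):
        """Return the driver that schedules this graph, or None if it has none."""
        drivers = [n for n in self.nodes if n.is_driver]
        return max(drivers, key=lambda n: n.priority) if drivers else None


def join(a: Graph, b: Graph) -> Graph:
    """Linking two nodes joins their graphs into one combined graph."""
    return Graph(nodes=a.nodes + b.nodes)


sink = Node("sink", is_driver=True, priority=10)
source = Node("source", is_driver=True, priority=5)
client = Node("client")

# A client graph without a driver is scheduled by the driver it joins.
playback = join(Graph([sink]), Graph([client]))
# Joining two graphs that both have a driver leaves a single primary driver.
combined = join(playback, Graph([source]))
```

In this model `combined.driver()` is the sink, so the source node is no longer driven by its own timer but follows the primary driver of the combined graph.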
PipeWire supports arbitrary formats for the audio data. For practical purposes, and for interaction with existing JACK clients, we limit the formats of the audio processing graph to:
- 32-bit floating-point (float32) mono audio
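To make the graph format concrete, the sketch below converts interleaved signed 16-bit samples into one float32 mono plane per channel, which is the kind of conversion an adapter would perform at the edge of the graph. The function name is hypothetical, and scaling by 1/32768 is a common convention assumed here, not something stated in this document.

```python
from array import array


def s16_interleaved_to_f32_planes(samples, channels):
    """Split interleaved S16 samples into per-channel float32 planes.

    Each output plane is an array of 32-bit floats (typecode 'f') holding
    one channel, scaled to the range [-1.0, 1.0) by an assumed 1/32768 factor.
    """
    if channels <= 0 or len(samples) % channels != 0:
        raise ValueError("sample count must be a multiple of the channel count")
    planes = [array("f") for _ in range(channels)]
    for index, sample in enumerate(samples):
        # Interleaved layout: L R L R ... so the channel is index modulo channels.
        planes[index % channels].append(sample / 32768.0)
    return planes


# Two stereo frames: (0, 16384) and (-32768, 32767).
left, right = s16_interleaved_to_f32_planes([0, 16384, -32768, 32767], 2)
```

Each resulting plane corresponds to one mono port, so a stereo stream occupies two ports in the graph.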
When a client has exclusive access to a device, we permit them to use any format/buffer size supported by the device.
Buffers between nodes in the audio graph are allocated with a size that can hold the configurable maximum latency in samples (typically about 20 ms, or 1024 samples).
On each wakeup of the graph by the driver, the graph is run with a fixed buffer size, smaller than the maximum latency. The buffer size is determined by the minimum required latency of all connected clients.
This allows us to change the buffer size dynamically based on the use case of the clients. This is essential to support both pro-audio and desktop audio clients with the same audio server.
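The buffer size selection described above comes down to simple arithmetic. The following sketch picks the per-cycle buffer size as the smallest latency requested by any connected client, capped at the maximum. The function names and the 48 kHz rate in the example are assumptions for illustration.

```python
def graph_quantum(client_latencies, max_quantum=1024):
    """Pick the buffer size (in samples) for the next graph cycle.

    The smallest latency requested by a connected client wins, but the
    result never exceeds the configured maximum. With no clients, the
    maximum is used (an assumed default).
    """
    if not client_latencies:
        return max_quantum
    return min(min(client_latencies), max_quantum)


def frames_to_ms(frames, rate):
    """Convert a number of samples at the given rate to milliseconds."""
    return 1000.0 * frames / rate


# A desktop player asking for 1024 samples and a pro-audio client asking for 256.
quantum = graph_quantum([1024, 256])
latency_ms = frames_to_ms(quantum, 48000)
```

When the pro-audio client disconnects, the same call with only the remaining request returns 1024 samples again, so the graph goes back to larger, cheaper cycles.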
Nodes without any dependencies are woken up first. When they finish processing they atomically decrement the counters of the peers that depend on this node. When a peer has no more dependencies it is woken up. This can happen without any interaction with the PipeWire daemon.
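The counter-based wakeup can be modelled as follows. This is a single-threaded sketch: the real mechanism relies on atomic decrements between nodes that may live in different processes, whereas this version only shows the ordering logic. The function name and the node names in the example are hypothetical.

```python
from collections import deque


def run_cycle(deps):
    """Return the order in which nodes are woken up during one graph cycle.

    `deps` maps each node name to the list of nodes it depends on.
    Every node that appears as a dependency must also be a key.
    """
    # Counter per node: how many dependencies have not finished yet.
    pending = {node: len(needs) for node, needs in deps.items()}
    # Reverse map: which peers to notify when a node finishes.
    dependents = {node: [] for node in deps}
    for node, needs in deps.items():
        for upstream in needs:
            dependents[upstream].append(node)

    ready = deque(node for node in deps if pending[node] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)  # the node processes its buffers here
        for peer in dependents[node]:
            pending[peer] -= 1  # an atomic decrement in the real system
            if pending[peer] == 0:
                ready.append(peer)  # no more dependencies: wake the peer
    return order


order = run_cycle({
    "source": [],
    "filter": ["source"],
    "mixer": ["source", "filter"],
    "sink": ["mixer"],
})
```

The mixer is only woken up after both the source and the filter have finished, and no central scheduler is consulted along the way: each node notifies its own peers.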
A client can make a node in the graph and exchange data with connected peer nodes. This requires that the client provides data acceptable for the peers. For audio filters this would mean float32 mono format. This is not unlike JACK clients.
A more generic playback/record API is provided by the pw_stream API. It works well for a single audio/video input or output stream and provides a basic queue/dequeue API. The client can choose a maximum latency and an arbitrary format. The client has internal converters to adapt between the client and server format and buffer size.
This avoids running (expensive) conversion code in the server and simplifies things, because the server only needs to deal with one format.
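The queue/dequeue pattern mentioned above can be illustrated with a toy buffer pool. This is not the pw_stream API: the `BufferPool` class and its methods are hypothetical and only show how a client takes an empty buffer, fills it, and hands it back for consumption.

```python
from collections import deque


class BufferPool:
    """A fixed set of buffers cycling between a client and a consumer."""

    def __init__(self, n_buffers, size):
        self._free = deque(bytearray(size) for _ in range(n_buffers))
        self._filled = deque()

    def dequeue(self):
        """Client side: get an empty buffer, or None if all are in flight."""
        return self._free.popleft() if self._free else None

    def queue(self, buf):
        """Client side: hand a filled buffer over for processing."""
        self._filled.append(buf)

    def consume(self):
        """Consumer side: read the oldest filled buffer and recycle it."""
        if not self._filled:
            return None
        buf = self._filled.popleft()
        data = bytes(buf)
        self._free.append(buf)  # the buffer becomes available to the client again
        return data


pool = BufferPool(n_buffers=2, size=4)
buf = pool.dequeue()
buf[:] = b"abcd"      # the client writes converted audio data here
pool.queue(buf)
played = pool.consume()
```

Because the pool is fixed in size, a client that produces data faster than it is consumed eventually gets `None` from `dequeue`, which bounds the latency it can build up.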
Graph changes in general become active in the next iteration of the graph. We will not attempt to rewind the graph to get this latency lower than the current buffer size of the iteration. This means new clients or volume changes are applied with at most the (configurable) maximum latency of the graph (about 20 ms by default).
Session management involves keeping track of devices and their priorities. It also involves keeping track of applications and what devices they connect to, as well as defining a policy for controls such as volume and hooking them into the streams and devices. All of this requires quite a lot of policy that we would like to manage outside of PipeWire.
The current plan is to make it possible to control all this from outside PipeWire. GNOME has interest in implementing this policy in gnome-session-daemon. This would also make it possible to have better integration between gnome-control-center and the audio session for things like the equalizer or other effect processing plugins.
We would probably either have some reference module or example application that implements a fallback policy to be used by other desktops.