Sound support

Terminals have extremely limited sound capabilities: the BEL character is the only way to make a sound.

It would be convenient to add support for playing sounds via an escape sequence. The emitter wouldn't have to care about OS-specific interfaces such as ALSA, PulseAudio etc. It would even work automatically across ssh.

In the spirit of the existing OSC 8 hyperlink support and the OSC 9/99 notification support that is still being designed, I recommend going with a similar OSC 440. The number is the frequency in Hz of the standard A note, a number commonly associated with music. (12, the number of notes in an octave, is already taken. Another choice could be 88, the number of keys on a piano.)

The syntax would basically be

OSC 440 ; params ; base64_encoded_payload_chunk ST

where params is a colon-separated list of key=value pairs.
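As a rough sketch of how an emitter might assemble such a sequence: the function name osc440 is made up for illustration, and the only key used is id, which this proposal makes mandatory. The choice of ESC \ for ST and the rule that keys and values may not contain the separator characters are my assumptions, not settled parts of the design.

```python
import base64

OSC = "\x1b]"   # Operating System Command introducer (ESC ])
ST = "\x1b\\"   # String Terminator (ESC \), assumed here rather than BEL


def osc440(params: dict, payload: bytes) -> str:
    """Build one OSC 440 sequence carrying a base64-encoded payload chunk.

    Hypothetical helper: the proposal only fixes the overall shape
    'OSC 440 ; params ; base64_encoded_payload_chunk ST'.
    """
    for key, value in params.items():
        # Assumption: separators are simply forbidden inside keys and values.
        if any(c in key for c in ":;=") or any(c in value for c in ":;"):
            raise ValueError(f"separator character in parameter {key!r}")
    param_str = ":".join(f"{key}={value}" for key, value in params.items())
    encoded = base64.b64encode(payload).decode("ascii")
    return f"{OSC}440;{param_str};{encoded}{ST}"


if __name__ == "__main__":
    import sys
    # Three raw bytes standing in for a chunk of sound data.
    sys.stdout.write(osc440({"id": "bgm"}, b"\x90\x3c\x40"))
```

Base64 keeps the payload free of control characters, so the chunk can't accidentally terminate the sequence.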

The emitter should basically rely on its own timer, e.g. first send out the first 1 second of the music, and then every 0.1 second send another 0.1 second, or so. This way the buffer holds about 1 second of music, to account for clock drift, uneven scheduling of processes, network lag etc. There could be a new environment variable as a convention for the minimum buffer level, set to a higher value when sshing.
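The timing above can be sketched as a schedule of (send time, chunk length) pairs. The function name send_schedule and its parameters are made up for illustration, and the values are in integer milliseconds only to avoid floating-point drift; a real emitter would sleep until each send time and write the corresponding chunk.

```python
from typing import Iterator, Tuple


def send_schedule(total_ms: int, prebuffer_ms: int = 1000,
                  step_ms: int = 100) -> Iterator[Tuple[int, int]]:
    """Yield (send_at_ms, chunk_ms) pairs for a sound lasting total_ms.

    The first chunk fills the buffer with prebuffer_ms of sound at time 0;
    after that, every step_ms another step_ms of sound is sent, so the
    terminal's buffer is topped back up to prebuffer_ms on each send.
    """
    first = min(prebuffer_ms, total_ms)
    if first > 0:
        yield 0, first
    sent = first
    tick = 0
    while sent < total_ms:
        tick += step_ms
        chunk = min(step_ms, total_ms - sent)  # the last chunk may be shorter
        yield tick, chunk
        sent += chunk


if __name__ == "__main__":
    for send_at, length in send_schedule(1250):
        print(f"at {send_at} ms send {length} ms of sound")
```

With these defaults the buffer level dips to about 0.9 second just before each send, so an emitter that wants a hard 1 second minimum would pick a slightly larger prebuffer.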

The id key would be mandatory, identifying the source of the sound. This is because if multiple sources are playing music, the terminal emulator has to place the chunks coming from the same source one after the other, but chunks coming from different sources need to be played concurrently. Even a single application might define multiple sources, e.g. one for continuously playing background music (where a certain delay is not a problem), and another for playing some additional sound (e.g. the response to a user action) as soon as possible.
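On the terminal emulator's side this could look roughly like one queue per id. The class SourceQueues and its methods are made up for illustration, and the sketch ignores chunk durations and the actual mixing; it only shows the ordering rule.

```python
from collections import defaultdict, deque


class SourceQueues:
    """Per-source chunk queues (hypothetical sketch, no real audio handling)."""

    def __init__(self):
        # One FIFO per source id; chunks with the same id play back to back.
        self._queues = defaultdict(deque)

    def push(self, source_id: str, chunk: bytes) -> None:
        """Append a decoded payload chunk to the queue of its source."""
        self._queues[source_id].append(chunk)

    def next_chunks(self) -> dict:
        """Pop the head chunk of every non-empty queue.

        The returned chunks belong to different sources, so they are the
        ones that would be played concurrently.
        """
        heads = {}
        for source_id, queue in self._queues.items():
            if queue:
                heads[source_id] = queue.popleft()
        return heads


if __name__ == "__main__":
    q = SourceQueues()
    q.push("bgm", b"first")
    q.push("bgm", b"second")
    q.push("fx", b"click")
    print(q.next_chunks())
    print(q.next_chunks())
```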

There needs to be a way to query the current buffer level, so that the emitter can make timing corrections in the long run, especially across ssh where the two clocks might drift apart. Also, the terminal emulator should send a sequence whenever the buffer is emptied, so that the emitter is explicitly notified of this situation. Another feature should be to ask to drop the buffers ASAP, e.g. when an application quits. Yet another feature should be to denote at the start that the entire sound has to be waited for before playing, and to denote at the end that the stream terminates. This is for short sounds to be played ASAP, but without interruption even if the data doesn't fit in a single escape sequence and the connection stalls. These can all be achieved using parameters to the escape sequences; the exact details are to be figured out.

This design is easily tmux-able: tmux wouldn't have to do any actual sound mixing work. It just needs to prefix the id with the pane's identifier to avoid accidental clashes, and otherwise forward the output escape sequences to the actual terminal. This prefix also allows tmux to dispatch the response back to the proper pane, without having to keep track of the ids seen.
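A rough sketch of the rewriting a multiplexer would do, written in Python just for illustration (this is not tmux code). The function names, the "/" separator and the extra key foo in the usage example are all made up; "%3" merely imitates the look of a pane identifier.

```python
from typing import Tuple


def prefix_id(params: str, pane: str) -> str:
    """Rewrite the id= value in a colon-separated params string.

    'id=bgm' from pane '%3' becomes 'id=%3/bgm'; other keys pass through.
    The '/' separator is an assumption of this sketch.
    """
    out = []
    for item in params.split(":"):
        key, sep, value = item.partition("=")
        if key == "id" and sep:
            value = f"{pane}/{value}"
        out.append(f"{key}{sep}{value}")
    return ":".join(out)


def split_id(prefixed: str) -> Tuple[str, str]:
    """Recover (pane, original id) from an id seen in a terminal response."""
    pane, _, original = prefixed.partition("/")
    return pane, original


if __name__ == "__main__":
    print(prefix_id("id=bgm:foo=1", "%3"))   # foo is a hypothetical extra key
    print(split_id("%3/bgm"))
```

Because the pane is encoded in the id itself, routing a response only needs split_id and no per-pane table of ids.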

Terminal emulators have really limited graphical capabilities. Apps don't have control over individual pixels, they just specify the colors and letters to display. In a similar manner, I don't find it important to be able to play MP3, OGG etc. formats. I believe that in order to match the audio experience with the visual one, and to save on network traffic, sound support should be limited to MIDI.
