Add support for DSD to gstaudio, gst-libav, and alsasink (!3901) · Merge requests · GStreamer / gstreamer

Carlos Rafael Giani requested to merge dv1/gstreamer:dsd-integration-2 into main Feb 06, 2023

This introduces new DSD structures, constants, functions, and elements, and integrates those into alsasink, audioringbuffer, audiosink, audiobasesink, and gst-libav. It is based on rtiemann's previous DSD work. Thanks for that, @rtiemann!

A new audio/x-dsd media type is established, as well as DSD caps:

      audio/x-dsd
                 format: { (string)DSDU32BE, (string)DSDU16BE, (string)DSDU8, (string)DSDU32LE, (string)DSDU16LE }
                   rate: [ 1, 2147483647 ]
                 layout: { (string)interleaved, (string)non-interleaved }
         reversed-bytes: { (boolean)false, (boolean)true }
               channels: [ 1, 2147483647 ]

DSD does not really have multiple "sample formats". It has only one "sample format", and that is the bit. But since working with individual bits is impractical in software, the bits are grouped into words. That's what DSDU8, DSDU32LE etc. mean here - I prefer to call them "grouping formats", since they essentially "group" DSD bits together. (As usual, the LE variants store the bytes of a word in little-endian order, the BE variants in big-endian order.) In stereo and multichannel data, interleaved DSD interleaves the channels on a per-word level. Non-interleaved data stores the words of each channel separately: first all words from channel 1, then all words from channel 2, etc. This is just like in PCM.
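The difference between the two layouts can be illustrated with a minimal sketch in plain Python. It does not use the new gstaudio API; the function names `interleave` and `deinterleave` and the sample byte values are made up for illustration.

```python
def interleave(channels, word_size):
    """Interleave per-channel DSD byte strings on a per-word basis."""
    if len({len(ch) for ch in channels}) != 1:
        raise ValueError("all channels must have the same length")
    if len(channels[0]) % word_size != 0:
        raise ValueError("channel length must be a multiple of the word size")
    out = bytearray()
    for offset in range(0, len(channels[0]), word_size):
        # One word from each channel, in channel order.
        for ch in channels:
            out += ch[offset:offset + word_size]
    return bytes(out)


def deinterleave(data, num_channels, word_size):
    """Split word-interleaved DSD data back into one byte string per channel."""
    group_size = num_channels * word_size  # one word for every channel
    if len(data) % group_size != 0:
        raise ValueError("data length must be a multiple of channels * word size")
    channels = [bytearray() for _ in range(num_channels)]
    for offset in range(0, len(data), group_size):
        for index, ch in enumerate(channels):
            start = offset + index * word_size
            ch += data[start:start + word_size]
    return [bytes(ch) for ch in channels]


left = bytes([0x11, 0x12, 0x13, 0x14])
right = bytes([0x21, 0x22, 0x23, 0x24])

non_interleaved = left + right                            # 11 12 13 14 21 22 23 24
interleaved_u8 = interleave([left, right], word_size=1)   # 11 21 12 22 13 23 14 24
interleaved_u16 = interleave([left, right], word_size=2)  # 11 12 21 22 13 14 23 24
```

Note how the same channel data produces different byte sequences for 8-bit and 16-bit words, because the interleaving happens per word and not per byte.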

DSD has two peculiarities:

There is the notion of "reversed bytes". As said, DSD actually deals with a bit stream, and these bits are grouped into words. If, for example, the grouping format is DSDU8, then the most significant bit of the first byte in the stream stores the very first bit (bit nr. 0). The second most significant bit stores bit nr. 1 etc., and bit nr. 7 is the least significant bit of the first byte. Bit nr. 8 is then the most significant bit of the second byte, and so on. If bytes are reversed, this goes the other way round - bit nr. 0 is stored as the least significant bit of the first byte, bit nr. 7 as the most significant bit of the first byte, bit nr. 8 as the least significant bit of the second byte, and so on. This is not to be confused with the byte-level endianness mentioned above - that one is completely separate.
Since the grouping formats group different numbers of bits (DSDU32LE groups 32 bits, DSDU8 groups 8, etc.), and the bits are what's actually being played, words of different formats cover different amounts of playtime. For example, the bits in one DSDU32LE word cover 4x as much playtime as the bits in one DSDU8 word. This is entirely different from PCM; there, no matter if the PCM format is S16LE, S8, F32LE etc., the playtime of one PCM word is always 1/samplerate. This in turn means that calculations that use the BPF (bytes per frame) for figuring out durations cannot be used with DSD. For this reason, the GstDsdInfo structure does not have a "BPF"; it has a "stride" instead. This fundamental difference between PCM and DSD is one of the main reasons why I chose not to add DSD to GstAudioInfo and friends, and instead establish entirely new structures.
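To make the playtime difference concrete, here is a minimal sketch in plain Python. It assumes a hypothetical `bit_rate` parameter that counts DSD bits per second per channel (2822400 for DSD64, i.e. 64 x 44100); how that value relates to the `rate` caps field is not covered by this sketch. The helper names are made up and are not part of gstaudio.

```python
from fractions import Fraction

# Number of DSD bits that one word of each grouping format holds.
WORD_BITS = {
    "DSDU8": 8,
    "DSDU16LE": 16,
    "DSDU16BE": 16,
    "DSDU32LE": 32,
    "DSDU32BE": 32,
}


def word_duration(fmt, bit_rate):
    """Playtime in seconds covered by one word of one channel.

    bit_rate is an assumed parameter: DSD bits per second per channel.
    """
    return Fraction(WORD_BITS[fmt], bit_rate)


def buffer_duration(num_bytes, channels, bit_rate):
    """Playtime in seconds of num_bytes of DSD data holding all channels.

    The grouping format does not matter here, only the total bit count.
    """
    bits_per_channel = Fraction(num_bytes * 8, channels)
    return bits_per_channel / bit_rate


DSD64_BIT_RATE = 64 * 44100  # 2822400 bits per second per channel

u8 = word_duration("DSDU8", DSD64_BIT_RATE)
u32 = word_duration("DSDU32LE", DSD64_BIT_RATE)
ratio = u32 / u8
```

With PCM, the equivalent of `word_duration` would return 1/samplerate for every format, which is why a bytes-per-frame based calculation works there but not here.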

A dsdconvert element is added to convert between these various ways of organizing DSD data. (Channel count and rates are not changed.)

The actual DSD conversion is also available as part of the gstaudio library.
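As an illustration of what converting between grouping formats involves, here is a minimal single-channel sketch in plain Python. It is not the gstaudio conversion code. It assumes that within a multi-byte word the oldest bits sit in the most significant byte, so a BE word keeps the bytes in stream order and an LE word stores them in reverse; the function name `regroup` is made up.

```python
def regroup(dsdu8, word_size, little_endian):
    """Regroup a single-channel DSDU8 byte stream into wider words.

    Assumption: the oldest DSD bits occupy the most significant byte of a
    word. A big-endian word then keeps the stream order in memory, and a
    little-endian word stores the same bytes in reverse order.
    """
    if len(dsdu8) % word_size != 0:
        raise ValueError("stream length must be a multiple of the word size")
    out = bytearray()
    for offset in range(0, len(dsdu8), word_size):
        word = dsdu8[offset:offset + word_size]
        out += word[::-1] if little_endian else word
    return bytes(out)


stream = bytes([1, 2, 3, 4, 5, 6, 7, 8])

as_u32be = regroup(stream, word_size=4, little_endian=False)  # 1 2 3 4 5 6 7 8
as_u32le = regroup(stream, word_size=4, little_endian=True)   # 4 3 2 1 8 7 6 5
```

Under this assumption, swapping the bytes of each LE word a second time restores the original stream, so the same routine also converts back.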

The following pipelines were tested (hw:3,0 is a USB DAC capable of DSD playback):

gst-launch-1.0 filesrc location=test-dsd-128.dsf ! avdemux_dsf ! dsdconvert ! alsasink device=hw:3,0
gst-launch-1.0 filesrc location=test-dsd-64.dff ! avdemux_iff ! dsdconvert ! alsasink device=hw:3,0
gst-launch-1.0 filesrc location=test-dsd-128.dsf ! avdemux_dsf ! dsdconvert ! avdec_dsd_msbf ! autoaudiosink
gst-launch-1.0 filesrc location=test-dsd-64.dff ! avdemux_iff ! dsdconvert ! avdec_dsd_msbf ! autoaudiosink

Autoplugging also works. For example, this plays the file:

gst-play-1.0 test-dsd-128.dsf

The Graphviz dot dump shows that avdemux_dsf and avdec_dsd_lsbf_planar are auto-plugged. Playback works normally.

Another example:

gst-play-1.0 --audiosink="dsdconvert ! alsasink device=hw:3,0" test-dsd-128.dsf

This plays DSD data directly to the DSD-capable DAC. dsdconvert converts the planar DSD data from the file to interleaved DSD data, and reverses the bytes (since they are stored reversed in this particular DSF file).

DSF and DFF (a.k.a. "DSDIFF") are the two main DSD container formats. DSF stores DSD audio in a planar fashion, DFF in an interleaved fashion. gst-libav provides the following four DSD to PCM converters/decoders:

avdec_dsd_lsbf
avdec_dsd_lsbf_planar
avdec_dsd_msbf
avdec_dsd_msbf_planar

The ones without the "_planar" suffix decode interleaved DSD. lsbf means "least significant bit first" (not byte - bit), and msbf accordingly means "most significant bit first". The "lsbf" ones are for data with the "reversed-bytes" caps field set to true.
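Switching between the msbf and lsbf bit orders amounts to reversing the bit order inside every byte, while the byte order itself stays untouched. A minimal sketch in plain Python (the function names are made up, and this is not the code used by dsdconvert or libav):

```python
def reverse_bits(byte):
    """Return the 8-bit value with its bit order reversed."""
    result = 0
    for _ in range(8):
        result = (result << 1) | (byte & 1)
        byte >>= 1
    return result


# Precomputed lookup table: index is the input byte, value is the reversed byte.
REVERSE_TABLE = bytes(reverse_bits(value) for value in range(256))


def reverse_dsd_bytes(data):
    """Reverse the bit order within each byte of a DSD byte string."""
    return data.translate(REVERSE_TABLE)


msbf = bytes([0xD0, 0x01])      # bits 1101 0000, 0000 0001
lsbf = reverse_dsd_bytes(msbf)  # bits 0000 1011, 1000 0000
```

Applying the reversal twice yields the original data, so the same function converts in both directions.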

Automatic DSD conversion should ideally work via autoconvert. If autoconvert became a part of playsink, it could automatically insert dsdconvert if needed. Currently, this is the favored approach. As a plan B, ideas to insert DSD conversion into alsasink / audioringbuffer / audiobasesink can be revisited.

Edited Mar 14, 2023 by Carlos Rafael Giani
