WIP: Gapless playback support for mp3 audio

This adds gapless playback support for mp3. (In theory, it could also be used for mp1 and mp2, but such files with LAME tags most likely don't exist, and these two filetypes are pretty much extinct by now.)

It works by (a) adjusting PTS/DTS and durations in mpegaudioparse, (b) adding audioclipping meta in mpegaudioparse, and (c) clipping samples according to that meta in mpg123.
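To make step (c) concrete, here is a minimal sketch of how per-frame clipping amounts could be derived from the padding values in a LAME tag. It is not the code from this merge request: `FrameClip`, `compute_frame_clip` and all parameter names are made up for illustration, and it assumes the start and end padding are already known as sample counts and that together they do not exceed the stream length.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: samples to drop from one decoded frame. */
typedef struct {
  uint64_t start; /* samples to drop at the beginning of the frame */
  uint64_t end;   /* samples to drop at the end of the frame */
} FrameClip;

/* Assumption: start_padding + end_padding <= num_frames * samples_per_frame.
 * Frames are numbered from 0, all with the same number of samples. */
static FrameClip
compute_frame_clip (uint64_t frame_nr, uint64_t num_frames,
    uint32_t samples_per_frame, uint64_t start_padding, uint64_t end_padding)
{
  FrameClip clip = { 0, 0 };
  uint64_t total, frame_first, frame_end, valid_first, valid_end;

  assert (samples_per_frame > 0);

  total = num_frames * samples_per_frame;
  assert (start_padding + end_padding <= total);

  /* Sample range covered by this frame: [frame_first, frame_end) */
  frame_first = frame_nr * samples_per_frame;
  frame_end = frame_first + samples_per_frame;

  /* Sample range that holds actual audio: [valid_first, valid_end) */
  valid_first = start_padding;
  valid_end = total - end_padding;

  if (frame_first < valid_first) {
    uint64_t stop = (valid_first < frame_end) ? valid_first : frame_end;
    clip.start = stop - frame_first;
  }
  if (frame_end > valid_end) {
    uint64_t begin = (valid_end > frame_first) ? valid_end : frame_first;
    clip.end = frame_end - begin;
  }

  return clip;
}
```

With 1152 samples per frame and a start padding of 1200 samples, frame 0 would be clipped entirely and frame 1 would lose its first 48 samples. In this scheme the parser would attach such values as clipping meta, and the decoder would drop the samples after decoding.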

For me, it works reliably. I tested it with several gapless mp3 albums as well as gapless test data. It can also handle "Frankenstein" streams, that is, streams that were poorly stitched together. Example: cat 1.mp3 2.mp3 > joined.mp3. Since any of the parts in such a stitched-together stream (for example 1.mp3) can contain a LAME tag, the code has to be aware of this and support it so that playback does not break. There are web radios that deliver such "Frankenstein" streams, so this is not just an academic exercise (and this is also why VLC, mpv, and ffmpeg have such support as well).
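Handling such streams implies recognising a Xing/Info header frame wherever it shows up, not only at the start. The following is a rough sketch of such a check, not the function from this merge request. The name `looks_like_xing_header_frame` is hypothetical, the tag position is derived from the commonly documented Layer III side info sizes, and frames with a CRC are not treated specially (an assumption).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the buffer starts with an MPEG Layer III frame header and
 * carries a "Xing" or "Info" marker right after the side info, else 0.
 * Assumption: the marker offset is not shifted by a CRC. */
static int
looks_like_xing_header_frame (const uint8_t * frame, size_t size)
{
  int version_bits, layer_bits, is_mpeg1, is_mono;
  size_t side_info_size, tag_offset;

  if (size < 4)
    return 0;

  /* 11-bit frame sync */
  if (frame[0] != 0xFF || (frame[1] & 0xE0) != 0xE0)
    return 0;

  version_bits = (frame[1] >> 3) & 0x3; /* 3: MPEG-1, 2: MPEG-2, 0: MPEG-2.5 */
  layer_bits = (frame[1] >> 1) & 0x3;   /* 1: Layer III */
  if (version_bits == 1 || layer_bits != 1)
    return 0;

  is_mpeg1 = (version_bits == 3);
  is_mono = (((frame[3] >> 6) & 0x3) == 3);

  /* Side info size depends on MPEG version and channel mode */
  if (is_mpeg1)
    side_info_size = is_mono ? 17 : 32;
  else
    side_info_size = is_mono ? 9 : 17;

  tag_offset = 4 + side_info_size;
  if (size < tag_offset + 4)
    return 0;

  return memcmp (frame + tag_offset, "Xing", 4) == 0
      || memcmp (frame + tag_offset, "Info", 4) == 0;
}
```

In a stitched stream, a parser could run a check like this on every frame and reset its padding bookkeeping whenever a new header frame is found.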

Here are some unclear bits / leftover TODOs that need some discussion before the merge can proceed:

  1. mpegaudioparse: It needs to be clarified if position, segment, bitrate, convert queries also have to be adjusted.
  2. mpegaudioparse: Seeking. It does work, but it is currently unclear whether the accuracy is diminished a bit because the padding samples at the beginning get removed. This would mean that when you seek to position 1000, you may actually end up at position 700, for example. This may currently be masked by the fact that mpegaudioparse does not pass the Xing/VBRI tables to baseparse.
  3. Some code duplication going on ( gst_mpeg_audio_parse_check_if_is_xing_header_frame () ).
  4. The frame_nr = GST_BUFFER_PTS (frame->buffer) / mp3parse->frame_duration; line, which determines which frame the dataflow is currently at. This works, but it does not seem clean. I could not find a better solution, though.
  5. I do not know if in addition to buffer PTS/DTS/duration, buffer offset/offset-end also should be adjusted. I tend towards no, since the offsets do specify the byte position in the encoded MPEG stream.
  6. I raised the rank of mpg123audiodec to primary. Its performance is comparable to that of avdec_mp3. But while I was able to adapt the former to make use of the audio clipping meta, I found no way to do the same for the latter, because the latter is autogenerated. EDIT: avdec_mp3 can actually work with this too, with this change: gst-libav!41 (closed)
  7. With these changes, mpegaudioparse will always adjust PTS/DTS/durations of buffers - even if a downstream mp3 decoder does not process the audio clipping meta. This then leads to a mismatch between duration of a buffer and actual number of samples in a buffer (-> duration can be shorter than actual content). I am unsure how to deal with this. Require that all mp3 decoders handle this meta? Implement some sort of query to determine if downstream decoders can handle this?
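As a basis for discussing items 2 and 4, here is a small sketch of the arithmetic involved. It is not taken from this merge request; all function names are hypothetical. It derives the frame index from a sample count instead of a truncated frame duration, and shows how a seek position on the clipped timeline could be shifted by the start padding. Real code would probably use gst_util_uint64_scale_round () instead of the plain multiplication, which can overflow for very large timestamps.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SECOND UINT64_C(1000000000)

/* Frame duration in nanoseconds, truncated (not exact at e.g. 44100 Hz). */
static uint64_t
frame_duration_ns (uint32_t samples_per_frame, uint32_t rate)
{
  assert (rate > 0);
  return (uint64_t) samples_per_frame * NS_PER_SECOND / rate;
}

/* Timestamp to sample count, rounded to nearest.
 * Illustration only: pts_ns * rate overflows for very large timestamps. */
static uint64_t
pts_to_samples (uint64_t pts_ns, uint32_t rate)
{
  return (pts_ns * rate + NS_PER_SECOND / 2) / NS_PER_SECOND;
}

/* Frame index of an unadjusted PTS, computed via sample counts. */
static uint64_t
frame_index_from_pts (uint64_t pts_ns, uint32_t samples_per_frame,
    uint32_t rate)
{
  assert (samples_per_frame > 0);
  return pts_to_samples (pts_ns, rate) / samples_per_frame;
}

/* Map a seek position on the clipped timeline to a sample position in the
 * unclipped stream by adding the start padding back. */
static uint64_t
unclipped_sample_for_seek (uint64_t seek_ns, uint32_t rate,
    uint64_t start_padding)
{
  return pts_to_samples (seek_ns, rate) + start_padding;
}
```

For example, with 44100 Hz, 1152 samples per frame and 1105 padding samples at the start, a seek to 1 second corresponds to unclipped sample 45205, which lies in frame 39. Without the compensation, the same seek would land in frame 38.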
Edited by Tim-Philipp Müller
