QtDemux not parsing AAC sample rate from esds atom
Hello GStreamer team!
I'm trying to play a DASH stream with AAC audio (more specifically, HE-AAC). It seems that qtdemux sets the rate in the caps to 48,000, while the stream actually has a 24,000 sample rate. Because of that, aacparse prepends an ADTS header with a 48,000 sample rate, and as a result the decoder I'm using (in this case I'm forwarding the frames via RTP to VLC) doesn't play anything. Overriding the ADTS header with a 24,000 sample rate fixes the problem, and VLC plays it fine.
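For context, the rate in question ends up in the ADTS header as a 4-bit sampling_frequency_index. A minimal sketch of the 7-byte ADTS header (no CRC) is below; the helper name adts_write_header is hypothetical and this is not aacparse's code. As far as I understand, ADTS only has a 2-bit profile field and no way to signal SBR explicitly, so HE-AAC in ADTS is normally signaled implicitly: the header carries AAC LC and the core rate (index 6, 24,000 Hz) rather than the SBR output rate (index 3, 48,000 Hz).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fills a 7-byte ADTS header (MPEG-4 ID, no CRC).
 * aot:         audio object type (2 = AAC LC); ADTS stores aot - 1 in 2 bits.
 * sf_index:    sampling frequency index (6 = 24000 Hz, 3 = 48000 Hz).
 * channels:    channel configuration (0..7).
 * payload_len: size of the raw AAC frame that follows the header.
 * Returns 0 on success, -1 if a field does not fit. */
static int adts_write_header (uint8_t h[7], unsigned aot, unsigned sf_index,
    unsigned channels, size_t payload_len)
{
  size_t frame_len = payload_len + 7;   /* aac_frame_length includes the header */
  unsigned fullness = 0x7FF;            /* buffer fullness 0x7FF signals VBR */

  if (aot < 1 || aot > 4 || sf_index > 12 || channels > 7 || frame_len > 0x1FFF)
    return -1;

  h[0] = 0xFF;                          /* syncword, high 8 bits */
  h[1] = 0xF1;                          /* syncword low 4 bits, ID=0, layer=0, protection_absent=1 */
  h[2] = (uint8_t) (((aot - 1) << 6) | (sf_index << 2) | (channels >> 2));
  h[3] = (uint8_t) (((channels & 3) << 6) | (frame_len >> 11));
  h[4] = (uint8_t) ((frame_len >> 3) & 0xFF);
  h[5] = (uint8_t) (((frame_len & 7) << 5) | (fullness >> 6));
  h[6] = (uint8_t) ((fullness & 0x3F) << 2);  /* one raw data block per frame */
  return 0;
}
```

For a 100-byte AAC LC stereo frame at index 6, this yields FF F1 58 80 0D 7F FC; with index 3 (48,000) only the third byte changes (4C instead of 58).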
After investigating qtdemux, I found that it takes the sample_rate from the stsd atom in the MP4 (which is wrongly set to 48,000). However, there is a piece of code that should override the sample rate using the esds atom's 'DecoderSpecificInfo', which I understand is the AudioSpecificConfig or something similar. The code in qtdemux.c that performs this override, in gst_qtdemux_handle_esds(), has an interesting comment:
```c
  /* Override channels and rate based on the codec_data, as it's often
   * wrong. */
  /* Only do so for basic setup without HE-AAC extension */
  if (data_ptr && data_len == 2) {
    guint channels, rate;

    channels = gst_codec_utils_aac_get_channels (data_ptr, data_len);
    ...
```
The mp4dump output shows DecoderSpecificInfo = 13 10 56 e5 98. The first 5 bits are 00010, so the audio object type is 2 (AAC LC); the next 4 bits are 0110, so the sampling frequency index is 6, which maps to a 24,000 sample rate. Since the decoder specific config is 5 bytes long (more than 2), this override is skipped.
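The bit arithmetic above can be written as a small C sketch (hypothetical helper asc_parse_core, not the pbutils implementation). It only reads the first 13 bits (object type, sampling frequency index, channel configuration) and does not handle the escape values.

```c
#include <stddef.h>
#include <stdint.h>

/* Sampling frequency table (samplingFrequencyIndex 0..12, ISO/IEC 14496-3). */
static const unsigned asc_rates[13] = {
  96000, 88200, 64000, 48000, 44100, 32000, 24000,
  22050, 16000, 12000, 11025, 8000, 7350
};

/* Hypothetical helper: decodes the first 13 bits of an AudioSpecificConfig.
 * The escape values (object type 31, frequency index 15) are not handled.
 * Returns 0 on success, -1 if the config is too short or not supported. */
static int asc_parse_core (const uint8_t *asc, size_t len, unsigned *aot,
    unsigned *sf_index, unsigned *rate, unsigned *channels)
{
  if (asc == NULL || len < 2)
    return -1;

  *aot = asc[0] >> 3;                                 /* bits 0-4  */
  *sf_index = ((asc[0] & 0x07) << 1) | (asc[1] >> 7); /* bits 5-8  */
  *channels = (asc[1] >> 3) & 0x0F;                   /* bits 9-12 */

  if (*aot == 31 || *sf_index > 12)
    return -1;
  *rate = asc_rates[*sf_index];
  return 0;
}
```

With 13 10 56 e5 98 it gives aot = 2, sf_index = 6, rate = 24000 and channels = 2. The trailing three bytes are never read, so the result is the same whether the length passed is 2 or 5.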
I changed the check to data_len >= 2, and now the sample rate is correctly overridden to 24,000.
So I'd say there is a bug: data_len == 2 is checked instead of data_len >= 2. But because of the code comment above about not doing this for HE-AAC, I want to understand why, and whether that is the reason data_len is compared to 2 (the remaining bytes seem to be the HE-AAC extension indicating SBR/PS presence). So why is this override not applied to HE-AAC? The function that extracts the sample rate and channels also seems to handle the extension explicitly, by checking for audio object type 5 or 29 (SBR/PS).
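To show what the trailing bytes 56 e5 98 seem to encode, here is a sketch of an AudioSpecificConfig parser that also reads the backward-compatible SBR/PS extension. The names (BitReader, AscInfo, asc_parse) are hypothetical, the layout follows my reading of ISO/IEC 14496-3, it is restricted to AAC LC, and it is not GStreamer code. For this config, the 2-byte core is followed by syncExtensionType 0x2B7, extensionAudioObjectType 5 (SBR), sbrPresentFlag = 1 and extensionSamplingFrequencyIndex 3, with too few bits left for PS signaling.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal MSB-first bit reader (hypothetical helper, not GstBitReader). */
typedef struct { const uint8_t *data; size_t len; size_t pos; } BitReader;

static size_t br_left (const BitReader *br) { return br->len * 8 - br->pos; }

/* Reads n <= 32 bits into *out; returns -1 if not enough bits are left. */
static int br_read (BitReader *br, unsigned n, uint32_t *out)
{
  uint32_t v = 0;
  if (n > br_left (br))
    return -1;
  for (unsigned i = 0; i < n; i++, br->pos++)
    v = (v << 1) | ((br->data[br->pos / 8] >> (7 - br->pos % 8)) & 1u);
  *out = v;
  return 0;
}

static const uint32_t aac_rates[13] = { 96000, 88200, 64000, 48000, 44100,
  32000, 24000, 22050, 16000, 12000, 11025, 8000, 7350 };

/* samplingFrequencyIndex, with the 0xf escape for an explicit 24-bit rate. */
static int read_rate (BitReader *br, uint32_t *rate)
{
  uint32_t idx;
  if (br_read (br, 4, &idx) < 0)
    return -1;
  if (idx == 0xf)
    return br_read (br, 24, rate);
  if (idx > 12)                       /* reserved index */
    return -1;
  *rate = aac_rates[idx];
  return 0;
}

typedef struct {
  uint32_t aot, core_rate, channels;
  uint32_t ext_rate;                  /* SBR output rate, 0 if not signaled */
  int sbr_present, ps_present;
} AscInfo;

/* Parses an AudioSpecificConfig, restricted to AAC LC (optionally with SBR/PS)
 * and a non-zero channelConfiguration. Returns 0 on success, -1 otherwise.
 * Escaped object types (31) are not decoded; they are simply rejected. */
static int asc_parse (const uint8_t *asc, size_t len, AscInfo *info)
{
  BitReader br = { asc, len, 0 };
  uint32_t v, sync, ext_aot;
  int explicit_ext = 0;

  *info = (AscInfo) { 0 };
  if (br_read (&br, 5, &info->aot) < 0 || read_rate (&br, &info->core_rate) < 0
      || br_read (&br, 4, &info->channels) < 0)
    return -1;

  if (info->aot == 5 || info->aot == 29) {
    /* Explicit hierarchical signaling: SBR output rate, then the core AOT. */
    explicit_ext = 1;
    info->sbr_present = 1;
    info->ps_present = (info->aot == 29);
    if (read_rate (&br, &info->ext_rate) < 0 || br_read (&br, 5, &info->aot) < 0)
      return -1;
  }
  if (info->aot != 2 || info->channels == 0)
    return -1;

  /* GASpecificConfig: frameLengthFlag, dependsOnCoreCoder (+ 14-bit
   * coreCoderDelay), extensionFlag (+ extensionFlag3 for AAC LC). */
  if (br_read (&br, 2, &v) < 0 || ((v & 1) && br_read (&br, 14, &v) < 0)
      || br_read (&br, 1, &v) < 0 || (v && br_read (&br, 1, &v) < 0))
    return -1;

  /* Backward-compatible signaling: syncExtensionType 0x2B7 after the core. */
  if (explicit_ext || br_left (&br) < 16)
    return 0;
  if (br_read (&br, 11, &sync) < 0 || sync != 0x2B7
      || br_read (&br, 5, &ext_aot) < 0 || ext_aot != 5
      || br_read (&br, 1, &v) < 0 || v == 0)
    return 0;                         /* no usable SBR extension */
  info->sbr_present = 1;
  if (read_rate (&br, &info->ext_rate) < 0)
    return -1;

  /* Optional PS: syncExtensionType 0x548 followed by psPresentFlag. */
  if (br_left (&br) >= 12 && br_read (&br, 11, &sync) == 0 && sync == 0x548
      && br_read (&br, 1, &v) == 0)
    info->ps_present = (int) v;
  return 0;
}
```

On 13 10 56 e5 98 it reports a 24,000 Hz core rate, SBR present with a 48,000 Hz output rate and no PS, which matches the mp4info output; on the first two bytes alone it reports no extension. It also accepts the explicit form where the config starts with object type 5 or 29, which seems to be the case the 5/29 check in that function targets.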
Forgive me, but I cannot share the DASH asset I'm using; I hope the mp4dump output is enough. Also, I'm using a rather old GStreamer (1.18.x), but the code above still exists in the latest version.
mp4dump and mp4info outputs:
```
[ftyp] size=8+16
  major_brand = iso6
  minor_version = 0
  compatible_brand = iso6
  compatible_brand = dash
[free] size=8+32
[moov] size=8+568
  [mvhd] size=12+96
    timescale = 1
    duration = 0
    duration(ms) = 0
  [trak] size=8+412
    [tkhd] size=12+80, flags=7
      enabled = 1
      id = 1
      duration = 0
      width = 0.000000
      height = 0.000000
    [mdia] size=8+312
      [mdhd] size=12+20
        timescale = 24000
        duration = 0
        duration(ms) = 0
        language = eng
      [hdlr] size=12+38
        handler_type = soun
        handler_name = USP Sound Handler
      [minf] size=8+222
        [smhd] size=12+4
          balance = 0
        [dinf] size=8+28
          [dref] size=12+16
            [url ] size=12+0, flags=1
              location = [local to file]
        [stbl] size=8+162
          [stsd] size=12+82
            entry_count = 1
            [mp4a] size=8+70
              data_reference_index = 1
              channel_count = 2
              sample_size = 16
              sample_rate = 48000
              [esds] size=12+30
                [ESDescriptor] size=2+28
                  es_id = 1
                  stream_priority = 0
                  [DecoderConfig] size=2+20
                    stream_type = 5
                    object_type = 64
                    up_stream = 0
                    buffer_size = 819
                    max_bitrate = 101243
                    avg_bitrate = 96000
                    DecoderSpecificInfo = 13 10 56 e5 98
                  [Descriptor:06] size=2+1
          [stts] size=12+4
            entry_count = 0
          [stsc] size=12+4
            entry_count = 0
          [stsz] size=12+8
            sample_size = 0
            sample_count = 0
          [stco] size=12+4
            entry_count = 0
  [mvex] size=8+32
    [trex] size=12+20
      track id = 1
      default sample description index = 1
      default sample duration = 0
      default sample size = 0
      default sample flags = 0
```
```
File:
  major brand: iso6
  minor version: 0
  compatible brand: iso6
  compatible brand: dash
  fast start: yes
Movie:
  duration: 0 (media timescale units)
  duration: 0 (ms)
  time scale: 1
  fragments: yes
Found 1 Tracks
Track 1:
  flags: 7 ENABLED IN-MOVIE IN-PREVIEW
  id: 1
  type: Audio
  duration: 0 ms
  language: eng
  media:
    sample count: 0
    timescale: 24000
    duration: 0 (media timescale units)
    duration: 0 (ms)
    bitrate (computed): 0.000 Kbps
    sample count with fragments: 0
    duration with fragments: 0
    duration with fragments: 0 (ms)
  Sample Description 0
    Coding: mp4a (MPEG-4 Audio)
    Codec String: mp4a.40.5
    Stream Type: Audio
    Object Type: MPEG-4 Audio
    Max Bitrate: 101243
    Avg Bitrate: 96000
    Buffer Size: 819
    MPEG-4 Audio Object Type: 2 (AAC Low Complexity)
    MPEG-4 Audio Decoder Config:
      Sampling Frequency: 24000
      Channels: 2
      Extension:
        Object Type: Spectral Band Replication
        SBR Present: yes
        PS Present: no
        Sampling Frequency: 48000
    Sample Rate: 48000
    Sample Size: 16
    Channels: 2
```
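For reference, the esds box in the mp4dump output nests three descriptors: ES_Descriptor (tag 0x03) contains DecoderConfigDescriptor (tag 0x04), which in turn contains DecoderSpecificInfo (tag 0x05). The sketch below walks that chain; the helper names are hypothetical, and it is not qtdemux's parser, only my reading of the ISO/IEC 14496-1 descriptor layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads a descriptor tag plus its expandable size (1..4 bytes, 7 bits each,
 * high bit = "more bytes follow"). Advances *p past the header. */
static int read_desc (const uint8_t **p, const uint8_t *end,
    uint8_t *tag, size_t *len)
{
  size_t n = 0;
  if (*p >= end)
    return -1;
  *tag = *(*p)++;
  for (int i = 0; i < 4; i++) {
    if (*p >= end)
      return -1;
    uint8_t b = *(*p)++;
    n = (n << 7) | (b & 0x7F);
    if (!(b & 0x80))
      break;
  }
  if ((size_t) (end - *p) < n)        /* descriptor must fit in its parent */
    return -1;
  *len = n;
  return 0;
}

/* Hypothetical helper: walks the esds payload (after version/flags) from
 * ES_Descriptor (0x03) to DecoderConfigDescriptor (0x04) to
 * DecoderSpecificInfo (0x05), and returns the AudioSpecificConfig bytes. */
static int esds_find_dsi (const uint8_t *buf, size_t size,
    const uint8_t **dsi, size_t *dsi_len)
{
  const uint8_t *p = buf, *end = buf + size;
  uint8_t tag;
  size_t len, off;

  if (read_desc (&p, end, &tag, &len) < 0 || tag != 0x03 || len < 3)
    return -1;
  end = p + len;
  off = 3;                            /* 16-bit ES_ID + flags byte */
  if (p[2] & 0x80)                    /* streamDependenceFlag */
    off += 2;
  if (p[2] & 0x40) {                  /* URL_Flag: length byte + URL */
    if (off >= len)
      return -1;
    off += 1 + p[off];
  }
  if (p[2] & 0x20)                    /* OCRstreamFlag */
    off += 2;
  if (off > len)
    return -1;
  p += off;

  if (read_desc (&p, end, &tag, &len) < 0 || tag != 0x04 || len < 13)
    return -1;
  end = p + len;
  p += 13;   /* objectTypeIndication, streamType, bufferSizeDB, max/avgBitrate */

  if (read_desc (&p, end, &tag, &len) < 0 || tag != 0x05)
    return -1;
  *dsi = p;
  *dsi_len = len;
  return 0;
}
```

Fed the 30-byte esds payload rebuilt from the dump fields (the one-byte SLConfigDescriptor value is not shown in the dump, so it is assumed), it returns a 5-byte DecoderSpecificInfo starting with 13 10, so a length check of exactly 2 presumably cannot match for this stream.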