audiobasesink: discontinuity detect will result in drop audio frames
when playing some audio, audiobasesink will try to align the sample and wait discontinuity expired in gst_audio_base_sink_get_alignment. This result in some frames dropped between last buffer and current buffer.
the test pipeline is "filesrc location=/data/xxx.wma ! asfdemux ! avdec_wmapro ! audioconvert ! alsasink"
Here's a log snippet showing this happening (For a clearer analysis, I added some log, see at [ysq]). This log snippet print the first three buffers when frame drop occurs.
the first buffer render ok, render from 49151 to 51199
0:00:01.366447385 804 0x2cf010 DEBUG GST_SCHEDULING gstpad.c:4443:gst_pad_chain_data_unchecked:<alsasink0:sink> calling chainfunction &gst_base_sink_chain with buffer buffer: 0xb3e03690, pts 0:00:01.186879819, dts 99:99:99.999999999, dur 0:00:00.046439909, size 16384, offset none, offset_end none, flags 0x0
0:00:01.366477616 804 0x2cf010 DEBUG basesink gstbasesink.c:3826:gst_base_sink_chain_unlocked:<alsasink0> got times start: 0:00:01.186879819, end: 0:00:01.233319728
0:00:01.366506924 804 0x2cf010 DEBUG basesink gstbasesink.c:2166:gst_base_sink_get_sync_times:<alsasink0> got times start: 0:00:01.186879819, stop: 0:00:01.233319728, do_sync 0
0:00:01.366529539 804 0x2cf010 LOG basesink gstbasesink.c:2725:gst_base_sink_do_sync:<alsasink0> avg frame diff 0:00:00.046887371
0:00:01.366545385 804 0x2cf010 DEBUG basesink gstbasesink.c:3944:gst_base_sink_chain_unlocked:<alsasink0> rendering object 0xb3e03690
0:00:01.366573847 804 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:1905:gst_audio_base_sink_render:<alsasink0> time 0:00:01.186879819, start 0:00:00.000000000, samples 2048
0:00:01.366615539 804 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:1942:gst_audio_base_sink_render:<alsasink0> sync-offset +0:00:00.000000000, render-delay 0:00:00.000000000, ts-offset +0:00:00.000000000
0:00:01.366641077 804 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:2008:gst_audio_base_sink_render:<alsasink0> running: start 0:00:01.186879819 - stop 0:00:01.233319728
0:00:01.366667385 804 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:2050:gst_audio_base_sink_render:<alsasink0> final timestamps: start 0:00:01.186879819 - stop 0:00:01.233319728
0:00:01.366703770 804 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:2081:gst_audio_base_sink_render:<alsasink0> [ysq] render_start 52341, render_stop 54389 //caculate timestamp in samples, 52341-50293=2048 frames
0:00:01.366720154 804 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:1745:gst_audio_base_sink_get_alignment:<alsasink0> [ysq] sink->next_sample 49151, sample_offset 52341, sample_diff 3190 //sink->next_sample is last render ended position
0:00:01.366733770 804 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:1752:gst_audio_base_sink_get_alignment:<alsasink0> [ysq] max_sample_diff 1764
0:00:01.275152846 1033 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:1765:gst_audio_base_sink_get_alignment:<alsasink0> [ysq] time(caculate sample_offset in timestamp)0:00:01.186870748,priv->discont_time 0:00:00.277981859,priv->discont_wait 0:00:01.000000000
0:00:01.275173231 1033 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:1793:gst_audio_base_sink_get_alignment:<alsasink0> [ysq]diff(caculate sample_diff in timestamp) +0:00:00.072335600
0:00:01.366765462 804 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:1786:gst_audio_base_sink_get_alignment:<alsasink0> align with prev sample, ABS (-3190) < 1764
0:00:01.366781001 804 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:2142:gst_audio_base_sink_render:<alsasink0> [ysq] render_start 49151, render_stop 51199, align -3190
0:00:01.366796462 804 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:2154:gst_audio_base_sink_render:<alsasink0> [ysq]rendering at 49151 2048/2048
this second buffer render ok, render from 51199 to 53247
0:00:01.416471770 804 0x2cf010 DEBUG GST_SCHEDULING gstpad.c:4443:gst_pad_chain_data_unchecked:<alsasink0:sink> calling chainfunction &gst_base_sink_chain with buffer buffer: 0x2cef38, pts 0:00:01.233319728, dts 99:99:99.999999999, dur 0:00:00.046439909, size 16384, offset none, offset_end none, flags 0x0
0:00:01.416500693 804 0x2cf010 DEBUG basesink gstbasesink.c:3826:gst_base_sink_chain_unlocked:<alsasink0> got times start: 0:00:01.233319728, end: 0:00:01.279759637
0:00:01.416528847 804 0x2cf010 DEBUG basesink gstbasesink.c:2166:gst_base_sink_get_sync_times:<alsasink0> got times start: 0:00:01.233319728, stop: 0:00:01.279759637, do_sync 0
0:00:01.416551693 804 0x2cf010 LOG basesink gstbasesink.c:2725:gst_base_sink_do_sync:<alsasink0> avg frame diff 0:00:00.046831438
0:00:01.416567616 804 0x2cf010 DEBUG basesink gstbasesink.c:3944:gst_base_sink_chain_unlocked:<alsasink0> rendering object 0x2cef38
0:00:01.416595616 804 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:1905:gst_audio_base_sink_render:<alsasink0> time 0:00:01.233319728, start 0:00:00.000000000, samples 2048
0:00:01.416627616 804 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:1942:gst_audio_base_sink_render:<alsasink0> sync-offset +0:00:00.000000000, render-delay 0:00:00.000000000, ts-offset +0:00:00.000000000
0:00:01.416651847 804 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:2008:gst_audio_base_sink_render:<alsasink0> running: start 0:00:01.233319728 - stop 0:00:01.279759637
0:00:01.416677693 804 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:2050:gst_audio_base_sink_render:<alsasink0> final timestamps: start 0:00:01.233319728 - stop 0:00:01.279759637
0:00:01.416727924 804 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:2081:gst_audio_base_sink_render:<alsasink0> [ysq] render_start 54389, render_stop 56437
0:00:01.416745001 804 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:1745:gst_audio_base_sink_get_alignment:<alsasink0> [ysq] sink->next_sample 51199, sample_offset 54389, sample_diff 3190
0:00:01.416758462 804 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:1752:gst_audio_base_sink_get_alignment:<alsasink0> [ysq] max_sample_diff 1764
0:00:01.325247308 1033 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:1765:gst_audio_base_sink_get_alignment:<alsasink0> [ysq] time(caculate sample_offset in timestamp)0:00:01.233310657,priv->discont_time 0:00:00.277981859,priv->discont_wait 0:00:01.000000000
0:00:01.325268616 1033 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:1793:gst_audio_base_sink_get_alignment:<alsasink0> [ysq]diff(caculate sample_diff in timestamp) +0:00:00.072335600
0:00:01.416789847 804 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:1786:gst_audio_base_sink_get_alignment:<alsasink0> align with prev sample, ABS (-3190) < 1764
0:00:01.416805385 804 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:2142:gst_audio_base_sink_render:<alsasink0> [ysq] render_start 51199, render_stop 53247, align -3190
0:00:01.416821539 804 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:2154:gst_audio_base_sink_render:<alsasink0> [ysq]rendering at 51199 2048/2048
the third buffer render NG, frame dropped between 53247 and 56437, droped 56437-53247=3190 frames, we can hear carton
0:00:01.457781308 804 0x2cf010 DEBUG GST_SCHEDULING gstpad.c:4443:gst_pad_chain_data_unchecked:<alsasink0:sink> calling chainfunction &gst_base_sink_chain with buffer buffer: 0xb3e6e800, pts 0:00:01.279759637, dts 99:99:99.999999999, dur 0:00:00.046439909, size 16384, offset none, offset_end none, flags 0x0
0:00:01.457861847 804 0x2cf010 DEBUG basesink gstbasesink.c:3826:gst_base_sink_chain_unlocked:<alsasink0> got times start: 0:00:01.279759637, end: 0:00:01.326199546
0:00:01.457939077 804 0x2cf010 DEBUG basesink gstbasesink.c:2166:gst_base_sink_get_sync_times:<alsasink0> got times start: 0:00:01.279759637, stop: 0:00:01.326199546, do_sync 0
0:00:01.457998847 804 0x2cf010 LOG basesink gstbasesink.c:2725:gst_base_sink_do_sync:<alsasink0> avg frame diff 0:00:00.046782496
0:00:01.458042539 804 0x2cf010 DEBUG basesink gstbasesink.c:3944:gst_base_sink_chain_unlocked:<alsasink0> rendering object 0xb3e6e800
0:00:01.458118539 804 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:1905:gst_audio_base_sink_render:<alsasink0> time 0:00:01.279759637, start 0:00:00.000000000, samples 2048
0:00:01.458205616 804 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:1942:gst_audio_base_sink_render:<alsasink0> sync-offset +0:00:00.000000000, render-delay 0:00:00.000000000, ts-offset +0:00:00.000000000
0:00:01.458274924 804 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:2008:gst_audio_base_sink_render:<alsasink0> running: start 0:00:01.279759637 - stop 0:00:01.326199546
0:00:01.458345308 804 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:2050:gst_audio_base_sink_render:<alsasink0> final timestamps: start 0:00:01.279759637 - stop 0:00:01.326199546
0:00:01.458476385 804 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:2081:gst_audio_base_sink_render:<alsasink0> [ysq] render_start 56437, render_stop 58485, rate 4607182418800017408
0:00:01.458525924 804 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:1745:gst_audio_base_sink_get_alignment:<alsasink0> [ysq] sink->next_sample 53247, sample_offset 56437, sample_diff 3190
0:00:01.458565616 804 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:1752:gst_audio_base_sink_get_alignment:<alsasink0> [ysq] max_sample_diff 1764
0:00:01.366432539 1033 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:1765:gst_audio_base_sink_get_alignment:<alsasink0> [ysq] time(caculate sample_offset in timestamp)0:00:01.279750566,priv->discont_time 0:00:00.277981859,priv->discont_wait 0:00:01.000000000
0:00:01.458646001 804 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:1772:gst_audio_base_sink_get_alignment: [ysq] set discont flag is TRUE
0:00:01.366463154 1033 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:1793:gst_audio_base_sink_get_alignment:<alsasink0> [ysq]diff(caculate sample_diff in timestamp) +0:00:00.072335600
0:00:01.458703001 804 0x2cf010 WARN audiobasesink gstaudiobasesink.c:1797:gst_audio_base_sink_get_alignment:<alsasink0> Unexpected discontinuity in audio timestamps of +0:00:00.072335600, resyncing
0:00:01.458750924 804 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:2142:gst_audio_base_sink_render:<alsasink0> [ysq] render_start 56437, render_stop 58485, align 0 //here, it is should rendered at 53247, but render at 56437
0:00:01.458797231 804 0x2cf010 DEBUG audiobasesink gstaudiobasesink.c:2154:gst_audio_base_sink_render:<alsasink0> [ysq]rendering at 56437 2048/2048
the third buffer's pts,sample_diff,and render position are all normal, I am a bit confused why the discont flag should be set true here? if do like this, there may be a cartoon when playing many audio files. Internally, our solution is not set discont flag to avoid drop frames. Any thoughts about this audio frame drop?