Typefind misidentifying wav files containing specific byte sequence
Running typefind on the following two files, which are byte-for-byte identical and differ only in filename, gives different results:
$ gst-typefind-1.0 test.wav
test.wav - audio/x-wav
$ gst-typefind-1.0 test.audio
test.audio - application/dash+xml
$ diff -s test.wav test.audio
Files test.wav and test.audio are identical
Expected behavior:
Typefind should report audio/x-wav on both files.
Context: This is a problem we're seeing when processing certain audio files. If the audio file contains the byte sequence <mpd (and some trailing characters), it is identified as application/dash+xml. Looking at the typefind implementation, a similar thing might happen for other XML-like sequences. The attached file was hand-crafted to show the issue, but we've seen this happening with real data. As a result of the bad typefind, the file is unplayable:
$ gst-launch-1.0 -q playbin uri=file://$(pwd)/test.audio
noname.xml:1: parser error : Start tag expected, '<' not found
RIFF,
^
ERROR: from element /GstPlayBin:playbin0/GstURIDecodeBin:uridecodebin0/GstDecodeBin:decodebin0/GstDashDemux:dashdemux0: Invalid manifest.
Additional debug info:
../gst-libs/gst/adaptivedemux/gstadaptivedemux.c(690): gst_adaptive_demux_sink_event (): /GstPlayBin:playbin0/GstURIDecodeBin:uridecodebin0/GstDecodeBin:decodebin0/GstDashDemux:dashdemux0
ERROR: pipeline doesn't want to preroll.
In our case, we're using appsrc + decodebin in our pipeline, and we're not aware of any workaround to get this audio through, as decodebin internally uses typefind.
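For anyone who wants to build a comparable test file without the attachment, here is a minimal Python sketch. It is not the script used to create the attached file. The marker bytes (`<mpd>`), the padding size, and the helper names `build_wav_with_marker` and `looks_like_riff_wave` are assumptions made for illustration. Whether a file generated this way actually triggers the misdetection may depend on the GStreamer version and on where the sequence lands in the data, so treat it as a starting point rather than a guaranteed reproducer.

```python
import io
import wave


def build_wav_with_marker(marker: bytes = b"<mpd>", padding: int = 64) -> bytes:
    """Return the bytes of a small PCM WAV file whose sample data contains `marker`.

    The marker and padding values are illustrative assumptions, not taken
    from the attached file.
    """
    payload = bytes(padding) + marker + bytes(padding)
    if len(payload) % 2:
        payload += b"\x00"  # keep a whole number of 16-bit mono frames

    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 16-bit samples
        w.setframerate(8000)   # 8 kHz
        w.writeframes(payload)
    return buf.getvalue()


def looks_like_riff_wave(data: bytes) -> bool:
    """Check only the RIFF/WAVE magic at the start of the buffer."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


wav_bytes = build_wav_with_marker()
```

Writing `wav_bytes` to two files that differ only in name (for example test.wav and test.audio) gives a pair that can be passed to gst-typefind-1.0 in the same way as the commands at the top of this report.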