matroskademux: subrip subtitles may be rendered with XML tags
Describe your issue
We have a file muxed with ffmpeg that contains SubRip subtitles (from an srt file): SampleVideo_640x480_1mb.mkv
VLC plays this file without issues.
But if we play it with gst-play-1.0, the XML tags are rendered together with the subtitle text.
Expected Behavior
Only the text of the subtitles is rendered
Observed Behavior
Rendered text contains unhandled XML tags
Setup
- Operating System: Any
- Device: Any
- GStreamer Version: Any
- Command line:
gst-play-1.0 SampleVideo_640x480_1mb.mkv
Steps to reproduce the bug
- open terminal
- type
gst-play-1.0 SampleVideo_640x480_1mb.mkv
- check if the subtitles have been rendered as expected
How reproducible is the bug?
Always
Additional Information
Subtitles are correctly muxed and have the codec ID of the SubRip format: S_TEXT/UTF8
When the matroskademux element opens this file, it exposes pango-markup caps for the subtitle stream:
...
if (!strcmp (codec_id, GST_MATROSKA_CODEC_ID_SUBTITLE_UTF8)) {
  /* well, plain text simply does not have a lot of markup ... */
  caps = gst_caps_new_simple ("text/x-raw", "format", G_TYPE_STRING,
      "pango-markup", NULL);
  context->postprocess_frame = gst_matroska_demux_check_subtitle_buffer;
  subtitlecontext->check_markup = TRUE;
...
Declaring SubRip to be pango markup is wrong: SubRip has its own markup, which is not always compatible with pango, and the attached file is such a case.
About the fix (provided in the MR)
If we open an srt file with the same subtitles in GStreamer, the issue does not occur, because the input is handled by the subparse element, which converts different subtitle formats to pango-markup and also throws away unknown markup.
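To illustrate what "throwing away unknown markup" means, here is a minimal standalone sketch. It is not the actual subparse code: the helper names is_kept_tag and strip_unknown_tags are hypothetical, and the set of kept tags (b, i, u) is an assumption made for the example. Escaping of stray '<' or '&' characters, which real pango markup also needs, is left out.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if the tag body (the text between '<' and
 * '>') is one of the simple tags kept in this sketch: b, i, u, or the
 * matching closing tags. */
static int
is_kept_tag (const char *tag, size_t len)
{
  if (len > 0 && tag[0] == '/') {
    tag++;
    len--;
  }
  return len == 1 && (tag[0] == 'b' || tag[0] == 'i' || tag[0] == 'u');
}

/* Hypothetical helper: copies src into dst (of size dst_size) and drops
 * every <...> tag that is_kept_tag() rejects. A '<' that has no closing '>'
 * is copied unchanged. A kept tag that does not fit in dst is dropped. */
static void
strip_unknown_tags (const char *src, char *dst, size_t dst_size)
{
  size_t out = 0;

  if (dst_size == 0)
    return;

  while (*src != '\0' && out + 1 < dst_size) {
    const char *end = (*src == '<') ? strchr (src, '>') : NULL;

    if (end != NULL) {
      /* Length of the whole tag, including '<' and '>' */
      size_t tag_len = (size_t) (end - src) + 1;

      if (is_kept_tag (src + 1, tag_len - 2) && out + tag_len < dst_size) {
        memcpy (dst + out, src, tag_len);
        out += tag_len;
      }
      src = end + 1;
    } else {
      dst[out++] = *src++;
    }
  }
  dst[out] = '\0';
}
```

With this sketch, a cue whose text is wrapped in a font tag around an italic tag would come out with only the italic tag and the text, which is roughly the behavior we see when the same subtitles go through subparse.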
The idea of the fix proposed in the MR is to make the subparse element autoplug after matroskademux and perform the SubRip --> pango-markup conversion. To do that we do 2 things:
- Instead of "pango-markup", expose some new format from matroskademux; let's call it "text/x-subrip-muxed" (maybe there is already an existing format for it?). NOTE: We can't use "application/x-subtitle", which is used for srt files, because it's slightly different: data in that format is supposed to have a number and a timestamp inside the text.
- Make the subparse element handle "text/x-subrip-muxed".
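The two steps can be sketched without any GStreamer dependency as a pair of lookups. This is only an illustration of the intended flow, not the code from the MR: the function names, the ParserKind enum and the CODEC_ID_SUBTITLE_UTF8 macro are hypothetical, and "text/x-subrip-muxed" is just the name proposed above.

```c
#include <stddef.h>
#include <string.h>

/* Codec ID quoted in this report for SubRip text muxed into Matroska. */
#define CODEC_ID_SUBTITLE_UTF8 "S_TEXT/UTF8"

/* Hypothetical parser kinds, for illustration only. */
typedef enum
{
  PARSER_NONE,
  PARSER_SUBTITLE_FILE,         /* cue number and timestamps are in the text */
  PARSER_SUBRIP_MUXED           /* text only, timing comes from the container */
} ParserKind;

/* Step 1 (demuxer side): pick the media type to expose for a codec ID.
 * Returns NULL for codec IDs this sketch does not cover. */
static const char *
subtitle_media_type_for_codec_id (const char *codec_id)
{
  if (strcmp (codec_id, CODEC_ID_SUBTITLE_UTF8) == 0)
    return "text/x-subrip-muxed";       /* proposed name, not "pango-markup" */
  return NULL;
}

/* Step 2 (parser side): decide how to parse the data for a media type. */
static ParserKind
parser_for_media_type (const char *media_type)
{
  if (strcmp (media_type, "application/x-subtitle") == 0)
    return PARSER_SUBTITLE_FILE;
  if (strcmp (media_type, "text/x-subrip-muxed") == 0)
    return PARSER_SUBRIP_MUXED;
  return PARSER_NONE;
}
```

The point of keeping a separate media type is visible in the enum: the muxed variant has no cue number or timestamp in the text, so it cannot share the parsing path used for srt files.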
PS
Apart from not rendering unhandled XML tags, it would also be nice to actually handle them. This issue is convenient for that, because the steps to reproduce are the same. However, we prefer to go step by step: first let's just fix the bug, and then maybe add support for those tags.