transcriberbin: add support for consuming secondary audio streams

In some situations, a translated alternate audio stream might be available for a piece of content.

Instead of going through transcription and translation of the original audio stream, it may be preferable for accuracy purposes to simply transcribe the secondary audio stream.

This MR adds support for doing just that:

  • Secondary audio sink pads can be requested as "sink_audio_%u"

  • When a secondary sink pad is requested, a corresponding "sometimes" audio source pad is added to pass the audio through, as "src_audio_%u"

  • The main transcription bin now contains one transcription bin per input stream. Each can be controlled individually through properties on its sink pad; for instance, translation-languages can be set dynamically per audio stream

  • Some properties that originally existed on the main element remain, but are now simply mapped to the "always" audio sink pad ("sink_audio")

  • Releasing of secondary sink pads is nominally implemented, but not tested in states other than NULL

An example launch line for this would be:

$ gst-launch-1.0 transcriberbin name=transcriberbin latency=8000 accumulate-time=0 \
      cc-caps="closedcaption/x-cea-708, format=cc_data" sink_audio_0::language-code="es-US" \
      sink_audio_0::translation-languages="languages, transcript=cc3" \
    uridecodebin uri=file:///home/meh/Music/chaplin.mkv name=d \
      d. ! videoconvert ! transcriberbin.sink_video \
      d. ! clocksync ! audioconvert ! transcriberbin.sink_audio \
      transcriberbin.src_video ! cea608overlay field=1 ! videoconvert ! autovideosink \
      transcriberbin.src_audio ! audioconvert ! fakesink \
    uridecodebin uri=file:///home/meh/Music/chaplin-spanish.webm name=d2 \
      d2. ! audioconvert ! transcriberbin.sink_audio_0 \
      transcriberbin.src_audio_0 ! fakesink
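
With more than one secondary stream, the per-pad parts of the launch line above repeat with a different index each time. The following is only a sketch of how those fragments could be generated: secondary_props and secondary_branch are hypothetical helper names, the "sec<N>" decoder names are made up for illustration, and the pairing of "sink_audio_<N>" with "src_audio_<N>" is assumed from the example above. The helpers only build strings and do not launch anything.

```shell
#!/bin/sh
# Hypothetical helpers, not part of GStreamer: they only print text
# fragments that could be pasted into a gst-launch-1.0 command line.

# Print the pad property setting for secondary sink pad $1,
# using language code $2 (for example es-US).
secondary_props() {
    printf 'sink_audio_%s::language-code=%s' "$1" "$2"
}

# Print the pipeline branch that decodes URI $2, feeds it into
# secondary sink pad $1, and drains the passthrough source pad
# with the same index (assumed pairing) into a fakesink.
secondary_branch() {
    printf 'uridecodebin uri=%s name=sec%s sec%s. ! audioconvert ! transcriberbin.sink_audio_%s transcriberbin.src_audio_%s ! fakesink' \
        "$2" "$1" "$1" "$1" "$1"
}
```

For instance, calling secondary_props 0 es-US prints sink_audio_0::language-code=es-US, which corresponds to the property setting used in the launch line above.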

