aws: transcriber: add support for language identification
This commit adds support for language identification to the transcriber element and makes use of the identified language in the translation pad.
Language identification is activated with either of the following properties (which match the service API):
- 'identify-language' when a single language is expected in the stream.
- 'identify-multiple-languages' otherwise.
In both cases, the property 'language-options' must list the candidate languages, e.g. "en-US,es-US,fr-FR".
The following pipeline identifies languages in a stream that may contain multiple languages, outputs the transcription on the 'src' pad, and translates where needed to French ('translate_src_0') and English ('translate_src_1'):
gst-launch-1.0 -e uridecodebin uri=file:///__PATH_TO_FILE__ ! audioconvert \
! awstranscriber name=t \
access-key="__TO_BE_DEFINED__" secret-access-key="__TO_BE_DEFINED__" \
identify-multiple-languages=true \
language-options="en-US,es-US,fr-FR" \
translate_src_0::language-code=fr \
translate_src_1::language-code=en \
t. ! fakesink dump=true \
t.translate_src_0 ! fakesink dump=true \
t.translate_src_1 ! fakesink dump=true
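
For the single-language case, the property choice described above can be sketched with a small shell helper. The helper name 'lang_id_props' is hypothetical and not part of this commit; it only assembles the two properties named in this message. The pipeline is left commented out, since it needs valid credentials and an input file, and it assumes the element behaves the same way without translation pads:

```shell
# Hypothetical helper (not part of this commit): prints the awstranscriber
# properties that enable language identification.
#   $1: "single" (one language expected) or "multiple"
#   $2: comma-separated candidate languages, e.g. "en-US,es-US,fr-FR"
lang_id_props() {
    case "$1" in
        single)   prop="identify-language" ;;
        multiple) prop="identify-multiple-languages" ;;
        *) echo "usage: lang_id_props single|multiple LANGS" >&2; return 1 ;;
    esac
    printf '%s=true language-options=%s\n' "$prop" "$2"
}

# Single-language variant of the pipeline above, without translation pads.
# Commented out: requires AWS credentials and a real input file.
# gst-launch-1.0 -e uridecodebin uri=file:///__PATH_TO_FILE__ ! audioconvert \
#   ! awstranscriber name=t \
#       access-key="__TO_BE_DEFINED__" secret-access-key="__TO_BE_DEFINED__" \
#       $(lang_id_props single "en-US,es-US,fr-FR") \
#   ! fakesink dump=true
```

Only one of the two identification properties is emitted at a time, matching the either/or wording above.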