Draft: coqui: New plugin for speech-to-text transcription

Philippe Normand requested to merge philn/gstreamer:coqui-stt into main

The coquistt element is an audiofilter that feeds incoming audio samples to the Coqui-AI STT engine, which runs inference on them using pre-trained models.

This element should be combined with a VAD filter upstream, such as webrtcdsp, so that it processes only the samples that contain human speech.

The resulting transcription is posted on the bus as an element message.
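
As a rough illustration of the setup described above, here is a minimal Python sketch, not code from this branch. It assumes that `webrtcdsp` with `voice-detection=true` serves as the VAD stage, that `echo-cancel=false` is acceptable because no `webrtcechoprobe` is present, and that `coquistt` runs without extra properties (model paths and similar settings may well be required). The source and sink elements and the helper names `build_pipeline_description` and `run` are hypothetical. The layout of the element message is not described here, so the handler prints the whole structure instead of reading specific fields.

```python
import sys


def build_pipeline_description(source="autoaudiosrc"):
    """Return a gst-launch style description with a VAD stage ahead of coquistt."""
    stages = [
        source,  # illustrative choice; any raw audio source could go here
        "audioconvert",
        "audioresample",
        # Assumption: webrtcdsp's voice detection acts as the VAD filter.
        # Echo cancellation is disabled because no webrtcechoprobe is used.
        "webrtcdsp echo-cancel=false voice-detection=true",
        # Assumption: coquistt needs no properties here (it may in practice).
        "coquistt",
        "fakesink",
    ]
    return " ! ".join(stages)


def run(description):
    """Run the pipeline and print every element message seen on the bus."""
    # Imported lazily so the helper above works without GStreamer installed.
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import GLib, Gst

    Gst.init(None)
    pipeline = Gst.parse_launch(description)
    loop = GLib.MainLoop()

    def on_element_message(bus, message):
        structure = message.get_structure()
        if structure is not None:
            # Field names are unknown here, so dump the structure as text.
            print(message.src.get_name(), structure.to_string())

    def on_error(bus, message):
        err, _debug = message.parse_error()
        print("error:", err.message, file=sys.stderr)
        loop.quit()

    bus = pipeline.get_bus()
    bus.add_signal_watch()
    bus.connect("message::element", on_element_message)
    bus.connect("message::error", on_error)
    bus.connect("message::eos", lambda bus, message: loop.quit())

    pipeline.set_state(Gst.State.PLAYING)
    try:
        loop.run()
    finally:
        pipeline.set_state(Gst.State.NULL)
```

Calling `run(build_pipeline_description())` starts the capture. Other elements can post element messages too, which is why the handler prints the name of the element that posted each one.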

Based on initial work by Mike Sheldon (elleo@gnu.org).
