Questions about streaming with Parakeet and TDT merging methods

#13
by alexandreacff - opened

I’m currently trying to work with Parakeet in streaming mode, receiving microphone chunks and generating live transcriptions.

As a reference, I’m using the following code for streaming: https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_chunked_inference/rnnt/speech_to_text_buffered_infer_rnnt.py

However, I’ve run into some questions:

  1. Why do the more conventional merging methods not work well for TDT? I tested them, but the performance dropped significantly.

  2. Is there already an implementation available for this use case (streaming with Parakeet using microphone chunks)?

Maybe this will help.
I finally got microphone streaming working with Gradio, and there are no more microphone errors. I struggled with that problem for a long time.
The Space itself is not great, but the streaming concept and the Gradio integration finally work.

https://huggingface.co/spaces/WJ88/Parakeet-TDT-0.6b-V3_-_multilingual_but_performance_issues_accumulating_over_time
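
For anyone who wants to try the general pattern outside Gradio, here is a minimal sketch of buffering microphone chunks into overlapping windows. It is not the code from the Space above or from the NeMo script. `RollingAudioBuffer`, `transcribe_stream`, and the `transcribe_window` callback are hypothetical names used only for illustration, and the callback stands in for whatever model call you actually use.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List


class RollingAudioBuffer:
    """Hypothetical helper: keeps the latest `window_samples` samples and
    emits a full window every `hop_samples` newly received samples."""

    def __init__(self, window_samples: int, hop_samples: int) -> None:
        if not 0 < hop_samples <= window_samples:
            raise ValueError("hop_samples must be in (0, window_samples]")
        self.window_samples = window_samples
        self.hop_samples = hop_samples
        # deque with maxlen drops the oldest sample automatically.
        self._samples: Deque[float] = deque(maxlen=window_samples)
        self._pending = 0  # samples received since the last emitted window

    def push(self, chunk: Iterable[float]) -> List[List[float]]:
        """Add one microphone chunk and return any windows that became ready."""
        ready: List[List[float]] = []
        for sample in chunk:
            self._samples.append(sample)
            self._pending += 1
            buffer_full = len(self._samples) == self.window_samples
            if buffer_full and self._pending >= self.hop_samples:
                ready.append(list(self._samples))
                self._pending = 0
        return ready


def transcribe_stream(
    chunks: Iterable[Iterable[float]],
    buffer: RollingAudioBuffer,
    transcribe_window: Callable[[List[float]], str],
) -> Iterator[str]:
    """Feed chunks through the buffer and yield one result per ready window.

    `transcribe_window` is an assumed callback, not a NeMo API.
    """
    for chunk in chunks:
        for window in buffer.push(chunk):
            yield transcribe_window(window)


if __name__ == "__main__":
    # Toy run with a fake "model" that only reports the window length.
    fake_chunks = [[0.0] * 3, [0.0] * 3]
    buf = RollingAudioBuffer(window_samples=4, hop_samples=2)
    for text in transcribe_stream(fake_chunks, buf, lambda w: f"{len(w)} samples"):
        print(text)
```

Because consecutive windows overlap, their transcripts still have to be merged into a single running text. That step is deliberately left out here, since it is exactly the part the original question is about.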
