How to use nvidia/parakeet-tdt-0.6b-v3 with NeMo:
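```python
import nemo.collections.asr as nemo_asr

# Download the pretrained checkpoint from Hugging Face and load it
asr_model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-0.6b-v3")
# Transcribe a list of 16 kHz mono audio files; each result has a .text field
transcriptions = asr_model.transcribe(["file.wav"])
print(transcriptions[0].text)
```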
Questions about streaming with Parakeet and TDT merging methods
I’m trying to run Parakeet in streaming mode: receiving audio chunks from a microphone and producing live transcriptions.
As a reference, I’m using the following code for streaming: https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_chunked_inference/rnnt/speech_to_text_buffered_infer_rnnt.py
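Buffered (chunked) inference generally runs the model on overlapping windows: each chunk of new audio is padded with some left and right context so that words near chunk edges still see enough audio. Below is a minimal, stdlib-only sketch of that windowing for microphone input, assuming samples arrive as Python lists. `ChunkBuffer` and its `chunk`/`left`/`right` parameters (all in samples) are hypothetical names, not part of NeMo, and the referenced script's exact buffer layout may differ.

```python
from typing import List, Tuple


class ChunkBuffer:
    """Hypothetical helper: turns a stream of small microphone blocks into
    fixed-size chunks, each padded with left and right context."""

    def __init__(self, chunk: int, left: int, right: int) -> None:
        self.chunk, self.left, self.right = chunk, left, right
        self.samples: List[float] = []  # samples still needed by future windows
        self.offset = 0                 # absolute index of self.samples[0]
        self.next_start = 0             # absolute start of the next chunk

    def push(self, block: List[float]) -> List[Tuple[int, List[float]]]:
        """Append one block; return (chunk_start, window) for every chunk
        whose right context has now fully arrived."""
        self.samples.extend(block)
        ready = []
        end_needed = self.next_start + self.chunk + self.right
        while self.offset + len(self.samples) >= end_needed:
            lo = max(0, self.next_start - self.left)
            window = self.samples[lo - self.offset:end_needed - self.offset]
            ready.append((self.next_start, window))
            self.next_start += self.chunk
            # Drop samples that no later window's left context will need
            keep_from = max(0, self.next_start - self.left)
            if keep_from > self.offset:
                del self.samples[:keep_from - self.offset]
                self.offset = keep_from
            end_needed = self.next_start + self.chunk + self.right
        return ready
```

Because neighbouring windows overlap by their context regions, each window's transcript covers some audio that the next one also sees, which is why a merge step is needed afterwards.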
However, I’ve run into two questions:
- Why don’t the more conventional merging methods work well for TDT? I tested them, but performance dropped significantly.
- Is there already an implementation for this use case (streaming with Parakeet from microphone chunks)?
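One plausible complication (an assumption, not something confirmed in this thread): TDT predicts a duration along with each token and can jump over several encoder frames per decoding step, so merge heuristics that implicitly assume one decoding step per frame may line tokens up incorrectly. A way around that, sketched here, is to record each token's absolute emission frame and keep from every window only the tokens that fall inside that window's own chunk, leaving the context regions to the neighbouring windows. `Token` and `merge_middle` are hypothetical; this is not NeMo's merging code, and it has not been checked against the drop described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    text: str
    frame: int  # absolute encoder frame where the token was emitted (assumed known)


def merge_middle(window_hyps: List[List[Token]], chunk_frames: int) -> List[Token]:
    """Window i owns frames [i * chunk_frames, (i + 1) * chunk_frames).
    Tokens a window emitted in its left/right context are discarded, since
    the neighbouring window that owns those frames reports them instead."""
    merged: List[Token] = []
    for i, hyp in enumerate(window_hyps):
        start, end = i * chunk_frames, (i + 1) * chunk_frames
        merged.extend(t for t in hyp if start <= t.frame < end)
    return merged


# Example: three overlapping windows with 10-frame chunks
w0 = [Token("hel", 2), Token("lo", 8), Token("wor", 12)]
w1 = [Token("lo", 8), Token("wor", 11), Token("ld", 17), Token("!", 22)]
w2 = [Token("ld", 17), Token("!", 21)]
print([t.text for t in merge_middle([w0, w1, w2], chunk_frames=10)])
```

Since the owned regions are disjoint, no token is kept twice; the trade-off is that a word straddling a chunk boundary is reported by whichever window's owned region contains its emission frame.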
I replied in the related thread for parakeet-tdt-0.6b-v2: https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2/discussions/63#68cc58004fdfe65cc5d61be5
In brief:
- Please use the new streaming pipeline: https://github.com/NVIDIA-NeMo/NeMo/blob/main/examples/asr/asr_chunked_inference/rnnt/speech_to_text_streaming_infer_rnnt.py
- You can use https://github.com/NVIDIA-NeMo/NeMo/pull/14759 as a reference for chunked streaming from a microphone
Hi all, maybe this will help.
I finally got microphone streaming working with Gradio, and there are no more microphone-related errors. I had also been fighting with that problem for a long time.
The Space itself is not great, but the streaming approach and the Gradio integration finally work.
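For the Gradio route, microphone chunks typically arrive at the browser's sample rate (often 44.1 or 48 kHz) as int16 PCM, while Parakeet expects 16 kHz mono audio. Below is a minimal stdlib sketch of converting one chunk; `to_model_audio` is a hypothetical helper, and its linear-interpolation resampler is a deliberately simple stand-in for a proper resampling filter. In Gradio 4.x, a streaming microphone component such as `gr.Audio(sources=["microphone"], streaming=True)` usually passes the callback a `(sample_rate, data)` tuple with `data` an int16 NumPy array, which you could hand over as `data.tolist()`; check the Gradio docs for your version.

```python
from typing import List, Sequence, Union

TARGET_SR = 16_000  # Parakeet models expect 16 kHz mono input

# One frame is either a mono int16 sample or a sequence of per-channel samples
Frame = Union[int, Sequence[int]]


def to_model_audio(sample_rate: int, data: Sequence[Frame]) -> List[float]:
    """Convert one chunk of int16 PCM into mono float samples at 16 kHz."""
    # Average channels to mono, then scale int16 into [-1.0, 1.0)
    mono = [
        (sum(f) / len(f) if isinstance(f, (list, tuple)) else f) / 32768.0
        for f in data
    ]
    if sample_rate == TARGET_SR or not mono:
        return mono
    # Linear-interpolation resampling: sample the input at fractional positions
    n_out = int(len(mono) * TARGET_SR / sample_rate)
    out: List[float] = []
    for i in range(n_out):
        pos = i * sample_rate / TARGET_SR
        j = int(pos)
        frac = pos - j
        nxt = mono[j + 1] if j + 1 < len(mono) else mono[j]
        out.append(mono[j] * (1.0 - frac) + nxt * frac)
    return out
```

Resampling each chunk independently can leave small discontinuities at chunk boundaries; resampling the accumulated buffer, or using a resampler that carries state across chunks, avoids that.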