How to use nvidia/nemotron-speech-streaming-en-0.6b with NeMo:

```python
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained("nvidia/nemotron-speech-streaming-en-0.6b")
transcriptions = asr_model.transcribe(["file.wav"])
```
Are there any more metrics or articles on how this compares to other models for streaming?
#13
by minimanatee - opened
The linked articles on real-world use cases make this look very promising. The 24 ms figure on an RTX 5090 is also very impressive, but it seems to cover only end-of-turn transcript finalization. Are there any other data points on how this model compares with the other Parakeet model in terms of time to partial, specifically for streaming?
I am wondering the same thing. The arXiv paper does not report true time to partial, only the minimum possible average latency that a partial could have. In my experiments, the average time to partial is at least 200 ms even in the most aggressive configuration.
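To make the distinction concrete, here is a minimal Python sketch of how measured time to partial could be compared against an algorithmic floor. Everything in it is an assumption for illustration: `WordEvent`, `time_to_partial_ms`, and `chunk_floor_ms` are hypothetical names, the timestamps are made up, and the floor formula (half a chunk plus any lookahead, with zero compute and network time) is only one plausible reading of "minimum possible average latency", not the paper's definition.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class WordEvent:
    """One word, timed on a single shared clock (hypothetical structure)."""
    word: str
    spoken_end_s: float  # when the word ends in the audio, e.g. from a forced alignment
    emitted_at_s: float  # when the first partial containing the word reached the client


def time_to_partial_ms(events):
    """Measured delay per word: first partial arrival minus end of the spoken word."""
    return [(e.emitted_at_s - e.spoken_end_s) * 1000.0 for e in events]


def chunk_floor_ms(chunk_ms, lookahead_ms=0.0):
    """Assumed floor on average delay for chunked streaming.

    A word ending at a uniformly random point inside a chunk waits half a
    chunk on average for the chunk to close, plus any lookahead. Compute
    and transport time are taken as zero, so real systems sit above this.
    """
    return chunk_ms / 2.0 + lookahead_ms


if __name__ == "__main__":
    # Made-up timestamps, not measurements of any model.
    events = [
        WordEvent("hello", spoken_end_s=1.0, emitted_at_s=1.25),
        WordEvent("world", spoken_end_s=2.0, emitted_at_s=2.5),
    ]
    delays = time_to_partial_ms(events)
    print("per-word time to partial (ms):", delays)
    print("average time to partial (ms):", mean(delays))
    print("assumed floor for a 160 ms chunk (ms):", chunk_floor_ms(160.0))
```

The gap between the measured average and the floor is the part that a "minimum possible" figure leaves out: model compute, decoding, buffering, and transport.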