How to use from the MLX library
# Download the model from the Hub
pip install "huggingface_hub[hf_xet]"

huggingface-cli download --local-dir nemotron-speech-streaming-en-0.6b-8bit lightsofapollo/nemotron-speech-streaming-en-0.6b-8bit

animaslabs/nemotron-speech-streaming-en-0.6b-mlx-8bit

This model was converted to MLX format and quantized to 8 bits from nvidia/nemotron-speech-streaming-en-0.6b using the scripts in this GitHub repo. Please refer to the original model card for more details on the model.

Usage

Quantized models require calling mlx.nn.quantize() on the model before loading weights, so that the model's layers match the quantized tensors stored in the checkpoint.

import json
import mlx.nn as nn
from huggingface_hub import hf_hub_download
from parakeet_mlx.utils import from_config

# Download and load config
config_path = hf_hub_download("animaslabs/nemotron-speech-streaming-en-0.6b-mlx-8bit", "config.json")
with open(config_path) as f:
    config = json.load(f)

# Build model and apply quantization structure
model = from_config(config)
nn.quantize(
    model,
    bits=config["quantization"]["bits"],
    group_size=config["quantization"]["group_size"],
)

# Load quantized weights
weights_path = hf_hub_download("animaslabs/nemotron-speech-streaming-en-0.6b-mlx-8bit", "model.safetensors")
model.load_weights(weights_path)

# Transcribe
result = model.transcribe("audio.wav")
print(result.text)
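
If the config might come from a non-quantized checkpoint, it can help to validate the quantization section before passing it to nn.quantize(). The following is a minimal sketch, not part of parakeet_mlx or MLX: the helper name quantization_kwargs is hypothetical, and the example values (8 bits, group size 64) are illustrative assumptions rather than values read from this repository's config.json.

```python
def quantization_kwargs(config: dict) -> dict:
    """Extract the bits/group_size arguments for mlx.nn.quantize() from a config dict.

    Hypothetical helper: raises a clear error instead of a bare KeyError when
    the config does not describe a quantized model.
    """
    quant = config.get("quantization")
    if quant is None:
        raise ValueError("config has no 'quantization' section")
    missing = [key for key in ("bits", "group_size") if key not in quant]
    if missing:
        raise ValueError(f"quantization section is missing: {', '.join(missing)}")
    return {"bits": int(quant["bits"]), "group_size": int(quant["group_size"])}


# Illustrative values only; the real ones come from the downloaded config.json.
example_config = {"quantization": {"bits": 8, "group_size": 64}}
print(quantization_kwargs(example_config))
```

With this helper, the quantization step above could be written as nn.quantize(model, **quantization_kwargs(config)).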