Parakeet TDT 0.6B v3 — Basque (Euskara) · sherpa-onnx INT8

ONNX export of itzune/parakeet-tdt-0.6b-v3-basque packaged for sherpa-onnx — a cross-platform, real-time speech recognition engine for edge devices, mobile, embedded systems, and WebAssembly.

The weights are INT8 dynamically quantised from the BF16 fine-tuned checkpoint, reducing the total package size from ~2.4 GB to ~641 MB while preserving transcription quality.

Model details

Property	Value
Architecture	FastConformer RNNT-TDT (Parakeet TDT 0.6B v3)
Language	Basque (`eu`)
Sample rate	16 kHz mono
Parameters	~600 M
Vocabulary size	1024 tokens (SentencePiece BPE)
Quantisation	INT8 dynamic (QUInt8 encoder / QInt8 decoder+joiner)
sherpa-onnx model type	`nemo_transducer`
Subsampling factor	8
Feature dimension	128 log-mel filterbanks
Base model	nvidia/parakeet-tdt-0.6b-v3
Fine-tuned model	itzune/parakeet-tdt-0.6b-v3-basque
Fine-tuning framework	NVIDIA NeMo
Hardware	NVIDIA L40 (48 GB)

Files

File	Size	Description
`encoder.int8.onnx`	623 MB	INT8 encoder (FastConformer)
`decoder.int8.onnx`	12 MB	INT8 prediction network
`joiner.int8.onnx`	6 MB	INT8 joint network
`tokens.txt`	92 KB	Vocabulary (one token per line, index = line number)

Evaluation

WER measured on held-out test splits from asierhv/composite_corpus_eu_v2.1:

Split	Baseline (base model on Basque)	Fine-tuned
`test_cv` (Common Voice)	108.47%	6.92%
`test_parl` (Parliament)	107.61%	4.36%
`test_oslr` (OpenSLR)	108.52%	14.52%

The base model is English-oriented. WER > 100% on Basque is expected for it; the fine-tuned numbers represent the actual usable quality.

Quick start

Install sherpa-onnx

pip install sherpa-onnx

Or download a pre-built binary / use the C++ API — see the sherpa-onnx documentation.

Python API

import sherpa_onnx

# Point to the folder containing the 3 ONNX files + tokens.txt
model_dir = "/path/to/parakeet-tdt-0.6b-v3-basque-sherpa-onnx"

recognizer = sherpa_onnx.OfflineRecognizer.from_transducer(
    encoder=f"{model_dir}/encoder.int8.onnx",
    decoder=f"{model_dir}/decoder.int8.onnx",
    joiner=f"{model_dir}/joiner.int8.onnx",
    tokens=f"{model_dir}/tokens.txt",
    num_threads=4,
    decoding_method="greedy_search",
    model_type="nemo_transducer",
)

# Transcribe a WAV file (16 kHz mono)
stream = recognizer.create_stream()
audio, sample_rate = sherpa_onnx.read_wave("/path/to/audio.wav")
stream.accept_waveform(sample_rate, audio)
recognizer.decode_stream(stream)
print(stream.result.text)

Command-line (offline)

sherpa-onnx \
  --encoder-model=encoder.int8.onnx \
  --decoder-model=decoder.int8.onnx \
  --joiner-model=joiner.int8.onnx \
  --tokens=tokens.txt \
  --decoding-method=greedy_search \
  --model-type=nemo_transducer \
  /path/to/audio.wav

Real-time microphone input (Python)

import sherpa_onnx
import sounddevice as sd
import numpy as np

model_dir = "/path/to/parakeet-tdt-0.6b-v3-basque-sherpa-onnx"

# For streaming/online, use OnlineRecognizer with the same model files
# (sherpa-onnx supports NeMo TDT models both offline and online)
recognizer = sherpa_onnx.OnlineRecognizer.from_transducer(
    encoder=f"{model_dir}/encoder.int8.onnx",
    decoder=f"{model_dir}/decoder.int8.onnx",
    joiner=f"{model_dir}/joiner.int8.onnx",
    tokens=f"{model_dir}/tokens.txt",
    num_threads=4,
    decoding_method="greedy_search",
    model_type="nemo_transducer",
    chunk_size=32,
)

stream = recognizer.create_stream()
sample_rate = 16000

def callback(indata, frames, time, status):
    samples = indata[:, 0].astype(np.float32)
    stream.accept_waveform(sample_rate, samples)
    while recognizer.is_ready(stream):
        recognizer.decode_stream(stream)
    result = recognizer.get_result(stream)
    if result:
        print(f"\r{result}", end="", flush=True)

with sd.InputStream(samplerate=sample_rate, channels=1, callback=callback):
    print("Listening... Press Ctrl+C to stop.")
    import time
    while True:
        time.sleep(0.1)

Mobile / embedded / WebAssembly

sherpa-onnx supports many deployment targets beyond Python:

Platform	Notes
Android	Java/Kotlin API, pre-built AAR
iOS	Swift/ObjC API
Raspberry Pi / ARM	Static C++ binaries available
WebAssembly	In-browser speech recognition
Windows / macOS / Linux	Native binaries and shared libraries

See sherpa-onnx releases for pre-built packages.

Export recipe

This model was exported from the .nemo checkpoint using a custom script based on the official sherpa-onnx export guide for Parakeet TDT:

Load fine-tuned NeMo model
Export encoder, decoder, joiner as separate ONNX graphs
Add required metadata to each graph (vocab_size, subsampling_factor, feat_dim, etc.)
Apply INT8 dynamic quantisation via onnxruntime.quantization.quantize_dynamic

The export and fine-tuning code is available at: xezpeleta/parakeet-tdt-0.6b-v3-basque.

Related models

Repo	Format	Use case
itzune/parakeet-tdt-0.6b-v3-basque	NeMo `.nemo`	Full NeMo / PyTorch inference & fine-tuning
xezpeleta/parakeet-tdt-0.6b-v3-basque-onnx-asr	ONNX-ASR	Simple Python inference via `onnx-asr`
This repo	sherpa-onnx INT8	On-device / real-time / cross-platform

Citation and acknowledgements

If you use this model, please credit:

Base model: nvidia/parakeet-tdt-0.6b-v3
Fine-tuned model: itzune/parakeet-tdt-0.6b-v3-basque
Training dataset: asierhv/composite_corpus_eu_v2.1
sherpa-onnx: k2-fsa/sherpa-onnx

Underlying source collections in the training corpus:

Mozilla Common Voice (Basque)
Basque Parliament corpus
OpenSLR Basque resources

License

CC BY 4.0. Inherit license obligations from the base model and dataset.

Downloads last month: -; Downloads are not tracked for this model. How to track

Model tree for xezpeleta/parakeet-tdt-0.6b-v3-basque-sherpa-onnx

Base model

nvidia/parakeet-tdt-0.6b-v3

Finetuned

itzune/parakeet-tdt-0.6b-v3-basque

Quantized

(2)

this model

Evaluation results

test_cv WER on Composite Basque test splits (CV/Parliament/OSLR)
self-reported

6.920
test_parl WER on Composite Basque test splits (CV/Parliament/OSLR)
self-reported

4.360
test_oslr WER on Composite Basque test splits (CV/Parliament/OSLR)
self-reported

14.520