GigaAM: Efficient Self-Supervised Learner for Speech Recognition
Paper • 2506.01192
MLX port of the GigaAM-v3 RNNT variant for Apple Silicon. It gives higher transcription quality than the CTC variant and runs at about 77x realtime on an M2 Max.
```bash
pip install git+https://github.com/aystream/gigaam-mlx.git
```
```python
from gigaam_mlx import load_model, transcribe

model, tokenizer = load_model("rnnt")  # downloads automatically
text = transcribe(model, tokenizer, "recording.wav")
```
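When several recordings need transcribing, it is usually worth loading the model once and reusing it. The sketch below is a hypothetical helper, not part of `gigaam_mlx`: `transcribe_many` and the `recordings` folder are names invented for illustration, and it assumes `transcribe` returns a string as in the snippet above.

```python
from typing import Callable, Dict, Iterable


def transcribe_many(
    paths: Iterable[str],
    transcribe_one: Callable[[str], str],
) -> Dict[str, str]:
    """Apply transcribe_one to every path and return {path: text} in input order.

    transcribe_one is any callable that maps an audio path to its transcript,
    so the already-loaded model can be captured in a closure and reused.
    """
    results: Dict[str, str] = {}
    for path in paths:
        results[str(path)] = transcribe_one(str(path))
    return results


# Usage with this package (assumes gigaam_mlx is installed and that
# "recordings" is a folder of WAV files; both are assumptions):
#
#   from pathlib import Path
#   from gigaam_mlx import load_model, transcribe
#
#   model, tokenizer = load_model("rnnt")
#   wavs = sorted(Path("recordings").glob("*.wav"))
#   texts = transcribe_many(wavs, lambda p: transcribe(model, tokenizer, p))
```

Passing the transcription step as a callable keeps the helper independent of the model type, so the same loop works for either variant.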
Or via CLI:
```bash
gigaam-mlx recording.wav --model-type rnnt
```
| Variant | Time per 20 s chunk (realtime factor) | Quality | Full 18-min video |
|---|---|---|---|
| CTC | 0.06s (~330x) | Good | 21.5s |
| RNNT (this) | 0.26s (~77x) | Better | 25.0s |
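
The realtime factors in the table are the audio duration divided by the processing time. The short sketch below reproduces that arithmetic from the table's own numbers; `realtime_factor` is a hypothetical helper name, and the full-video factors are derived here rather than reported in the table.

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Return how many seconds of audio are processed per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# Per-chunk figures from the table (20 s of audio).
ctc_chunk = realtime_factor(20.0, 0.06)    # about 333, listed as ~330x
rnnt_chunk = realtime_factor(20.0, 0.26)   # about 77, listed as ~77x

# End-to-end figures for the 18-minute video (1080 s of audio).
ctc_full = realtime_factor(18 * 60, 21.5)   # about 50
rnnt_full = realtime_factor(18 * 60, 25.0)  # about 43

print(f"CTC:  {ctc_chunk:.0f}x per chunk, {ctc_full:.0f}x end to end")
print(f"RNNT: {rnnt_chunk:.0f}x per chunk, {rnnt_full:.0f}x end to end")
```

The end-to-end factors come out lower than the per-chunk ones for both variants. The table does not say what the full-video time includes, so the gap should not be attributed to any particular stage without measuring.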
Base model: ai-sage/GigaAM-v3