How to use aufklarer/MAGNeT-Small-30secs-MLX-4bit with MLX:
# Download the model from the Hub
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir MAGNeT-Small-30secs-MLX-4bit aufklarer/MAGNeT-Small-30secs-MLX-4bit
MLX port of Meta's MAGNeT, a masked, parallel (non-autoregressive) text-to-music model, quantized to INT4 (weight-only) for on-device generation on Apple Silicon. It uses the EnCodec 32 kHz audio decoder and a T5-base text encoder, and generates with per-codebook iterative decoding and restricted-context self-attention.
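To make the per-codebook iterative decoding more concrete, here is a minimal sketch of how a step budget of [20, 10, 10, 10] could translate into a masking schedule. It assumes a cosine masking rate of the kind the MAGNeT paper describes and a 50 Hz EnCodec frame rate (1500 frames per codebook for 30 s); both are assumptions about this port, and the helper `masked_counts` is hypothetical. Span masking and confidence-based rescoring are left out.

```python
import math

# Steps per RVQ codebook, as listed on this model card.
DECODING_STEPS = [20, 10, 10, 10]

# Assumption: 30 s of audio at a 50 Hz codec frame rate.
FRAMES_PER_CODEBOOK = 30 * 50


def masked_counts(num_steps, num_tokens):
    """Tokens still masked at the start of each step (cosine schedule).

    Hypothetical helper: the masking rate falls from 1 towards 0 as
    cos(pi * i / (2 * num_steps)) for step index i = 0 .. num_steps - 1.
    """
    counts = []
    for i in range(num_steps):
        rate = math.cos(math.pi * i / (2 * num_steps))
        counts.append(max(1, int(rate * num_tokens)))
    return counts


# One schedule per codebook; codebooks are decoded one after another.
schedule = [masked_counts(steps, FRAMES_PER_CODEBOOK) for steps in DECODING_STEPS]
total_steps = sum(len(counts) for counts in schedule)

for codebook, counts in enumerate(schedule):
    print(f"codebook {codebook}: {len(counts)} steps, "
          f"masked {counts[0]} -> {counts[-1]}")
print("total forward passes:", total_steps)
```

Each step corresponds to one forward pass that predicts all masked positions of the current codebook in parallel, which is why the total step count, rather than the sequence length, drives generation time.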
| Property | Value |
|---|---|
| Parameters (LM) | 300M |
| Quantization | INT4 weight-only, group size 64 |
| Format | MLX safetensors |
| Sample rate | 32 kHz mono |
| Output length | 30 s per generation (fixed) |
| Decoding steps | 50 total ([20, 10, 10, 10] across 4 RVQ codebooks) |
| Bundle size | 499 MB on disk |
| Source | facebook/magnet-small-30secs |

| Metric | Value |
|---|---|
| Real-time factor (wall / audio) | 0.28 |
| Peak RSS | 1351 MB |
| CLAP score (laion/clap-htsat-unfused, 5 prompts) | 0.409 |
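As a quick reading aid for the real-time factor, the arithmetic below converts it into wall-clock time for one fixed-length generation. The inputs are the values from this card; the hardware behind the measurement is not stated, so treat the result as indicative only.

```python
# Values taken from this model card.
RTF = 0.28              # wall-clock seconds per second of audio
AUDIO_SECONDS = 30.0    # fixed output length
SAMPLE_RATE = 32_000    # Hz, mono

# Wall-clock time for one generation and the number of output samples.
wall_seconds = RTF * AUDIO_SECONDS
num_samples = int(AUDIO_SECONDS * SAMPLE_RATE)

print(f"approx. {wall_seconds:.1f} s to generate {AUDIO_SECONDS:.0f} s of audio")
print(f"{num_samples} samples per clip")
```

An RTF below 1 means generation is faster than playback; at 0.28, a 30 s clip takes roughly 8.4 s.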
from huggingface_hub import snapshot_download
bundle = snapshot_download("aufklarer/MAGNeT-Small-30secs-MLX-4bit")
# See https://github.com/soniqo/speech-swift for production usage.
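After the download, it can be useful to check what the bundle contains before wiring it into an app. The sketch below lists the files in a local directory by size. `summarize_bundle` is a hypothetical helper, not part of `huggingface_hub` or speech-swift, and it makes no assumption about the file names inside the bundle.

```python
from pathlib import Path


def summarize_bundle(bundle_dir):
    """Return (relative path, size in MB) for each file, largest first.

    Hypothetical helper for inspecting a downloaded model directory.
    """
    root = Path(bundle_dir)
    files = [p for p in root.rglob("*") if p.is_file()]
    sizes = [(str(p.relative_to(root)), p.stat().st_size / 1e6) for p in files]
    return sorted(sizes, key=lambda item: item[1], reverse=True)
```

For example, with the path returned by `snapshot_download`:

```python
for name, size_mb in summarize_bundle(bundle):
    print(f"{size_mb:8.1f} MB  {name}")
```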
CC-BY-NC 4.0 — inherited from upstream MAGNeT weights. Non-commercial use only.