MAGNeT-Small-30secs-MLX-4bit

speech-swift (Apple SDK)
soniqo.audio (website)
blog

MLX port of Meta's MAGNeT, a masked, parallel (non-autoregressive) text-to-music model, with weight-only INT4 quantization for on-device generation on Apple Silicon. It uses the EnCodec 32 kHz audio decoder and a T5-base text encoder, and decodes each codebook iteratively with restricted-context self-attention.
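To make "weight-only INT4" concrete, here is a minimal pure-Python sketch of group-wise affine quantization, the general technique behind this kind of format. It is not the MLX kernel and not this bundle's loader. The function names are hypothetical, and the 16-bit scale and bias per group are assumptions used only to estimate storage overhead.

```python
from typing import List, Tuple


def quantize_group(weights: List[float], bits: int = 4) -> Tuple[List[int], float, float]:
    """Affine-quantize one group of weights to unsigned `bits`-bit codes.

    Returns (codes, scale, bias) such that weight ~= scale * code + bias.
    """
    levels = (1 << bits) - 1                      # 15 for INT4
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels or 1.0       # fall back to 1.0 for a constant group
    codes = [min(levels, max(0, round((w - w_min) / scale))) for w in weights]
    return codes, scale, w_min


def dequantize_group(codes: List[int], scale: float, bias: float) -> List[float]:
    """Reconstruct approximate weights from integer codes."""
    return [scale * q + bias for q in codes]


def bits_per_weight(bits: int = 4, group_size: int = 64,
                    scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Effective storage per weight, counting the per-group scale and bias.

    The 16-bit scale/bias widths are an assumption, not read from the bundle.
    """
    return bits + (scale_bits + bias_bits) / group_size


# One group of 64 evenly spaced weights in [-0.5, 0.5].
group = [i / 63 - 0.5 for i in range(64)]
codes, scale, bias = quantize_group(group)
restored = dequantize_group(codes, scale, bias)
max_error = max(abs(a - b) for a, b in zip(group, restored))

print(bits_per_weight())  # 4.5
```

With a group size of 64, the rounding error for each weight is bounded by half the group's scale, and the per-group metadata adds about half a bit per weight under the stated assumptions.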

Model

Property	Value
Parameters (LM)	300M
Quantization	INT4 weight-only, group size 64
Format	MLX safetensors
Sample rate	32 kHz mono
Output length	30 s per generation (fixed)
Decoding steps	50 total (`[20, 10, 10, 10]` across 4 RVQ codebooks)
Bundle size	499 MB on disk
Source	facebook/magnet-small-30secs
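The decoding-step row can be read as a per-codebook schedule: 20 refinement steps for the first RVQ codebook and 10 for each of the other three. The sketch below lays out such a schedule, assuming the cosine masking rate described in the MAGNeT paper and a 50 Hz token frame rate. Both are assumptions, since this card states neither. It counts individual positions rather than spans, and the helper names are hypothetical.

```python
import math
from typing import List, Tuple

STEPS_PER_CODEBOOK = [20, 10, 10, 10]  # from the Model table
DURATION_S = 30                        # from the Model table
FRAME_RATE_HZ = 50                     # assumption: token frames per second


def mask_fraction(step: int, total_steps: int) -> float:
    """Cosine schedule: fraction of positions masked at a 1-indexed step."""
    return math.cos(math.pi * (step - 1) / (2 * total_steps))


def schedule(steps_per_codebook: List[int], frames: int) -> List[Tuple[int, int, int]]:
    """Return (codebook, step, masked_positions) for every decoding step."""
    plan = []
    for codebook, total in enumerate(steps_per_codebook):
        for step in range(1, total + 1):
            masked = math.ceil(mask_fraction(step, total) * frames)
            plan.append((codebook, step, masked))
    return plan


frames = FRAME_RATE_HZ * DURATION_S
plan = schedule(STEPS_PER_CODEBOOK, frames)

print(len(plan))  # 50
for codebook, step, masked in plan[:3]:
    print(f"codebook {codebook}, step {step}: {masked} positions masked")
```

Each codebook starts fully masked, and the masked share shrinks at every step, so most of the sequence is committed early and the later steps refine a small remainder.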

Performance (Apple Silicon, 30 s audio)

Metric	Value
Real-time factor (wall / audio)	0.28
Peak RSS	1351 MB
CLAP score (laion/clap-htsat-unfused, 5 prompts)	0.409
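Since the real-time factor is defined here as wall time divided by audio duration, the table's 0.28 converts directly into a generation time for the fixed 30 s clip. A small worked example (the function names are illustrative):

```python
def wall_time_s(rtf: float, audio_s: float) -> float:
    """Wall-clock seconds needed to generate `audio_s` seconds of audio."""
    return rtf * audio_s


def speedup(rtf: float) -> float:
    """How many times faster than real time generation runs."""
    return 1.0 / rtf


RTF = 0.28       # from the Performance table
AUDIO_S = 30.0   # fixed output length

print(f"{wall_time_s(RTF, AUDIO_S):.1f} s")   # 8.4 s
print(f"{speedup(RTF):.2f}x real time")       # 3.57x real time
```

An RTF below 1 means generation finishes faster than the clip would take to play back.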

Usage

from huggingface_hub import snapshot_download

# Download the bundle (or reuse the local cache); returns the local directory path.
bundle = snapshot_download("aufklarer/MAGNeT-Small-30secs-MLX-4bit")
# See https://github.com/soniqo/speech-swift for production usage.
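After downloading, it can be useful to check what actually landed on disk. The helpers below are a hypothetical convenience, not part of huggingface_hub or speech-swift: they total the files under a directory and list its safetensors files. The size is reported in decimal megabytes, which is an assumption about how the 499 MB figure was measured.

```python
from pathlib import Path
from typing import List


def bundle_size_mb(bundle_dir: str) -> float:
    """Total size of all regular files under `bundle_dir`, in decimal MB."""
    total_bytes = sum(
        p.stat().st_size for p in Path(bundle_dir).rglob("*") if p.is_file()
    )
    return total_bytes / 1_000_000


def safetensors_files(bundle_dir: str) -> List[str]:
    """Sorted names of the .safetensors files found under `bundle_dir`."""
    return sorted(p.name for p in Path(bundle_dir).rglob("*.safetensors"))


# Usage, with `bundle` being the path returned by snapshot_download:
#   print(f"{bundle_size_mb(bundle):.0f} MB")
#   print(safetensors_files(bundle))
```

The file names inside the bundle are not listed on this card, so the helper reports whatever it finds rather than expecting a particular layout.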

Source

Upstream weights: facebook/magnet-small-30secs

License

CC-BY-NC 4.0 — inherited from upstream MAGNeT weights. Non-commercial use only.

Paper

Masked Audio Generation using a Single Non-Autoregressive Transformer (arXiv:2401.04577, published Jan 9, 2024)