MAGNeT-Medium-30secs-MLX-8bit

speech-swift (Apple SDK)
soniqo.audio (website)
blog

MLX port of Meta's MAGNeT, a masked parallel (non-autoregressive) text-to-music model, with weights quantized to INT8 (weight-only) for on-device generation on Apple Silicon. It uses the EnCodec 32 kHz audio decoder and a T5-base text encoder, and decodes each codebook iteratively with restricted-context self-attention.
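The per-codebook iterative decoding mentioned above can be illustrated with a small schedule calculation. This is a minimal sketch, not the port's actual code: it uses the `[20, 10, 10, 10]` step counts listed in the Model table and assumes a cosine masking schedule of the kind described in the MAGNeT paper. The names `mask_fraction` and `decoding_plan` are hypothetical.

```python
import math

# Per-codebook step counts, as listed in the Model table.
STEPS_PER_CODEBOOK = [20, 10, 10, 10]


def mask_fraction(step: int, total_steps: int) -> float:
    """Fraction of positions still masked at 1-indexed `step`.

    Assumption: cosine schedule cos(pi * (step - 1) / (2 * total_steps)),
    so the first step starts fully masked and later steps mask less.
    """
    return math.cos(math.pi * (step - 1) / (2 * total_steps))


def decoding_plan(steps_per_codebook=STEPS_PER_CODEBOOK):
    """Return (codebook_index, step, mask_fraction) for every decoding pass."""
    plan = []
    for codebook, n_steps in enumerate(steps_per_codebook):
        # Codebooks are decoded one after another, each with its own schedule.
        for step in range(1, n_steps + 1):
            plan.append((codebook, step, mask_fraction(step, n_steps)))
    return plan


plan = decoding_plan()
total_passes = len(plan)  # one decoding pass per entry
```

Under these assumptions the first codebook receives the most refinement passes, and the total number of passes matches the 50 decoding steps quoted in the table.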

Model

Property	Value
Parameters (LM)	1.5B
Quantization	INT8 weight-only, group size 64
Format	MLX safetensors
Sample rate	32 kHz mono
Output length	30 s per generation (fixed)
Decoding steps	50 total (`[20, 10, 10, 10]` across 4 RVQ codebooks)
Bundle size	2221 MB on disk
Source	facebook/magnet-medium-30secs
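To make "INT8 weight-only, group size 64" concrete, here is a minimal pure-Python sketch of group-wise affine quantization. It is an illustration under stated assumptions, not MLX's implementation: each group of 64 weights is assumed to share one scale and one bias, both assumed to be stored in 16 bits, and all function names are hypothetical.

```python
import math


def quantize_group(weights, bits=8):
    """Affine-quantize one group so that w is approximately scale * q + bias."""
    levels = (1 << bits) - 1  # 255 for INT8
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0  # fall back to 1.0 for constant groups
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    """Recover approximate weights from integer codes."""
    return [scale * q + bias for q in codes]


def quantize_row(row, group_size=64, bits=8):
    """Split a weight row into groups and quantize each one independently."""
    if len(row) % group_size:
        raise ValueError("row length must be a multiple of group_size")
    return [quantize_group(row[i:i + group_size], bits)
            for i in range(0, len(row), group_size)]


def bits_per_weight(group_size=64, bits=8, meta_bits=16):
    """Storage cost per weight, counting one scale and one bias per group."""
    return bits + 2 * meta_bits / group_size


# Round-trip a toy 128-element row (two groups of 64).
row = [math.sin(i) for i in range(128)]
groups = quantize_row(row)
restored = [w for codes, scale, bias in groups
            for w in dequantize_group(codes, scale, bias)]
max_error = max(abs(a - b) for a, b in zip(row, restored))
```

With these assumed metadata sizes, the per-group scale and bias add half a bit per weight on top of the 8-bit codes. Only weights are quantized in a weight-only scheme; activations stay in floating point.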

Performance (Apple Silicon, 30 s audio)

Metric	Value
Real-time factor (wall / audio)	1.20
Peak RSS	3045 MB
CLAP score (laion/clap-htsat-unfused, 5 prompts)	0.345
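Because the real-time factor is defined here as wall-clock time divided by audio duration, a value above 1 means generation is slower than playback. The short sketch below works through that arithmetic for the figures in the tables; the helper names are illustrative only.

```python
def wall_clock_seconds(audio_seconds: float, real_time_factor: float) -> float:
    """Generation time implied by a wall/audio real-time factor."""
    return audio_seconds * real_time_factor


def sample_count(audio_seconds: float, sample_rate_hz: int) -> int:
    """Number of mono samples in a clip of the given length."""
    return int(round(audio_seconds * sample_rate_hz))


# Figures taken from the Model and Performance tables.
generation_time = wall_clock_seconds(30.0, 1.20)  # roughly 36 s of compute
samples = sample_count(30.0, 32_000)              # samples in one 30 s clip
```

So one fixed-length 30 s clip takes about 36 s to generate on the benchmarked machine and contains 960,000 mono samples at 32 kHz.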

Usage

```python
from huggingface_hub import snapshot_download

# Download the quantized bundle; returns the local directory path.
bundle = snapshot_download("aufklarer/MAGNeT-Medium-30secs-MLX-8bit")
# See https://github.com/soniqo/speech-swift for production usage.
```
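After downloading, it can be useful to check what the bundle contains and how large it is on disk. The sketch below makes no assumption about the bundle's file layout, which this card does not document; it simply walks whatever directory `snapshot_download` returned. `summarize_bundle` and `total_megabytes` are hypothetical helper names.

```python
from pathlib import Path


def summarize_bundle(bundle_dir):
    """Return [(relative_path, size_in_bytes)] for every file under bundle_dir."""
    root = Path(bundle_dir)
    files = sorted(p for p in root.rglob("*") if p.is_file())
    # stat() follows symlinks, so cached Hub snapshots report real file sizes.
    return [(p.relative_to(root).as_posix(), p.stat().st_size) for p in files]


def total_megabytes(summary):
    """Total size of all listed files, in decimal megabytes."""
    return sum(size for _, size in summary) / 1e6
```

Calling `summarize_bundle(bundle)` with the path from the snippet above and passing the result to `total_megabytes` gives a figure that can be compared against the bundle size listed in the Model table.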

License

CC-BY-NC 4.0 — inherited from upstream MAGNeT weights. Non-commercial use only.

Paper

Masked Audio Generation using a Single Non-Autoregressive Transformer (arXiv:2401.04577, published Jan 9, 2024)