VIBEVOICE-ASR Technical Report
Paper • 2601.18184 • Published • 23
This repository contains a 4-bit AWQ quantized export of microsoft/VibeVoice-ASR.
This model is stored in a split VibeVoice layout:
- decoder-awq/: quantized Qwen2 decoder weights

Keep this layout intact when downloading or mirroring the repository.
The root config.json includes:
- vibevoice_metadata
- vibevoice_decoder_model_path
- vibevoice_decoder_quantization

These fields identify the split decoder path and preserve the logical source-model metadata.
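As a minimal sketch of how a mirrored copy could be sanity-checked, the snippet below reads the root config.json, confirms the three fields are present, and resolves the decoder directory. The helper name `read_split_layout` is hypothetical, and the sketch assumes that `vibevoice_decoder_model_path` holds a path relative to the repository root; the actual value format is not stated here.

```python
import json
from pathlib import Path

# Field names taken from the root config.json description above.
FIELDS = (
    "vibevoice_metadata",
    "vibevoice_decoder_model_path",
    "vibevoice_decoder_quantization",
)


def read_split_layout(repo_dir):
    """Return (fields, decoder_dir) for a local copy of the repository.

    Hypothetical helper, not part of VibeVoice or vLLM.
    """
    root = Path(repo_dir)
    config = json.loads((root / "config.json").read_text(encoding="utf-8"))

    missing = [name for name in FIELDS if name not in config]
    if missing:
        raise KeyError("config.json is missing: " + ", ".join(missing))

    # Assumption: the decoder path is stored relative to the repository root.
    decoder_dir = root / config["vibevoice_decoder_model_path"]
    if not decoder_dir.is_dir():
        raise FileNotFoundError(f"decoder directory not found: {decoder_dir}")

    return {name: config[name] for name in FIELDS}, decoder_dir
```

Running this against a fresh download is a quick way to catch a mirror that dropped the decoder-awq/ directory or rewrote config.json.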
This AWQ export was validated against the full upstream VibeVoice-ASR model on short audio samples.
On current vLLM 0.17.x CUDA builds, this checkpoint is compatible with the faster Marlin-backed AWQ path.
- config.json
- awq_marlin rather than plain awq

In local testing on an RTX A6000, forcing plain awq was substantially slower than letting vLLM auto-select the Marlin kernel.
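The difference between auto-selection and forcing a method can be illustrated with a small sketch that builds a `vllm serve` command line. The helper name `build_vllm_serve_command` and the local path are hypothetical. The sketch only assumes vLLM's `--quantization` flag; it does not cover any other setup this checkpoint may need.

```python
import shlex


def build_vllm_serve_command(model_path, quantization=None):
    """Build a `vllm serve` argument list (hypothetical helper).

    With quantization=None the flag is omitted, so vLLM picks the
    kernel itself. Passing "awq" forces the plain AWQ path.
    """
    cmd = ["vllm", "serve", str(model_path)]
    if quantization is not None:
        cmd += ["--quantization", quantization]
    return cmd


if __name__ == "__main__":
    # "./VibeVoice-ASR-AWQ" is a placeholder path for a local copy.
    print(shlex.join(build_vllm_serve_command("./VibeVoice-ASR-AWQ")))
    print(shlex.join(build_vllm_serve_command("./VibeVoice-ASR-AWQ", "awq")))
```

Leaving `quantization` unset matches the faster configuration described above; the second form is only useful for reproducing the slower plain awq comparison.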
patches/vllm_0_17/.

Base model
microsoft/VibeVoice-ASR