VibeVoice-ASR AWQ INT4

This repository contains a 4-bit AWQ quantized export of microsoft/VibeVoice-ASR.

Quantization

  • Method: AWQ
  • Bits: 4
  • Group size: 128
  • Logical parameter count: 8,674,021,857
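As a rough sanity check on the numbers above, the 4-bit packed footprint can be estimated from the logical parameter count. This is a sketch only: it assumes an fp16 scale plus a packed 4-bit zero point per 128-weight group, and ignores tensors kept in higher precision.

```python
# Back-of-envelope size estimate for 4-bit AWQ weights (sketch; ignores
# embeddings and any tensors kept in higher precision).
PARAMS = 8_674_021_857  # logical parameter count from this card
BITS = 4
GROUP_SIZE = 128

packed_bytes = PARAMS * BITS / 8
# Assumed per-group overhead: 2 bytes (fp16 scale) + 0.5 bytes (4-bit zero).
# The exact layout varies by AWQ export format.
overhead_bytes = (PARAMS / GROUP_SIZE) * (2 + 0.5)

total_gib = (packed_bytes + overhead_bytes) / 2**30
print(f"~{total_gib:.1f} GiB")  # → ~4.2 GiB
```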

Repository layout

This model is stored in a split VibeVoice layout:

  • root directory: VibeVoice audio and non-decoder weights
  • decoder-awq/: quantized Qwen2 decoder weights

Keep this layout intact when downloading or mirroring the repository.
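A minimal sketch of a local sanity check for a downloaded or mirrored copy, using only the two layout facts listed above (the helper name is illustrative, not part of this repository):

```python
from pathlib import Path

def check_layout(root: str) -> bool:
    """Return True if a local copy preserves the split VibeVoice layout.

    Sketch: only checks for the root config.json and the decoder-awq/
    subdirectory described in this card.
    """
    root_path = Path(root)
    return (root_path / "config.json").is_file() and (root_path / "decoder-awq").is_dir()
```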

Metadata

The root config.json includes:

  • vibevoice_metadata
  • vibevoice_decoder_model_path
  • vibevoice_decoder_quantization

These fields identify the split decoder path and preserve the logical source-model metadata.
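For tooling that consumes this export, the split-decoder fields can be read straight out of the root config.json. The field names come from this card; the helper itself is an illustrative sketch:

```python
import json

def read_split_metadata(config_path: str) -> dict:
    """Sketch: pull the split-decoder fields from the root config.json.

    Field names are as documented in this card; any of them may be
    absent in other exports, hence .get().
    """
    with open(config_path) as f:
        cfg = json.load(f)
    return {
        "metadata": cfg.get("vibevoice_metadata"),
        "decoder_path": cfg.get("vibevoice_decoder_model_path"),
        "quantization": cfg.get("vibevoice_decoder_quantization"),
    }
```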

Validation

This AWQ export was validated against the full upstream VibeVoice-ASR model on short audio samples.

  • outputs remained valid JSON transcript arrays
  • output similarity to the full model remained high on tested samples
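The first check above is purely structural and easy to reproduce. A minimal sketch of it (the function name is illustrative):

```python
import json

def is_valid_transcript(output: str) -> bool:
    """Sketch of the structural check: the model output must parse as a
    JSON array (a transcript array)."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, list)
```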

Serving note for vLLM 0.17.x

On current vLLM 0.17.x CUDA builds, this checkpoint is compatible with the faster Marlin-backed AWQ path.

  • prefer letting vLLM infer the backend from config.json
  • if you must set it explicitly, use awq_marlin rather than plain awq

In local testing on an RTX A6000, forcing plain awq was substantially slower than letting vLLM auto-select the Marlin kernel.
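One way this might look with vLLM's standard serve CLI (a sketch; verify the flag against your installed vLLM version):

```shell
# Preferred: let vLLM infer the AWQ backend from config.json
vllm serve lemuriandezapada/VibeVoice-ASR-awq-int4

# If you must pin the backend, use awq_marlin, not plain awq
vllm serve lemuriandezapada/VibeVoice-ASR-awq-int4 --quantization awq_marlin
```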

Notes

  • This is a quantized derivative export, not the original upstream checkpoint.
  • Base model licensing and usage terms follow the upstream VibeVoice-ASR release.
  • Pure-VibeVoice compatibility patches for vLLM 0.17.x are included under patches/vllm_0_17/.