VibeVoice-ASR AWQ INT4

This repository contains a 4-bit AWQ quantized export of microsoft/VibeVoice-ASR.

Quantization

  • Method: AWQ
  • Bits: 4
  • Group size: 128
  • Logical parameter count: 8,674,021,857
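As a rough sanity check on the numbers above, the 4-bit packed footprint can be estimated from the logical parameter count. This is a sketch only: it assumes an fp16 scale plus a packed 4-bit zero point per 128-weight group, and ignores tensors kept in higher precision.

```python
# Back-of-envelope size estimate for 4-bit AWQ weights (sketch; ignores
# embeddings and any tensors kept in higher precision).
PARAMS = 8_674_021_857  # logical parameter count from this card
BITS = 4
GROUP_SIZE = 128

packed_bytes = PARAMS * BITS / 8
# Assumed per-group overhead: 2 bytes (fp16 scale) + 0.5 bytes (4-bit zero).
# The exact layout varies by AWQ export format.
overhead_bytes = (PARAMS / GROUP_SIZE) * (2 + 0.5)

total_gib = (packed_bytes + overhead_bytes) / 2**30
print(f"~{total_gib:.1f} GiB")  # → ~4.2 GiB
```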

Repository layout

This model is stored in a split VibeVoice layout:

  • root directory: VibeVoice audio and non-decoder weights
  • decoder-awq/: quantized Qwen2 decoder weights

Keep this layout intact when downloading or mirroring the repository.
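A minimal sketch of a local sanity check for a downloaded or mirrored copy, using only the two layout facts listed above (the helper name is illustrative, not part of this repository):

```python
from pathlib import Path

def check_layout(root: str) -> bool:
    """Return True if a local copy preserves the split VibeVoice layout.

    Sketch: only checks for the root config.json and the decoder-awq/
    subdirectory described in this card.
    """
    root_path = Path(root)
    return (root_path / "config.json").is_file() and (root_path / "decoder-awq").is_dir()
```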

Metadata

The root config.json includes:

  • vibevoice_metadata
  • vibevoice_decoder_model_path
  • vibevoice_decoder_quantization

These fields identify the split decoder path and preserve the logical source-model metadata.
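For tooling that consumes this export, the split-decoder fields can be read straight out of the root config.json. The field names come from this card; the helper itself is an illustrative sketch:

```python
import json

def read_split_metadata(config_path: str) -> dict:
    """Sketch: pull the split-decoder fields from the root config.json.

    Field names are as documented in this card; any of them may be
    absent in other exports, hence .get().
    """
    with open(config_path) as f:
        cfg = json.load(f)
    return {
        "metadata": cfg.get("vibevoice_metadata"),
        "decoder_path": cfg.get("vibevoice_decoder_model_path"),
        "quantization": cfg.get("vibevoice_decoder_quantization"),
    }
```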

Validation

This AWQ export was validated against the full upstream VibeVoice-ASR model on short audio samples.

  • outputs remained valid JSON transcript arrays
  • output similarity to the full model remained high on tested samples
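The first check above is purely structural and easy to reproduce. A minimal sketch of it (the function name is illustrative):

```python
import json

def is_valid_transcript(output: str) -> bool:
    """Sketch of the structural check: the model output must parse as a
    JSON array (a transcript array)."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, list)
```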

Serving note for vLLM 0.17.x

On current vLLM 0.17.x CUDA builds, this checkpoint is compatible with the faster Marlin-backed AWQ path.

  • prefer letting vLLM infer the backend from config.json
  • if you must set it explicitly, use awq_marlin rather than plain awq

In local testing on an RTX A6000, forcing plain awq was substantially slower than letting vLLM auto-select the Marlin kernel.
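One way this might look with vLLM's standard serve CLI (a sketch; verify the flag against your installed vLLM version):

```shell
# Preferred: let vLLM infer the AWQ backend from config.json
vllm serve lemuriandezapada/VibeVoice-ASR-awq-int4

# If you must pin the backend, use awq_marlin, not plain awq
vllm serve lemuriandezapada/VibeVoice-ASR-awq-int4 --quantization awq_marlin
```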

Notes

  • This is a quantized derivative export, not the original upstream checkpoint.
  • Base model licensing and usage terms follow the upstream VibeVoice-ASR release.
  • Pure-VibeVoice compatibility patches for vLLM 0.17.x are included under patches/vllm_0_17/.