Vogent-Turn-80M (ONNX, re-exported)
This is a clean re-export of vogent/Vogent-Turn-80M to ONNX, produced from the upstream PyTorch weights with proper dynamic_axes annotations and the more natural input dtypes the underlying PyTorch model uses.
Weights are byte-equivalent to the upstream model; only the ONNX graph signature differs.
Why this re-export exists
The official onnx-fp16/whisper-smol-lm-smaller-fp16.onnx shipped on the upstream repo was traced with a dummy input where text_len = 1 (a single-token prompt). That trace baked the size-1 dimension into many intermediate value_info entries, and ORT's CUDA execution provider then emitted a warning on every inference call:
Shape mismatch attempting to re-use buffer. {1,401} != {1,1}.
Validate usage of dim_value (values should be > 0) and dim_param
(all values with the same string should equate to the same size)
in shapes in the model.
(The 401 is attention_mask's runtime length: AUDIO_TOKENS (400) + text_len. The 1 is the dummy-trace residue.)
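The two numbers in the warning follow from simple length arithmetic. A minimal sketch, assuming the 400 audio tokens stated above; the helper name `attention_mask_len` is hypothetical and not part of any shipped script:

```python
AUDIO_TOKENS = 400  # audio positions that precede the text tokens (value from the text above)


def attention_mask_len(text_len: int, audio_tokens: int = AUDIO_TOKENS) -> int:
    """Runtime length of attention_mask: audio tokens followed by text tokens."""
    if text_len < 0:
        raise ValueError("text_len must be non-negative")
    return audio_tokens + text_len


# A single-token prompt gives the 401 seen in the warning.
print(attention_mask_len(1))
```

Any prompt longer than the single token used for the original trace therefore disagrees with a baked-in size of 1.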
The original export also used int32 for input_ids and attention_mask, despite the torch model using int64 natively. Most loaders work around that, but it's an extra cast on every call.
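One way to look for the size-1 residue described in this section is to scan a graph's value_info entries for dimensions fixed at 1. The sketch below is an illustration, not the check used for this re-export: the function names are hypothetical, and a fixed 1 can be legitimate (a true batch-of-one axis, for example), so the hits are candidates to review rather than confirmed bugs.

```python
def static_size_one_dims(graph):
    """Return (tensor_name, axis) pairs whose dimension is a fixed 1.

    `graph` is expected to look like an onnx.GraphProto: each value_info
    entry has .name and .type.tensor_type.shape.dim, and each dim carries
    either dim_param (symbolic name) or dim_value (fixed integer).
    """
    hits = []
    for vi in graph.value_info:
        for axis, dim in enumerate(vi.type.tensor_type.shape.dim):
            # An empty dim_param means the axis is not symbolic.
            if not dim.dim_param and dim.dim_value == 1:
                hits.append((vi.name, axis))
    return hits


def scan_file(path):
    """Load an ONNX file (graph only, no external weights) and scan it."""
    import onnx  # imported lazily so the helper above has no hard dependency

    model = onnx.load(path, load_external_data=False)
    return static_size_one_dims(model.graph)
```

Calling `scan_file("onnx-fp16/whisper-smol-lm-smaller-fp16.onnx")` on the upstream and re-exported files side by side should make any difference in the annotations visible.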
What changed
| | Upstream onnx-fp16/... | This re-export |
|---|---|---|
| input_ids dtype | int32 | int64 (matches torch reference) |
| attention_mask dtype | int32 | int64 |
| audio_features dtype | fp16 (fp16 graph) / fp32 (fp32 graph) | fp32 for both; graph casts internally for the fp16 variant |
| dynamic_axes annotations | dummy-traced (size-1 baked into intermediates) | symbolic sequence_length, num_frames everywhere variable |
| Per-call ORT Shape mismatch warning | yes (2× per inference call under CUDA EP) | none |
| Logits output dtype | fp16 (fp16 graph) / fp32 (fp32 graph) | unchanged |
| Weights | byte-identical | byte-identical |
| Graph topology / accuracy | reference | identical |
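
The input dtypes in the table can be checked against a loaded session before any inference. This is a hedged sketch: `signature_mismatches` and `EXPECTED_INPUT_TYPES` are hypothetical names, the input names are taken from the usage example in this README, and the type strings are the ones onnxruntime reports for int64 and fp32 tensors.

```python
# Expected input signature of this re-export (both the fp32 and fp16 graphs).
EXPECTED_INPUT_TYPES = {
    "input_ids": "tensor(int64)",
    "attention_mask": "tensor(int64)",
    "audio_features": "tensor(float)",  # "float" is how onnxruntime names fp32
}


def signature_mismatches(session, expected=EXPECTED_INPUT_TYPES):
    """Return {input_name: (expected_type, actual_type)} for inputs that differ.

    `session` only needs a get_inputs() method whose items expose .name and
    .type, which is what onnxruntime.InferenceSession provides.
    """
    actual = {arg.name: arg.type for arg in session.get_inputs()}
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    }
```

An empty result means the session matches this re-export's signature; the upstream fp16 graph would be expected to report all three inputs as mismatches.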
Files
onnx-fp32/whisper-smol-lm-smaller.onnx # full-precision graph
onnx-fp16/whisper-smol-lm-smaller-fp16.onnx # half-precision graph (audio_features still fp32-in)
config.json # WhisperSmolLMClassifierConfig (unchanged)
export/export_vogent_turn_onnx.py # re-export script
export/fetch_vogent_turn_bundle.py # original-bundle fetch script
LICENSE # upstream Vogent license (carried forward)
Quick usage (Python + onnxruntime)
import numpy as np, onnxruntime as ort
sess = ort.InferenceSession("onnx-fp16/whisper-smol-lm-smaller-fp16.onnx",
providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
# Whisper-tiny mel: 80 mels × 800 frames at 16 kHz, hop=160, over 8 s of audio.
audio_features = np.random.randn(1, 80, 800).astype(np.float32)
input_ids = np.array([[1]], dtype=np.int64) # SmolLM <|im_start|> fallback
attention_mask = np.ones((1, 400 + 1), dtype=np.int64) # 400 audio tokens + text_len
logits = sess.run(None, {
"input_ids": input_ids,
"attention_mask": attention_mask,
"audio_features": audio_features,
})[0]
# logits is fp16 [1, 2]; softmax to get (p_continue, p_endpoint).
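The final softmax step can be sketched as follows. This is a plain-Python illustration with made-up logits, assuming the (p_continue, p_endpoint) ordering stated in the comment above:

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat sequence of floats."""
    peak = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for illustration only; real values come from sess.run(...).
p_continue, p_endpoint = softmax([0.3, 1.7])
print(p_continue, p_endpoint)
```

With the fp16 graph it is safer to upcast before exponentiating, for example `softmax(logits[0].astype(np.float32).tolist())`.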
For the chat-template / preprocessing logic, see the upstream vogent_turn repo: https://github.com/vogent/vogent-turn.
Reproducing
# 1. Fetch the original upstream bundle (gated, needs HF_TOKEN).
python export/fetch_vogent_turn_bundle.py --out models/vogent-turn-80m
# 2. Clone the official inference package (provides the model class).
git clone --depth 1 https://github.com/vogent/vogent-turn /tmp/vogent-turn-src
# 3. Re-export. Replaces onnx-fp32/* and onnx-fp16/* (.bak kept).
python export/export_vogent_turn_onnx.py \
--model-dir models/vogent-turn-80m \
--vogent-turn-src /tmp/vogent-turn-src
Tested with torch==2.12, transformers==4.57, onnx==1.20, onnxruntime==1.20+cuda12. The export patches transformers.masking_utils.{sdpa_mask,eager_mask} to vmap-free equivalents because the legacy torch.onnx exporter can't trace through transformers' 4-deep torch.func.vmap causal-mask builder.
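To illustrate what a vmap-free mask builder looks like, the sketch below builds a causal mask with padding using index broadcasting alone. It is written in NumPy for brevity and is not the patch in export/export_vogent_turn_onnx.py; the function name and the (batch, 1, q_len, kv_len) layout are assumptions made for the example.

```python
import numpy as np


def causal_mask(attention_mask: np.ndarray) -> np.ndarray:
    """Build a boolean causal mask without vmap.

    attention_mask: (batch, seq_len) array of 0/1 padding flags.
    Returns a (batch, 1, seq_len, seq_len) bool array where True means
    "query position q may attend to key position k".
    """
    _, seq_len = attention_mask.shape
    q_idx = np.arange(seq_len)[:, None]   # (seq_len, 1)
    kv_idx = np.arange(seq_len)[None, :]  # (1, seq_len)
    causal = kv_idx <= q_idx              # broadcasts to (seq_len, seq_len)
    keep = attention_mask.astype(bool)[:, None, None, :]  # (batch, 1, 1, seq_len)
    return causal[None, None, :, :] & keep


mask = causal_mask(np.array([[1, 1, 0]], dtype=np.int64))
print(mask.shape)
```

Because the mask comes from ordinary comparisons on index tensors, the same pattern in torch reduces to elementwise operations that a tracing exporter can record.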
License & gating
The upstream license (modified Apache-2.0) carries through. Per Vogent's gating terms, if you use this model as part of a horizontal voice-agent platform, you must not set Vogent-Turn-80M as the default turn-detector option, and users must explicitly select "Vogent Turn Detector" to enable it. See LICENSE.
Credits
- Original model: Vogent AI (vogent/Vogent-Turn-80M, blog post, inference code).
- Re-export tooling: produced as part of remotemedia-sdk integration work. The scripts in export/ are MIT-licensed; the model weights themselves remain under the upstream Vogent license.