Vogent-Turn-80M (ONNX, re-exported)

This is a clean re-export of vogent/Vogent-Turn-80M to ONNX, produced from the upstream PyTorch weights with proper dynamic_axes annotations and the int64 input dtypes that the underlying PyTorch model uses natively.

Weights are byte-equivalent to the upstream model — only the ONNX graph signature differs.

Why this re-export exists

The official onnx-fp16/whisper-smol-lm-smaller-fp16.onnx shipped in the upstream repo was traced with a dummy input where text_len = 1 (a single-token prompt). That trace baked the size-1 dimension into many intermediate value_info entries, and the CUDA execution provider in ONNX Runtime (ORT) then emitted a warning on every inference call:

Shape mismatch attempting to re-use buffer. {1,401} != {1,1}.
Validate usage of dim_value (values should be > 0) and dim_param
(all values with the same string should equate to the same size)
in shapes in the model.

(The 401 is attention_mask's runtime length — AUDIO_TOKENS (400) + text_len. The 1 is the dummy-trace residue.)
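The arithmetic behind these numbers can be sketched as follows. The helper names are hypothetical, and the stride-2 step from mel frames to audio tokens is an assumption (Whisper's encoder uses a stride-2 convolution), chosen because it reproduces the 800-frame / 400-token figures in the quick-usage example. Because the mask length depends on text_len, a size-1 text dimension baked in by the dummy trace cannot match the runtime shapes.

```python
# Hypothetical helpers; the constants mirror the figures quoted in this card.
SAMPLE_RATE = 16_000  # Hz
HOP_LENGTH = 160      # samples per mel frame

def mel_frames(seconds: float) -> int:
    """Number of mel frames covering `seconds` of audio."""
    return int(seconds * SAMPLE_RATE) // HOP_LENGTH

def audio_tokens(num_frames: int) -> int:
    """Assumption: a stride-2 conv (kernel 3, padding 1), as in Whisper's encoder,
    maps num_frames to ceil(num_frames / 2) positions."""
    return (num_frames + 1) // 2

def attention_mask_len(num_frames: int, text_len: int) -> int:
    """The mask covers the audio tokens plus the text tokens."""
    return audio_tokens(num_frames) + text_len

frames = mel_frames(8.0)
print(frames, audio_tokens(frames), attention_mask_len(frames, text_len=1))  # 800 400 401
```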

The original export also used int32 for input_ids and attention_mask, even though the PyTorch model uses int64 natively. Most loaders work around this, but it costs an extra cast on every call.
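A loader that has to handle both the upstream int32 graph and this int64 re-export can read the declared input types from the session and cast per input. This is a minimal sketch assuming NumPy feeds; `cast_feeds` and `ORT_TO_NUMPY` are hypothetical names, while the type strings are the ones ONNX Runtime reports through `NodeArg.type`.

```python
import numpy as np

# Element-type strings as reported by onnxruntime's NodeArg.type.
ORT_TO_NUMPY = {
    "tensor(int32)": np.int32,
    "tensor(int64)": np.int64,
    "tensor(float)": np.float32,
    "tensor(float16)": np.float16,
}

def cast_feeds(feeds: dict, input_types: dict) -> dict:
    """Cast each feed to the dtype the graph declares for that input.

    input_types maps input name -> ORT type string, e.g. "tensor(int64)".
    Arrays that already have the right dtype are passed through without a copy.
    """
    return {
        name: np.asarray(value).astype(ORT_TO_NUMPY[input_types[name]], copy=False)
        for name, value in feeds.items()
    }
```

Typical use is `input_types = {arg.name: arg.type for arg in sess.get_inputs()}` followed by `sess.run(None, cast_feeds(feeds, input_types))`. Against this re-export, int64 feeds pass through untouched, so the per-call cast disappears.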

What changed

| | Upstream `onnx-fp16/...` | This re-export |
|---|---|---|
| `input_ids` dtype | int32 | int64 (matches torch reference) |
| `attention_mask` dtype | int32 | int64 |
| `audio_features` dtype | fp16 (fp16 graph) / fp32 (fp32 graph) | fp32 for both; the fp16 graph casts internally |
| `dynamic_axes` annotations | dummy-traced (size-1 baked into intermediates) | symbolic `sequence_length`, `num_frames` on all variable axes |
| Per-call ORT `Shape mismatch` warning | yes (2× per inference call under CUDA EP) | none |
| Logits output dtype | fp16 (fp16 graph) / fp32 (fp32 graph) | unchanged |
| Weights | byte-identical | byte-identical |
| Graph topology / accuracy | reference | identical |
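`torch.onnx.export` accepts `dynamic_axes` as `{tensor_name: {axis_index: symbolic_name}}`, and the warning quoted earlier states ORT's rule: every dimension sharing a `dim_param` string must have the same runtime size. The sketch below is illustrative only. `sequence_length` and `num_frames` come from the table, but the axis indices, the `mask_length` name, and the `dim_param_conflicts` helper are assumptions, not taken from the export script.

```python
# Illustrative dynamic_axes in the form torch.onnx.export accepts:
# {tensor_name: {axis_index: symbolic_name}}. Axis indices and "mask_length"
# are assumptions; only "sequence_length" and "num_frames" appear in this card.
DYNAMIC_AXES = {
    "input_ids": {1: "sequence_length"},
    "attention_mask": {1: "mask_length"},
    "audio_features": {2: "num_frames"},
}

def dim_param_conflicts(dynamic_axes: dict, shapes: dict) -> dict:
    """Return {symbolic_name: sorted sizes} for names bound to more than one size.

    ORT expects all dimensions sharing a dim_param string to have equal size,
    so any entry returned here points at an axis-naming mistake.
    """
    sizes = {}
    for tensor, axes in dynamic_axes.items():
        for axis, name in axes.items():
            sizes.setdefault(name, set()).add(shapes[tensor][axis])
    return {name: sorted(found) for name, found in sizes.items() if len(found) > 1}

shapes = {"input_ids": (1, 1), "attention_mask": (1, 401), "audio_features": (1, 80, 800)}
print(dim_param_conflicts(DYNAMIC_AXES, shapes))  # {}
```

Naming the mask axis `sequence_length` as well would make this report `{'sequence_length': [1, 401]}`, since the prompt and the mask have different lengths.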

Files

onnx-fp32/whisper-smol-lm-smaller.onnx      # full-precision graph
onnx-fp16/whisper-smol-lm-smaller-fp16.onnx # half-precision graph (audio_features still fp32-in)
config.json                                  # WhisperSmolLMClassifierConfig (unchanged)
export/export_vogent_turn_onnx.py            # re-export script
export/fetch_vogent_turn_bundle.py           # original-bundle fetch script
LICENSE                                      # upstream Vogent license (carried forward)

Quick usage (Python + onnxruntime)

import numpy as np, onnxruntime as ort

sess = ort.InferenceSession("onnx-fp16/whisper-smol-lm-smaller-fp16.onnx",
                            providers=["CUDAExecutionProvider", "CPUExecutionProvider"])

# Random stand-in for a Whisper-tiny log-mel: 80 mel bins × 800 frames (8 s of 16 kHz audio, hop length 160).
audio_features = np.random.randn(1, 80, 800).astype(np.float32)
input_ids      = np.array([[1]], dtype=np.int64)        # SmolLM <|im_start|> fallback
attention_mask = np.ones((1, 400 + 1), dtype=np.int64)  # 400 audio tokens + text_len

logits = sess.run(None, {
    "input_ids":     input_ids,
    "attention_mask": attention_mask,
    "audio_features": audio_features,
})[0]
# logits is fp16 [1, 2] — softmax to get (p_continue, p_endpoint).
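The final softmax step might look like this. It is a sketch that assumes the (p_continue, p_endpoint) class order given in the comment above. The logits are upcast to float32 before the softmax because the fp16 graph returns fp16. `endpoint_probability` is a hypothetical helper.

```python
import numpy as np

def endpoint_probability(logits: np.ndarray) -> float:
    """Return p_endpoint for the first batch item from [batch, 2] logits.

    Assumes class order (continue, endpoint). Logits are upcast to float32
    before a numerically stable softmax, since the fp16 graph returns fp16.
    """
    x = np.asarray(logits, dtype=np.float32)
    x = x - x.max(axis=-1, keepdims=True)
    probs = np.exp(x) / np.exp(x).sum(axis=-1, keepdims=True)
    return float(probs[0, 1])
```

For example, `endpoint_probability(logits) > 0.5` would treat the turn as finished. The 0.5 threshold is a hypothetical choice, not a recommended operating point.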

For the chat-template / preprocessing logic, see the upstream vogent_turn repo: https://github.com/vogent/vogent-turn.
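Once the upstream chat template has produced the prompt's token ids, you can assemble the three inputs as shown below. This sketch covers batch size 1 without padding. `build_feeds` is a hypothetical helper, and `AUDIO_TOKENS = 400` assumes the 800-frame input from the quick-usage example, so a different frame count would need a different value.

```python
import numpy as np

# Assumption: 800 mel frames -> 400 audio positions, as in the quick-usage example.
AUDIO_TOKENS = 400

def build_feeds(token_ids, audio_features) -> dict:
    """Hypothetical helper: feed dict for one unpadded example (batch size 1)."""
    input_ids = np.asarray(token_ids, dtype=np.int64)[None, :]  # (1, text_len)
    attention_mask = np.ones((1, AUDIO_TOKENS + input_ids.shape[1]), dtype=np.int64)
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "audio_features": np.asarray(audio_features, dtype=np.float32),  # (1, 80, 800)
    }
```

The result plugs straight into `sess.run(None, build_feeds(token_ids, audio_features))`.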

Reproducing

# 1. Fetch the original upstream bundle (gated, needs HF_TOKEN).
python export/fetch_vogent_turn_bundle.py --out models/vogent-turn-80m

# 2. Clone the official inference package (provides the model class).
git clone --depth 1 https://github.com/vogent/vogent-turn /tmp/vogent-turn-src

# 3. Re-export. Overwrites onnx-fp32/* and onnx-fp16/*, keeping the originals as .bak files.
python export/export_vogent_turn_onnx.py \
    --model-dir models/vogent-turn-80m \
    --vogent-turn-src /tmp/vogent-turn-src
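The table reports identical accuracy. To spot-check a fresh export, you could run the PyTorch model and the new ONNX graph on the same inputs and compare class probabilities. Producing the two logit arrays is not shown. `check_parity` and its default tolerance are hypothetical, and an fp16 graph will generally need a looser tolerance than fp32.

```python
import numpy as np

def softmax(x) -> np.ndarray:
    x = np.asarray(x, dtype=np.float32)
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def check_parity(reference_logits, onnx_logits, atol: float = 1e-2) -> float:
    """Max absolute difference in class probabilities; raises if above atol.

    atol is a hypothetical default; pick it per precision (fp16 needs more slack).
    """
    diff = float(np.abs(softmax(reference_logits) - softmax(onnx_logits)).max())
    if diff > atol:
        raise AssertionError(f"max probability diff {diff:.4g} exceeds atol={atol}")
    return diff
```

Comparing probabilities rather than raw logits keeps the check insensitive to a constant offset in the logits, which does not change the prediction.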

Tested with torch==2.12, transformers==4.57, onnx==1.20, onnxruntime==1.20+cuda12. The export script patches transformers.masking_utils.{sdpa_mask,eager_mask} with vmap-free equivalents, because the legacy torch.onnx exporter can't trace through the four nested levels of torch.func.vmap that transformers uses to build causal masks.
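The patched functions themselves aren't reproduced in this card, so the block below is not them and does not share their signatures. It only illustrates the general idea of a vmap-free mask. Instead of vmapping a per-position predicate over several index dimensions, it compares index ranges with broadcasting, which reduces to ordinary elementwise tensor ops. NumPy stands in for torch here, `causal_padding_mask` is a hypothetical name, and True means a query position may attend to a key position.

```python
import numpy as np

def causal_padding_mask(attention_mask: np.ndarray) -> np.ndarray:
    """attention_mask: (batch, seq_len) of 0/1.

    Returns a bool mask of shape (batch, 1, seq_len, seq_len), True = may attend.
    """
    _, seq_len = attention_mask.shape
    q_idx = np.arange(seq_len)[:, None]   # (seq_len, 1) query positions
    kv_idx = np.arange(seq_len)[None, :]  # (1, seq_len) key positions
    causal = kv_idx <= q_idx              # lower-triangular (seq_len, seq_len)
    padding = attention_mask.astype(bool)[:, None, None, :]  # (batch, 1, 1, seq_len)
    # Broadcasting combines both constraints without any per-element loop or vmap.
    return causal[None, None, :, :] & padding
```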

License & gating

The upstream license (modified Apache-2.0) carries through. Per Vogent's gating terms, if you use this model as part of a horizontal voice-agent platform, you must not set Vogent-Turn-80M as the default turn-detector option, and users must explicitly select "Vogent Turn Detector" to enable it. See LICENSE.

Credits

Original model: Vogent AI — vogent/Vogent-Turn-80M, blog post, inference code.
Re-export tooling: produced as part of remotemedia-sdk integration work. The scripts in export/ are MIT-licensed; the model weights themselves remain under the upstream Vogent license.

