# IMBE-ASR Base P25 Fine-tuned (48.6M params)

A P25 radio-adapted variant of imbe-asr-base-512d that produces readable transcriptions from real P25 radio traffic.
Code: trunk-reporter/imbe-asr | Base model: imbe-asr-base-512d | Best model: imbe-asr-large-1024d
## Results

Evaluated on 50 labeled real P25 samples, comparing greedy decoding against beam search with the included 3-gram KenLM:
| Decode method | WER | CER |
|---|---|---|
| Greedy | 37.1% | 14.8% |
| Beam + KenLM (α=0.5, β=1.0) | 19.2% | 9.5% |
The KenLM reduces WER by ~18 percentage points (37.1% → 19.2%). Beam search with the included LM is strongly recommended.

Example P25 output: `BATTALION 60 ENGINE 62 MEDIC 61 RESPOND TO 1234 MAIN STREET FOR A MEDICAL EMERGENCY`
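For reference, WER and CER are word- and character-level edit-distance rates (substitutions, insertions, and deletions divided by reference length). A minimal sketch of how the metrics are computed; this is illustrative, not the evaluation harness used for the numbers above:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (rolling single-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            # prev holds the old dp[j-1]; dp[j-1] is already updated for row i
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]

def wer(ref, hyp):
    """Word error rate: edit distance over word tokens / reference word count."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)

def cer(ref, hyp):
    """Character error rate: edit distance over characters / reference length."""
    return edit_distance(ref, hyp) / len(ref)
```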
## Training
Fine-tuned from imbe-asr-base-512d on ~20 hours of real P25 radio captures, pseudo-labeled with Whisper large-v3 + Qwen3-ASR ensemble. Mixed with 30% base training data to prevent catastrophic forgetting.
## Files

| File | Format | Size |
|---|---|---|
| `model.safetensors` | SafeTensors | 205 MB |
| `config.json` | JSON | — |
| `model.onnx` | ONNX fp32 | 196 MB |
| `model_int8.onnx` | ONNX int8 | 58 MB |
| `stats.npz` | NumPy | 2 KB |
| `lm/3gram.bin` | KenLM trie (3-gram, q8) | 501 MB |
| `lm/unigrams.txt` | Vocabulary | 9 MB |
## Usage

### Greedy decode (fast, no decoder dependencies)
```python
import onnxruntime as ort
import numpy as np

session = ort.InferenceSession("model_int8.onnx")
stats = np.load("stats.npz")

# raw_params: (num_frames, 170) array of IMBE frame parameters
features = ((raw_params - stats["mean"]) / stats["std"]).astype(np.float32)
log_probs, out_lengths = session.run(None, {
    "features": features.reshape(1, -1, 170),
    "lengths": np.array([features.shape[0]], dtype=np.int64),
})
```
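The model emits raw CTC log-probabilities, so greedy decoding still needs a collapse step: argmax per frame, merge consecutive repeats, drop blanks. A minimal sketch, assuming the 28-character alphabet with the CTC blank at index 0 as described in the beam-search section:

```python
import numpy as np

LABELS = [""] + list(" ABCDEFGHIJKLMNOPQRSTUVWXYZ'")  # blank at index 0

def greedy_ctc_decode(log_probs):
    """Collapse a (T, 29) CTC log-prob matrix to text: take the argmax
    label per frame, merge consecutive repeats, then drop blanks."""
    ids = np.argmax(log_probs, axis=-1)
    out = []
    prev = -1
    for i in ids:
        if i != prev and i != 0:  # skip repeated labels and blanks
            out.append(LABELS[i])
        prev = i
    return "".join(out)
```

Applied to the output above: `text = greedy_ctc_decode(log_probs[0, :out_lengths[0]])`.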
### Beam search + KenLM (recommended, ~18 pp lower WER)
```python
import onnxruntime as ort
import numpy as np
from pyctcdecode import build_ctcdecoder

# Load model and normalization stats
session = ort.InferenceSession("model_int8.onnx")
stats = np.load("stats.npz")

# Build decoder with the included KenLM (alpha/beta tuned on P25 data)
VOCAB = list(" ABCDEFGHIJKLMNOPQRSTUVWXYZ'")  # 28 characters
labels = [""] + VOCAB                          # CTC blank at index 0
decoder = build_ctcdecoder(
    labels=labels,
    kenlm_model_path="lm/3gram.bin",
    unigrams=open("lm/unigrams.txt").read().splitlines(),
    alpha=0.5,  # LM weight
    beta=1.0,   # word insertion bonus
)

# Run inference; raw_params: (num_frames, 170) array of IMBE frame parameters
features = ((raw_params - stats["mean"]) / stats["std"]).astype(np.float32)
log_probs, out_lengths = session.run(None, {
    "features": features.reshape(1, -1, 170),
    "lengths": np.array([features.shape[0]], dtype=np.int64),
})
text = decoder.decode(log_probs[0, :out_lengths[0]], beam_width=100)
```
Install the decoder dependencies:

```
pip install pyctcdecode kenlm
```
## Limitations
- Pseudo-labeled training data may contain transcription errors.
- P25 coverage is primarily law enforcement, fire, and EMS from one region. May not generalize to all agencies.
- A P25 fine-tuned version of the large-1024d model is in progress and is expected to substantially outperform this one.
- English only.