# IMBE-ASR Base P25 Fine-tuned (48.6M params)

A P25 radio-adapted variant of imbe-asr-base-512d that produces readable transcriptions from real P25 radio traffic.
Code: trunk-reporter/imbe-asr | Base model: imbe-asr-base-512d | Best model: imbe-asr-large-1024d
## Results

Evaluated on 50 labeled real P25 samples, comparing greedy decoding against beam search with the included 3-gram KenLM:
| Decode method | WER | CER |
|---|---|---|
| Greedy | 37.1% | 14.8% |
| Beam + KenLM (α=0.5, β=1.0) | 19.2% | 9.5% |
The KenLM reduces WER by ~18 percentage points (37.1% → 19.2%). Beam search with the included LM is strongly recommended.

Example P25 output: `BATTALION 60 ENGINE 62 MEDIC 61 RESPOND TO 1234 MAIN STREET FOR A MEDICAL EMERGENCY`
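For reference, WER and CER are word- and character-level edit-distance rates (substitutions, insertions, and deletions divided by reference length). A minimal sketch of how the metrics are computed; this is illustrative, not the evaluation harness used for the numbers above:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (rolling single-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            # prev holds the old dp[j-1]; dp[j-1] is already updated for row i
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]

def wer(ref, hyp):
    """Word error rate: edit distance over word tokens / reference word count."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)

def cer(ref, hyp):
    """Character error rate: edit distance over characters / reference length."""
    return edit_distance(ref, hyp) / len(ref)
```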
## Training
Fine-tuned from imbe-asr-base-512d on ~20 hours of real P25 radio captures, pseudo-labeled with Whisper large-v3 + Qwen3-ASR ensemble. Mixed with 30% base training data to prevent catastrophic forgetting.
## Files

| File | Format | Size |
|---|---|---|
| `model.safetensors` | SafeTensors | 205 MB |
| `config.json` | JSON | — |
| `model.onnx` | ONNX fp32 | 196 MB |
| `model_int8.onnx` | ONNX int8 | 58 MB |
| `stats.npz` | NumPy | 2 KB |
| `lm/3gram.bin` | KenLM trie (3-gram, q8) | 501 MB |
| `lm/unigrams.txt` | Vocabulary | 9 MB |
## Usage

### Greedy decode (fast, no decoder dependencies)
```python
import onnxruntime as ort
import numpy as np

session = ort.InferenceSession("model_int8.onnx")
stats = np.load("stats.npz")

# raw_params: (num_frames, 170) array of IMBE frame parameters
features = ((raw_params - stats["mean"]) / stats["std"]).astype(np.float32)
log_probs, out_lengths = session.run(None, {
    "features": features.reshape(1, -1, 170),
    "lengths": np.array([features.shape[0]], dtype=np.int64),
})
```
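The model emits raw CTC log-probabilities, so greedy decoding still needs a collapse step: argmax per frame, merge consecutive repeats, drop blanks. A minimal sketch, assuming the 28-character alphabet with the CTC blank at index 0 as described in the beam-search section:

```python
import numpy as np

LABELS = [""] + list(" ABCDEFGHIJKLMNOPQRSTUVWXYZ'")  # blank at index 0

def greedy_ctc_decode(log_probs):
    """Collapse a (T, 29) CTC log-prob matrix to text: take the argmax
    label per frame, merge consecutive repeats, then drop blanks."""
    ids = np.argmax(log_probs, axis=-1)
    out = []
    prev = -1
    for i in ids:
        if i != prev and i != 0:  # skip repeated labels and blanks
            out.append(LABELS[i])
        prev = i
    return "".join(out)
```

Applied to the output above: `text = greedy_ctc_decode(log_probs[0, :out_lengths[0]])`.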
### Beam search + KenLM (recommended, ~18 pp lower WER)
```python
import onnxruntime as ort
import numpy as np
from pyctcdecode import build_ctcdecoder

# Load model and normalization stats
session = ort.InferenceSession("model_int8.onnx")
stats = np.load("stats.npz")

# Build decoder with the included KenLM (alpha/beta tuned on P25 data)
VOCAB = list(" ABCDEFGHIJKLMNOPQRSTUVWXYZ'")  # 28 characters
labels = [""] + VOCAB                          # CTC blank at index 0
decoder = build_ctcdecoder(
    labels=labels,
    kenlm_model_path="lm/3gram.bin",
    unigrams=open("lm/unigrams.txt").read().splitlines(),
    alpha=0.5,  # LM weight
    beta=1.0,   # word insertion bonus
)

# Run inference; raw_params: (num_frames, 170) array of IMBE frame parameters
features = ((raw_params - stats["mean"]) / stats["std"]).astype(np.float32)
log_probs, out_lengths = session.run(None, {
    "features": features.reshape(1, -1, 170),
    "lengths": np.array([features.shape[0]], dtype=np.int64),
})
text = decoder.decode(log_probs[0, :out_lengths[0]], beam_width=100)
```
Install the decoder dependencies:

```
pip install pyctcdecode kenlm
```
## Limitations
- Pseudo-labeled training data may contain transcription errors.
- P25 coverage is primarily law enforcement, fire, and EMS from one region. May not generalize to all agencies.
- A P25 fine-tuned version of the large-1024d model is in progress and is expected to substantially outperform this one.
- English only.