!pip install transformers numpy onnx onnxruntime requests -q

import onnxruntime as ort
from transformers import AutoTokenizer
import numpy as np
import requests

onnx_model_url = "https://huggingface.co/alanjoshua2005/spam-sms-india-onnx/resolve/main/bert_sms_detector.onnx"
onnx_model_path = "bert_sms_detector.onnx"
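# Download the ONNX file; fail early on an HTTP error instead of saving an error page
response = requests.get(onnx_model_url, timeout=60)
response.raise_for_status()
with open(onnx_model_path, "wb") as f:
    f.write(response.content)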

# Load the tokenizer (hosted in a separate repository from the ONNX file)
tokenizer = AutoTokenizer.from_pretrained("alanjoshua2005/Bert-sms-spam-detector-onnx")

session = ort.InferenceSession(onnx_model_path, providers=["CPUExecutionProvider"])

text = "Congratulations! You won a free prize."

inputs = tokenizer(text, return_tensors="np", padding="max_length", truncation=True, max_length=64)
onnx_inputs = {
    "input_ids": inputs["input_ids"].astype(np.int64),
    "attention_mask": inputs["attention_mask"].astype(np.int64)
}

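# Run inference; the first output holds one row of logits per input message
outputs = session.run(None, onnx_inputs)
logits = outputs[0]
# Pick the higher-scoring class for the single input message
predicted_class = int(np.argmax(logits, axis=1)[0])
class_map = {0: "Ham (Not Spam)", 1: "Spam"}
print(f"Predicted class: {class_map[predicted_class]}")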

Model Evaluation Report

Model: bert-base-uncased (fine-tuned for binary text classification)
Evaluation Dataset Size: 50 Indian SMS samples (25 ham, 25 spam)
Device Used: CPUExecutionProvider

Performance Summary

Accuracy: 90.0%
Precision (macro avg): 90.58%
Recall (macro avg): 90.0%
F1-score (macro avg): 89.96%
ROC-AUC: 0.9920
PR-AUC: 0.9916

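ROC-AUC and PR-AUC are both above 0.99, so the model's scores rank spam above ham almost perfectly on this 50-sample set. Accuracy at the default threshold is lower (90%) because it depends on where the decision cutoff sits, not only on how well the scores separate the classes.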
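ROC-AUC can be read as the probability that a randomly chosen spam message receives a higher score than a randomly chosen ham message. The snippet below is a minimal sketch of that pairwise definition in plain Python; the report does not say which tool produced its figure, and the labels and scores here are made-up illustration values, not the evaluation data.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = spam) and spam scores, for illustration only
example_labels = [0, 0, 1, 1]
example_scores = [0.10, 0.40, 0.35, 0.80]
example_auc = roc_auc(example_labels, example_scores)
print(example_auc)
```

Because the measure only compares the ordering of scores, it does not change when the decision threshold moves.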

Class-wise Metrics

Class	Precision	Recall	F1-score	Support
0 (Ham)	0.8571	0.9600	0.9057	25
1 (Spam)	0.9545	0.8400	0.8936	25

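Class 0 (Ham): recall (0.96) exceeds precision (0.86), so almost every legitimate message is recognized as ham, but some messages labeled ham are actually spam.
Class 1 (Spam): precision (0.95) exceeds recall (0.84), so false alarms are rare, but about 16% of spam messages are missed.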
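The class-wise figures can be reproduced from confusion-matrix counts. The sketch below assumes 21 spam messages caught, 4 missed, and 1 ham message flagged as spam; these counts are inferred from the reported precision, recall, and support values and are not stated in the report itself.

```python
def binary_metrics(tp, fp, fn):
    """Return (precision, recall, f1) for one class from its counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Counts inferred from the table (an assumption, not given in the report).
# Spam as the positive class: 21 caught, 1 ham wrongly flagged, 4 spam missed.
spam = binary_metrics(tp=21, fp=1, fn=4)
# Ham as the positive class: the same errors seen from the other side.
ham = binary_metrics(tp=24, fp=4, fn=1)

accuracy = (21 + 24) / 50
macro_f1 = (spam[2] + ham[2]) / 2

print([round(x, 4) for x in spam])
print([round(x, 4) for x in ham])
print(accuracy, round(macro_f1, 4))
```

With these counts the rounded values match the table, the 90% accuracy, and the 89.96% macro F1-score in the summary.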

Threshold Analysis

Default threshold (0.5):
- Precision: 0.9545
- Recall: 0.8400
- F1: 0.8936
Best threshold (0.01):
- Precision: 0.9600
- Recall: 0.9600
- F1: 0.9600

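On this evaluation set, lowering the threshold from 0.5 to 0.01 raises recall from 0.84 to 0.96 without reducing precision. Choose the threshold according to whether missed spam or false alarms are more costly in the application.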
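A custom threshold can be applied by converting the logits to a spam probability and comparing it with the cutoff, instead of taking the argmax. This is a minimal sketch that assumes the threshold in the report refers to the softmax probability of class 1 (Spam) and that the model returns one row of two logits per message; the example logits are hypothetical.

```python
import numpy as np


def spam_probability(logits):
    """Softmax over the class axis; returns P(class 1 = Spam) for each row."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)
    return probs[:, 1]


def classify(logits, threshold=0.5):
    """Label a message as spam (1) when its spam probability meets the threshold."""
    return (spam_probability(logits) >= threshold).astype(int)


# Hypothetical logits for three messages, columns = [ham, spam]
example_logits = np.array([[2.0, -1.0], [4.0, -4.0], [-3.0, 3.0]])
default_labels = classify(example_logits, threshold=0.5)
low_labels = classify(example_logits, threshold=0.01)
print(default_labels, low_labels)
```

In the quick-start code, `classify(outputs[0], threshold=0.01)` would take the place of the `np.argmax` call. The first hypothetical message has a spam probability of about 0.05, so it is labeled ham at 0.5 but spam at 0.01.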

Latency & Efficiency

Avg latency per batch: 0.974s
Avg latency per sample: 0.039s (~25 samples/sec on CPU)

At about 39 ms per sample in batched CPU inference, the model is fast enough for real-time message classification without GPU acceleration.
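The two latency figures are consistent with batches of roughly 25 samples (0.974 s / 0.039 s ≈ 25), although the report does not state the batch size. The sketch below shows one way such averages could be measured; `time_batches` and the stand-in `fake_run_batch` are hypothetical names, and in practice the callable would wrap `session.run` on a tokenized batch.

```python
import time


def time_batches(run_batch, batches, warmup=1):
    """Return (avg seconds per batch, avg seconds per sample).

    The first `warmup` batches are run once untimed so one-off startup
    costs do not skew the averages.
    """
    for batch in batches[:warmup]:
        run_batch(batch)
    elapsed = []
    n_samples = 0
    for batch in batches:
        start = time.perf_counter()
        run_batch(batch)
        elapsed.append(time.perf_counter() - start)
        n_samples += len(batch)
    total = sum(elapsed)
    return total / len(elapsed), total / n_samples


def fake_run_batch(batch):
    """Stand-in for model inference (assumption): sleeps 1 ms per batch."""
    time.sleep(0.001)


# Two hypothetical batches of 25 messages each
demo_batches = [["msg"] * 25, ["msg"] * 25]
avg_batch, avg_sample = time_batches(fake_run_batch, demo_batches)
print(f"{avg_batch:.4f}s per batch, {avg_sample:.5f}s per sample")
```

Dividing the per-batch average by the per-sample average recovers the batch size when all batches are the same length.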
