!pip install transformers numpy onnx onnxruntime -q
import onnxruntime as ort
from transformers import AutoTokenizer
import numpy as np
import requests
onnx_model_url = "https://huggingface.co/alanjoshua2005/spam-sms-india-onnx/resolve/main/bert_sms_detector.onnx"
onnx_model_path = "bert_sms_detector.onnx"
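response = requests.get(onnx_model_url, timeout=60)
response.raise_for_status()  # fail early rather than saving an error page as the model
with open(onnx_model_path, "wb") as f:
    f.write(response.content)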
# Load the tokenizer from its Hugging Face repository
tokenizer = AutoTokenizer.from_pretrained("alanjoshua2005/Bert-sms-spam-detector-onnx")
session = ort.InferenceSession(onnx_model_path, providers=["CPUExecutionProvider"])
text = "Congratulations! You won a free prize."
inputs = tokenizer(text, return_tensors="np", padding="max_length", truncation=True, max_length=64)
onnx_inputs = {
    "input_ids": inputs["input_ids"].astype(np.int64),
    "attention_mask": inputs["attention_mask"].astype(np.int64),
}
outputs = session.run(None, onnx_inputs)
logits = outputs[0]
predicted_class = int(np.argmax(logits, axis=1)[0])
class_map = {0: "Ham (Not Spam)", 1: "Spam"}
print(f"Predicted class: {class_map[predicted_class]}")
Model Evaluation Report
Model: bert-base-uncased (fine-tuned for binary text classification)
Evaluation Dataset Size: 50 Indian SMS samples (25 per class)
Device Used: CPUExecutionProvider
Performance Summary
- Accuracy: 90.0%
- Precision (macro avg): 90.58%
- Recall (macro avg): 90.0%
- F1-score (macro avg): 89.96%
- ROC-AUC: 0.9920
- PR-AUC: 0.9916
On this 50-sample set, ROC-AUC and PR-AUC are both above 0.99, indicating that the model's scores separate the two classes almost perfectly.
Class-wise Metrics
| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 0 | 0.8571 | 0.9600 | 0.9057 | 25 |
| 1 | 0.9545 | 0.8400 | 0.8936 | 25 |
- Class 0 (Ham): Higher recall, slightly lower precision, so very few legitimate messages are flagged as spam.
- Class 1 (Spam): Higher precision, slightly lower recall, so there are few false alarms, but some spam goes undetected.
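The class-wise figures above are consistent with a confusion matrix of 24 true negatives, 1 false positive, 4 false negatives, and 21 true positives. These counts are not stated in the report; they are inferred from the reported precision, recall, and support values, so treat them as an assumption. A minimal sketch that recomputes the metrics from those counts:

```python
# Confusion-matrix counts inferred from the reported table
# (an assumption; the original report does not list them).
tn, fp, fn, tp = 24, 1, 4, 21


def precision_recall_f1(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) for one class."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Class 1 (Spam) is the positive class.
spam = precision_recall_f1(tp, fp, fn)
# For class 0 (Ham) the roles swap: TN become its true positives,
# FN become its false positives, FP become its false negatives.
ham = precision_recall_f1(tn, fn, fp)

accuracy = (tp + tn) / (tp + tn + fp + fn)
# Macro average: unweighted mean of the per-class values.
macro = tuple((h + s) / 2 for h, s in zip(ham, spam))

print("Ham   P/R/F1:", [round(v, 4) for v in ham])
print("Spam  P/R/F1:", [round(v, 4) for v in spam])
print("Macro P/R/F1:", [round(v, 4) for v in macro])
print("Accuracy:", accuracy)
```

Running this reproduces the per-class rows of the table as well as the macro averages and accuracy in the performance summary.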
Threshold Analysis
Default threshold (0.5):
- Precision: 0.9545
- Recall: 0.8400
- F1: 0.8936
Best threshold (0.01):
- Precision: 0.9600
- Recall: 0.9600
- F1: 0.9600
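On this evaluation set, lowering the threshold from 0.5 to 0.01 raises recall from 0.84 to 0.96 without reducing precision. The threshold can be tuned to trade precision against recall depending on application needs.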
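Applying a custom threshold requires a spam probability rather than an argmax over the logits. The sketch below assumes the model emits two logits per message and that the thresholds in this report refer to the softmax probability of class 1; neither is confirmed by the report. The function names and the example logits are hypothetical and for illustration only.

```python
import numpy as np


def spam_probability(logits):
    """Softmax over the class axis; returns P(class 1) for each row."""
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)
    return probs[:, 1]


def classify(logits, threshold=0.5):
    """Return 1 (Spam) where P(spam) >= threshold, else 0 (Ham)."""
    return (spam_probability(logits) >= threshold).astype(int)


# Made-up logits with shape (batch, 2), not real model outputs.
example_logits = np.array([[2.0, -1.0], [0.5, 0.4], [-3.0, 3.0]])

print(classify(example_logits, threshold=0.5))
print(classify(example_logits, threshold=0.01))
```

With the inference code from this card, `logits = outputs[0]` could be passed to `classify` in place of `example_logits`. A threshold as low as 0.01 flags any message with even a small spam probability, so it is worth re-validating on a larger dataset before relying on it.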
Latency & Efficiency
- Avg latency per batch: 0.974s
- Avg latency per sample: 0.039s (~25 samples/sec on CPU)
At roughly 39 ms per sample, the model is fast enough for real-time inference on CPU, without GPU acceleration.
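The latency figures above could be measured with a small timing helper like the one below. This is a sketch, not the procedure used for the report: `measure_latency` and `dummy_run` are hypothetical names, the workload is a stand-in, and the batch size of 25 is inferred from the ratio of the two reported latencies.

```python
import time


def measure_latency(run_batch, batches):
    """Time run_batch on each batch.

    Returns (average seconds per batch, average seconds per sample).
    """
    total_time = 0.0
    total_samples = 0
    for batch in batches:
        start = time.perf_counter()
        run_batch(batch)
        total_time += time.perf_counter() - start
        total_samples += len(batch)
    return total_time / len(batches), total_time / total_samples


def dummy_run(batch):
    """Stand-in workload; replace with tokenization plus session.run."""
    return [len(text) for text in batch]


batches = [["hello"] * 25, ["world"] * 25]
per_batch, per_sample = measure_latency(dummy_run, batches)
print(f"per batch: {per_batch:.6f}s, per sample: {per_sample:.6f}s")

# Consistency check on the reported numbers, assuming 25 samples per batch.
reported_per_sample = 0.974 / 25
print(f"reported per sample: {reported_per_sample:.3f}s")
print(f"throughput: {1 / reported_per_sample:.1f} samples/sec")
```

To time the actual model, `run_batch` could wrap the tokenizer call and `session.run` from the inference code in this card. A few warm-up runs before timing usually give more stable numbers.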