!pip install transformers numpy onnx onnxruntime requests -q

import onnxruntime as ort
from transformers import AutoTokenizer
import numpy as np
import requests

onnx_model_url = "https://huggingface.co/alanjoshua2005/spam-sms-india-onnx/resolve/main/bert_sms_detector.onnx"
onnx_model_path = "bert_sms_detector.onnx"
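# Download the model; stop on an HTTP error rather than saving an error page as the .onnx file
response = requests.get(onnx_model_url, timeout=60)
response.raise_for_status()
with open(onnx_model_path, "wb") as f:
    f.write(response.content)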

# Load tokenizer from the correct repository
tokenizer = AutoTokenizer.from_pretrained("alanjoshua2005/Bert-sms-spam-detector-onnx")

session = ort.InferenceSession(onnx_model_path, providers=["CPUExecutionProvider"])

text = "Congratulations! You won a free prize."

inputs = tokenizer(text, return_tensors="np", padding="max_length", truncation=True, max_length=64)
onnx_inputs = {
    "input_ids": inputs["input_ids"].astype(np.int64),
    "attention_mask": inputs["attention_mask"].astype(np.int64)
}

outputs = session.run(None, onnx_inputs)
logits = outputs[0]
predicted_class = int(np.argmax(logits, axis=1)[0])
class_map = {0: "Ham (Not Spam)", 1: "Spam"}
print(f"Predicted class: {class_map[predicted_class]}")

Model Evaluation Report

Model: bert-base-uncased (fine-tuned for binary text classification)
Evaluation Dataset Size: 50 samples of Indian spam SMS
Device Used: CPUExecutionProvider


Performance Summary

  • Accuracy: 90.0%
  • Precision (macro avg): 90.58%
  • Recall (macro avg): 90.0%
  • F1-score (macro avg): 89.96%
  • ROC-AUC: 0.9920
  • PR-AUC: 0.9916

On this 50-sample evaluation set, ROC-AUC (0.9920) and PR-AUC (0.9916) are both close to 1, which indicates that the model's scores separate the two classes well.


Class-wise Metrics

Class   Precision   Recall   F1-score   Support
0       0.8571      0.9600   0.9057     25
1       0.9545      0.8400   0.8936     25

  • Class 0: Higher recall, slightly lower precision → very few misses.
  • Class 1: Higher precision, slightly lower recall → fewer false alarms.
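
The class-wise and macro-averaged figures can be cross-checked with a little arithmetic. The sketch below is not part of the original evaluation code: it assumes the confusion counts implied by the reported recall and support (0.96 × 25 = 24 class-0 messages classified correctly, 0.84 × 25 = 21 class-1 messages classified correctly) and recomputes precision, recall, and F1 from them.

```python
# Confusion counts inferred from the reported recall and support.
# These are an assumption derived from the table, not published counts.
ham_correct, ham_as_spam = 24, 1    # class 0: 24 of 25 predicted as class 0
spam_correct, spam_as_ham = 21, 4   # class 1: 21 of 25 predicted as class 1


def prf(tp, fp, fn):
    """Return (precision, recall, f1) for a single class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# For class 0, the false positives are the class-1 messages predicted as class 0.
ham = prf(tp=ham_correct, fp=spam_as_ham, fn=ham_as_spam)
# For class 1, the false positives are the class-0 messages predicted as class 1.
spam = prf(tp=spam_correct, fp=ham_as_spam, fn=spam_as_ham)

# Macro average: unweighted mean of the per-class values.
macro = tuple((h + s) / 2 for h, s in zip(ham, spam))
accuracy = (ham_correct + spam_correct) / 50

print("class 0:", [round(v, 4) for v in ham])
print("class 1:", [round(v, 4) for v in spam])
print("macro  :", [round(v, 4) for v in macro])
print("accuracy:", accuracy)
```

Under these assumed counts the per-class values round to the figures in the table, and the macro averages round to 0.9058, 0.9000, and 0.8996, matching the Performance Summary.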

Threshold Analysis

  • Default threshold (0.5):

    • Precision: 0.9545
    • Recall: 0.8400
    • F1: 0.8936
  • Best threshold (0.01):

    • Precision: 0.9600
    • Recall: 0.9600
    • F1: 0.9600

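On this evaluation set, lowering the threshold from 0.5 to 0.01 raises recall from 0.84 to 0.96 and F1 from 0.8936 to 0.96, while precision stays roughly the same. Adjusting the threshold trades recall against precision, so it should be chosen according to application needs.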
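
A custom threshold can be applied to the model's logits instead of taking the argmax. The sketch below assumes the threshold is compared against the softmax probability of class 1 (spam); the report does not state how the score was defined, so treat that as an assumption. The helper names `spam_probability` and `classify` and the example logits are hypothetical.

```python
import numpy as np


def spam_probability(logits):
    """Softmax over the last axis; returns the class-1 probability per row."""
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=-1, keepdims=True)
    return probs[:, 1]


def classify(logits, threshold=0.5):
    """Label a row as class 1 when its class-1 probability reaches the threshold."""
    return (spam_probability(logits) >= threshold).astype(int)


# Hypothetical logits with shape (batch, 2), standing in for outputs[0].
example_logits = np.array([[4.0, -4.0], [0.5, -1.5], [-1.0, 3.0]])

print(spam_probability(example_logits))
print(classify(example_logits, threshold=0.5))
print(classify(example_logits, threshold=0.01))
```

With a threshold of 0.5 and two classes, this gives the same labels as the argmax in the quick-start snippet (apart from exact ties); a lower threshold flags borderline messages as spam, which is how recall goes up.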


Latency & Efficiency

  • Avg latency per batch: 0.974s
  • Avg latency per sample: 0.039s (~25 samples/sec on CPU)

At roughly 39 ms per sample, the model is fast enough for real-time inference on CPU, without GPU acceleration.
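
Latency figures like these can be collected with a simple timing loop. The sketch below is one possible way to do it, not the procedure used for this report: `measure_latency` is a hypothetical helper, `fake_run` is a stand-in for a function that tokenizes a batch and calls `session.run`, and the batch size of 25 is inferred from 0.974 s / 0.039 s rather than stated in the report.

```python
import time


def measure_latency(run_batch, batches, warmup=1):
    """Time run_batch over batches; return (avg seconds/batch, avg seconds/sample)."""
    # Warm-up calls are executed but not timed.
    for batch in batches[:warmup]:
        run_batch(batch)

    total_time = 0.0
    total_samples = 0
    for batch in batches:
        start = time.perf_counter()
        run_batch(batch)
        total_time += time.perf_counter() - start
        total_samples += len(batch)
    return total_time / len(batches), total_time / total_samples


def fake_run(batch):
    """Stand-in workload; replace with tokenization plus session.run for real numbers."""
    return [len(text) for text in batch]


# Two batches of 25 texts each (batch size is an assumption, see above).
batches = [["hello"] * 25, ["world"] * 25]
per_batch, per_sample = measure_latency(fake_run, batches)
print(f"avg latency per batch:  {per_batch:.6f}s")
print(f"avg latency per sample: {per_sample:.6f}s")
```

Throughput is the reciprocal of the per-sample latency, so 0.039 s per sample corresponds to about 25 samples per second.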
