---
license: mit
language:
- zh
- en
tags:
- text-classification
- sentence-transformers
- sklearn
- automotive
- ev
- chinese
- english
- multilingual
pipeline_tag: text-classification
datasets:
- yingjie16/ev-lead-triage
---
# EV Lead Triage Classifier
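EV Lead Triage Classifier is a bilingual (Chinese/English) text classification model package that classifies the intent of new-energy-vehicle (EV) sales leads.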
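## Model Description
The pipeline first encodes each text into a 384-dimensional embedding with `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2`, scales the embedding with the saved scaler (`models/scaler.joblib`), and then classifies it into one of 6 sales-lead intent classes with `MLPClassifier(hidden_layer_sizes=(256, 128), max_iter=500, early_stopping=True)` trained on those embeddings.
- Best model name: `mlp`
- Embedding model: `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2`
- Classifier: `MLPClassifier(hidden_layer_sizes=(256, 128), max_iter=500, early_stopping=True)`
- Task type: `text-classification`
- Languages: Chinese / English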
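The two-stage design above can be sketched with scikit-learn. This is a minimal sketch under stated assumptions, not the author's training script: random vectors stand in for the 384-dimensional sentence embeddings, `StandardScaler` is an assumed choice for the scaler (the card does not say which scaler was saved), and `random_state=0` is added only for reproducibility.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

LABELS = [
    "hot_lead_test_drive",
    "comparison_shopping",
    "pricing_inquiry",
    "after_sales",
    "delivery_followup",
    "general_curious",
]

rng = np.random.default_rng(0)
# Placeholder data: in the real pipeline each row would come from
# SentenceTransformer(...).encode(texts), one 384-dim vector per text.
X_train = rng.normal(size=(600, 384))
y_train = np.array(LABELS * 100)  # 100 illustrative samples per label

# Assumption: the saved scaler is a StandardScaler fitted on training embeddings.
scaler = StandardScaler().fit(X_train)

# Classifier settings as listed in the model card (random_state added here).
classifier = MLPClassifier(
    hidden_layer_sizes=(256, 128),
    max_iter=500,
    early_stopping=True,
    random_state=0,
)
classifier.fit(scaler.transform(X_train), y_train)

# New texts must go through the same encode -> scale -> classify path.
X_new = rng.normal(size=(3, 384))
proba = classifier.predict_proba(scaler.transform(X_new))
print(classifier.classes_)
print(proba.shape)
```

To mirror the repository layout, the fitted objects could then be saved with `joblib.dump(classifier, "models/best_model.joblib")` and `joblib.dump(scaler, "models/scaler.joblib")`.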
## Training Data
The data comes from [yingjie16/ev-lead-triage](https://huggingface.co/datasets/yingjie16/ev-lead-triage), a synthetic bilingual (Chinese/English) dataset of EV sales leads covering 6 intent labels.
## Label Space
- `hot_lead_test_drive`
- `comparison_shopping`
- `pricing_inquiry`
- `after_sales`
- `delivery_followup`
- `general_curious`
## Evaluation Results
The table below is taken directly from `results/metrics_comparison.csv`, with the best model marked:
| Model | Accuracy | Macro Precision | Macro Recall | Macro F1 | Best |
| --- | ---: | ---: | ---: | ---: | --- |
| mlp | 0.8833 | 0.8848 | 0.8833 | 0.8837 | yes |
| svc_rbf | 0.8667 | 0.8787 | 0.8667 | 0.8685 | |
| logistic_regression | 0.8583 | 0.8606 | 0.8583 | 0.8586 | |
| random_forest | 0.8333 | 0.8351 | 0.8333 | 0.8336 | |
The best model is **`mlp`**.
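The macro metrics in the evaluation table average per-class scores with equal weight over the labels. This is the usual definition (for example scikit-learn's `average="macro"`), but the card does not state which implementation produced the CSV. A minimal pure-Python sketch of the computation, using a hypothetical two-class toy example:

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (accuracy, macro_precision, macro_recall, macro_f1) for two label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    true_count = Counter(y_true)  # support of each class
    pred_count = Counter(y_pred)  # how often each class was predicted
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives

    precisions, recalls, f1s = [], [], []
    for label in labels:
        # Undefined ratios (zero denominator) are scored as 0.0.
        prec = correct[label] / pred_count[label] if pred_count[label] else 0.0
        rec = correct[label] / true_count[label] if true_count[label] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    k = len(labels)
    accuracy = sum(correct.values()) / len(y_true)
    # Macro averaging: every class counts equally, regardless of its support.
    return accuracy, sum(precisions) / k, sum(recalls) / k, sum(f1s) / k


y_true = ["pricing_inquiry", "pricing_inquiry", "after_sales", "after_sales"]
y_pred = ["pricing_inquiry", "after_sales", "after_sales", "after_sales"]
scores = macro_scores(y_true, y_pred)
print([round(s, 4) for s in scores])  # [0.75, 0.8333, 0.75, 0.7333]
```

Two consequences are visible here. Macro F1 averages the per-class F1 scores, so it is generally not the harmonic mean of the macro precision and macro recall columns. And when every class has the same support, macro recall reduces to accuracy, so the identical Accuracy and Macro Recall columns in the table are consistent with a class-balanced test set.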
## Repository Contents
- `models/best_model.joblib`
- `models/scaler.joblib`
- `models/best_model_name.txt`
- `results/metrics_comparison.csv`
- `results/classification_report_*.txt`
- `results/confusion_matrix_*.png`
- `results/metrics_comparison.png`
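`results/metrics_comparison.csv` can also be used to pick the winning model programmatically. The following is a minimal standard-library sketch. The CSV header names (`model`, `macro_f1`, and so on) are assumptions, because the card shows only the rendered table. Ranking by macro F1 is also an assumed selection rule. The rows reuse the values from the evaluation table.

```python
import csv
import io

# Hypothetical CSV layout: header names are assumed; values copied from the card's table.
CSV_TEXT = """model,accuracy,macro_precision,macro_recall,macro_f1
mlp,0.8833,0.8848,0.8833,0.8837
svc_rbf,0.8667,0.8787,0.8667,0.8685
logistic_regression,0.8583,0.8606,0.8583,0.8586
random_forest,0.8333,0.8351,0.8333,0.8336
"""


def best_model(csv_file, metric="macro_f1"):
    """Return (model_name, score) for the row with the highest value of `metric`."""
    rows = list(csv.DictReader(csv_file))
    best = max(rows, key=lambda row: float(row[metric]))
    return best["model"], float(best[metric])


# For the real file, pass open(path, newline="", encoding="utf-8") instead of StringIO.
name, score = best_model(io.StringIO(CSV_TEXT))
print(name, score)  # mlp 0.8837
```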
## Python Usage Example
```python
from pathlib import Path

import joblib
from huggingface_hub import hf_hub_download
from sentence_transformers import SentenceTransformer

REPO_ID = "yingjie16/ev-lead-classifier"
EMBEDDING_MODEL = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"

# Download the trained artifacts from the Hub (cached locally after the first call).
model_path = hf_hub_download(repo_id=REPO_ID, filename="models/best_model.joblib")
scaler_path = hf_hub_download(repo_id=REPO_ID, filename="models/scaler.joblib")
best_name_path = hf_hub_download(repo_id=REPO_ID, filename="models/best_model_name.txt")

classifier = joblib.load(model_path)
scaler = joblib.load(scaler_path)
best_model_name = Path(best_name_path).read_text(encoding="utf-8").strip()

# The encoder must be the same model that produced the training embeddings.
encoder = SentenceTransformer(EMBEDDING_MODEL)

# "Want to test-drive a Model Y this weekend. Can I still book one in Shanghai?"
text = "周末想试驾Model Y,上海这边还能约吗?"

# Encode -> scale with the saved scaler -> classify, mirroring the training pipeline.
embedding = encoder.encode([text], convert_to_numpy=True)  # shape (1, 384)
embedding_scaled = scaler.transform(embedding)

label = classifier.predict(embedding_scaled)[0]
proba = classifier.predict_proba(embedding_scaled)[0]

# Look up the probability of the predicted label via the classifier's class order.
class_names = list(classifier.classes_)
probability = float(proba[class_names.index(label)])

print(
    {
        "model": best_model_name,
        "label": str(label),  # plain str, so the dict prints cleanly
        "probability": round(probability, 4),
    }
)
```
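The example above reports only the single most likely label. When several intents are plausible, for instance a pricing question that also mentions a test drive, it can help to rank all classes. Below is a small helper, hypothetical and not part of the repository, that pairs a `predict_proba` row with `classifier.classes_` and returns the k most likely labels. The demo uses made-up probabilities so it runs without downloading the model.

```python
def top_k_labels(proba_row, class_names, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    pairs = [(str(label), float(p)) for label, p in zip(class_names, proba_row)]
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs[:k]


# Made-up probabilities; with the real model you would call
# top_k_labels(proba, classifier.classes_).
demo_classes = ["after_sales", "comparison_shopping", "delivery_followup",
                "general_curious", "hot_lead_test_drive", "pricing_inquiry"]
demo_proba = [0.02, 0.05, 0.01, 0.04, 0.80, 0.08]
print(top_k_labels(demo_proba, demo_classes, k=2))
# [('hot_lead_test_drive', 0.8), ('pricing_inquiry', 0.08)]
```

Because the helper only iterates over its inputs, it accepts both Python lists and the NumPy arrays returned by scikit-learn.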
## License
This model repository is released under the **MIT** license.