Upload MultiEvalVietSum: weights, tokenizer, config, code, and model card

Files changed (9)

.gitattributes +1 -34
README.md +129 -0
inference_example.py +85 -0
modeling_multievalvietsum.py +65 -0
multievalvietsum_config.json +17 -0
pytorch_model.bin +3 -0
tokenizer.json +3 -0
tokenizer_config.json +24 -0
training_summary.json +50 -0

.gitattributes CHANGED

@@ -1,35 +1,2 @@
-*.7z filter=lfs diff=lfs merge=lfs -text
-*.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
-*.bz2 filter=lfs diff=lfs merge=lfs -text
-*.ckpt filter=lfs diff=lfs merge=lfs -text
-*.ftz filter=lfs diff=lfs merge=lfs -text
-*.gz filter=lfs diff=lfs merge=lfs -text
-*.h5 filter=lfs diff=lfs merge=lfs -text
-*.joblib filter=lfs diff=lfs merge=lfs -text
-*.lfs.* filter=lfs diff=lfs merge=lfs -text
-*.mlmodel filter=lfs diff=lfs merge=lfs -text
-*.model filter=lfs diff=lfs merge=lfs -text
-*.msgpack filter=lfs diff=lfs merge=lfs -text
-*.npy filter=lfs diff=lfs merge=lfs -text
-*.npz filter=lfs diff=lfs merge=lfs -text
-*.onnx filter=lfs diff=lfs merge=lfs -text
-*.ot filter=lfs diff=lfs merge=lfs -text
-*.parquet filter=lfs diff=lfs merge=lfs -text
-*.pb filter=lfs diff=lfs merge=lfs -text
-*.pickle filter=lfs diff=lfs merge=lfs -text
-*.pkl filter=lfs diff=lfs merge=lfs -text
-*.pt filter=lfs diff=lfs merge=lfs -text
-*.pth filter=lfs diff=lfs merge=lfs -text
-*.rar filter=lfs diff=lfs merge=lfs -text
-*.safetensors filter=lfs diff=lfs merge=lfs -text
-saved_model/**/* filter=lfs diff=lfs merge=lfs -text
-*.tar.* filter=lfs diff=lfs merge=lfs -text
-*.tar filter=lfs diff=lfs merge=lfs -text
-*.tflite filter=lfs diff=lfs merge=lfs -text
-*.tgz filter=lfs diff=lfs merge=lfs -text
-*.wasm filter=lfs diff=lfs merge=lfs -text
-*.xz filter=lfs diff=lfs merge=lfs -text
-*.zip filter=lfs diff=lfs merge=lfs -text
-*.zst filter=lfs diff=lfs merge=lfs -text
-*tfevents* filter=lfs diff=lfs merge=lfs -text




+tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,129 @@

+---
+language:
+- vi
+library_name: transformers
+pipeline_tag: text-classification
+tags:
+- vietnamese
+- summarization
+- evaluation
+- cross-encoder
+- research
+---
+# MultiEvalVietSum
+MultiEvalVietSum is a Vietnamese summary evaluation model released under the Hugging Face account phuongntc.
+It is a criterion-specific cross-encoder evaluator that takes a source document and a candidate summary as input and outputs three scalar scores:
+- Faithfulness
+- Coherence
+- Relevance
+## Model description
+This model is built on top of the multilingual long-context encoder jhu-clsp/mmBERT-base and fine-tuned as a custom evaluator for Vietnamese summarization research.
+Architecture summary:
+- Backbone: jhu-clsp/mmBERT-base
+- Input format: (document, summary) pair
+- Pooling: CLS + mean pooling
+- Prediction heads: three scalar regression heads
+- Criteria: faithfulness, coherence, relevance
+- Training objective: MSE regression + pairwise margin ranking loss
+## Intended use
+This model is intended for:
+- research on automatic summary evaluation in Vietnamese
+- system comparison for Vietnamese summarization
+- criterion-specific scoring of candidate summaries against a source document
+This model is not intended to replace human judgment in high-stakes evaluation settings.
+## Input processing
+The evaluator uses a pairwise input construction strategy:
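+- the summary is tokenized and truncated first, to at most SUM_MAX_LEN = 192 tokens
+- the remaining token budget (MAX_LEN minus the summary length and the special tokens) is assigned to the source document, which is cut from the end if it does not fit
+- the total pair length is capped at MAX_LEN = 2048
+Capping the summary at a small fixed length leaves most of the budget for source-document evidence during evaluation.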
+## Reported setup
+- model_name: MultiEvalVietSum
+- repo_id: phuongntc/MultiEvalVietSum
+- backbone: jhu-clsp/mmBERT-base
+- task: Vietnamese summary evaluation
+- max_len: 2048
+- summary_max_len: 192
+- pooling: CLS + mean pooling
+- outputs: faithfulness, coherence, relevance
+Validation metrics (all values are unset in this release; they are stored as null in training_summary.json):
+- val_pearson_faith: None
+- val_pearson_coh: None
+- val_pearson_rel: None
+- val_pearson_mean: None
+- val_spearman_faith: None
+- val_spearman_coh: None
+- val_spearman_rel: None
+- val_spearman_mean: None
+## Output format
+The model outputs three scalar scores:
+1. faithfulness
+2. coherence
+3. relevance
+Users may optionally combine them into an overall score using a weighting scheme appropriate for their study.
+## Limitations
+- The model only sees the truncated (document, summary) pair defined by the preprocessing pipeline
+- For long documents, tokens beyond the document budget are dropped, so the end of the document is invisible to the evaluator
+- If a candidate summary is longer than SUM_MAX_LEN tokens, only its first SUM_MAX_LEN tokens are evaluated
+- Performance may vary across domains outside the training or evaluation distribution
+## Transparency and reproducibility notes
+To reproduce scores as closely as possible, users should keep the following consistent:
+- backbone model
+- tokenizer
+- MAX_LEN
+- SUM_MAX_LEN
+- pair construction rule
+- model architecture and checkpoint
+The repository includes:
+- tokenizer files
+- evaluator weights
+- a custom loader file
+- an inference example
+- a training summary file
+## How to use
+After downloading the repo, use the included files:
+- modeling_multievalvietsum.py
+- inference_example.py
+Example:
+1. Download or clone the repository
+2. Open Python in that folder
+3. Run:
+   from inference_example import predict_scores
+   scores = predict_scores("Văn bản gốc", "Bản tóm tắt", model_dir=".")
+   print(scores)
+## Citation
+@misc{phuong2026multievalvietsum,
+  title={MultiEvalVietSum: A Vietnamese Criterion-Specific Evaluator for Summary Assessment},
+  author={Phuong N. T. and collaborators},
+  year={2026},
+  note={Model card and code release on Hugging Face},
+  howpublished={\url{https://huggingface.co/phuongntc/MultiEvalVietSum}}
+}
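
Note on the "Output format" section of the README: the card leaves the combination of the three criterion scores to the user. One possible option is a normalized weighted average, sketched below. The helper name `combine_scores`, the example scores, and the weights are all hypothetical and not part of this repository; the appropriate weighting depends on the study.

```python
def combine_scores(scores, weights=None):
    """Normalized weighted average of criterion scores (hypothetical helper)."""
    if weights is None:
        # Default assumption: every criterion counts equally.
        weights = {name: 1.0 for name in scores}
    missing = set(scores) - set(weights)
    if missing:
        raise ValueError(f"Missing weights for: {sorted(missing)}")
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("Weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total


# Illustrative values only; these are not outputs of the released model.
example = {"faithfulness": 0.8, "coherence": 0.6, "relevance": 0.7}
overall_equal = combine_scores(example)
overall_faith_heavy = combine_scores(
    example, {"faithfulness": 2.0, "coherence": 1.0, "relevance": 1.0}
)
```

Dividing by the weight total keeps the overall score on the same scale as the individual criteria, whatever weights are chosen.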

inference_example.py ADDED

	@@ -0,0 +1,85 @@

+import torch
+from modeling_multievalvietsum import MultiEvalVietSumModel
+def build_pair_feature(tokenizer, document, summary, max_len=2048, summary_max_len=192):
+    sum_ids = tokenizer(
+        summary,
+        truncation=True,
+        max_length=summary_max_len,
+        add_special_tokens=False,
+        return_attention_mask=False,
+    )["input_ids"]
+    doc_ids = tokenizer(
+        document,
+        truncation=False,
+        add_special_tokens=False,
+        return_attention_mask=False,
+    )["input_ids"]
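+    # Reserve room for the special tokens the tokenizer adds around a pair,
+    # then give whatever is left of max_len to the document.
+    special_pair_tokens = tokenizer.num_special_tokens_to_add(pair=True)
+    # The floor of 16 keeps some document context even if the summary nearly fills max_len.
+    doc_budget = max(16, max_len - len(sum_ids) - special_pair_tokens)
+    doc_ids = doc_ids[:doc_budget]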
+    model_inputs = getattr(tokenizer, "model_input_names", [])
+    return_token_type_ids = "token_type_ids" in model_inputs
+    try:
+        feat = tokenizer.prepare_for_model(
+            doc_ids,
+            pair_ids=sum_ids,
+            add_special_tokens=True,
+            padding=False,
+            truncation=False,
+            return_attention_mask=True,
+            return_token_type_ids=return_token_type_ids,
+        )
+        feat = {k: v for k, v in feat.items() if k in {"input_ids", "attention_mask", "token_type_ids"}}
+        return feat
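+    except Exception:
+        # Fallback when prepare_for_model is unavailable or fails:
+        # assemble [CLS] document [SEP] summary [SEP] by hand.
+        cls_id = tokenizer.cls_token_id
+        sep_id = tokenizer.sep_token_id
+        if cls_id is None or sep_id is None:
+            raise ValueError("Tokenizer must define cls_token_id and sep_token_id")
+        input_ids = [cls_id] + doc_ids + [sep_id] + sum_ids + [sep_id]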
+        attention_mask = [1] * len(input_ids)
+        feat = {
+            "input_ids": input_ids,
+            "attention_mask": attention_mask,
+        }
+        if return_token_type_ids:
+            feat["token_type_ids"] = [0] * (len(doc_ids) + 2) + [1] * (len(sum_ids) + 1)
+        return feat
+@torch.no_grad()
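+def predict_scores(document: str, summary: str, model_dir: str = "."):
+    """Score one (document, summary) pair; reloads the model and tokenizer from model_dir on every call."""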
+    model, cfg = MultiEvalVietSumModel.from_pretrained_local(model_dir)
+    tokenizer = MultiEvalVietSumModel.load_tokenizer_local(model_dir)
+    feat = build_pair_feature(
+        tokenizer,
+        document=document,
+        summary=summary,
+        max_len=cfg["max_len"],
+        summary_max_len=cfg["summary_max_len"],
+    )
+    batch = {
+        "input_ids": torch.tensor([feat["input_ids"]], dtype=torch.long),
+        "attention_mask": torch.tensor([feat["attention_mask"]], dtype=torch.long),
+    }
+    if "token_type_ids" in feat:
+        batch["token_type_ids"] = torch.tensor([feat["token_type_ids"]], dtype=torch.long)
+    scores = model(**batch)[0].cpu().tolist()
+    return {
+        "faithfulness": float(scores[0]),
+        "coherence": float(scores[1]),
+        "relevance": float(scores[2]),
+    }
+if __name__ == "__main__":
+    doc = "Văn bản gốc mẫu."
+    summ = "Bản tóm tắt mẫu."
+    print(predict_scores(doc, summ))
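
Note on `build_pair_feature`: the truncation arithmetic can be checked without loading a tokenizer. The sketch below mirrors the budget computation above using plain token counts. The helper name `pair_budget` is hypothetical, and `special_tokens=3` is an assumption that matches the manual fallback branch (one CLS and two SEP tokens); the real value comes from `tokenizer.num_special_tokens_to_add(pair=True)` and may differ.

```python
def pair_budget(doc_len, sum_len, max_len=2048, summary_max_len=192, special_tokens=3):
    """Return (document tokens kept, summary tokens kept, total sequence length).

    Hypothetical helper; special_tokens=3 is an assumed value.
    """
    # The summary is truncated first, to at most summary_max_len tokens.
    kept_sum = min(sum_len, summary_max_len)
    # The document receives the remaining budget, with a floor of 16 tokens.
    doc_budget = max(16, max_len - kept_sum - special_tokens)
    kept_doc = min(doc_len, doc_budget)
    return kept_doc, kept_sum, kept_doc + kept_sum + special_tokens


long_case = pair_budget(doc_len=5000, sum_len=300)
short_case = pair_budget(doc_len=100, sum_len=50)
```

Under these assumptions, a 5000-token document paired with a 300-token summary keeps 1853 document tokens and 192 summary tokens, which fills the 2048-token limit exactly. A short pair passes through untruncated.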

modeling_multievalvietsum.py ADDED

	@@ -0,0 +1,65 @@

+import json
+from pathlib import Path
+import torch
+import torch.nn as nn
+from transformers import AutoModel, AutoTokenizer
+def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
+    mask = attention_mask.unsqueeze(-1).float()
+    masked = last_hidden_state * mask
+    denom = mask.sum(dim=1).clamp(min=1e-6)
+    return masked.sum(dim=1) / denom
+class MultiEvalVietSumModel(nn.Module):
+    def __init__(self, backbone_name: str):
+        super().__init__()
+        self.backbone_name = backbone_name
+        self.model = AutoModel.from_pretrained(backbone_name)
+        hidden = self.model.config.hidden_size
+        self.trunk = nn.Sequential(
+            nn.Linear(hidden * 2, 256),
+            nn.GELU(),
+            nn.Dropout(0.1),
+        )
+        self.head_faith = nn.Linear(256, 1)
+        self.head_coh = nn.Linear(256, 1)
+        self.head_rel = nn.Linear(256, 1)
+    def forward(self, input_ids, attention_mask, token_type_ids=None):
+        kwargs = {
+            "input_ids": input_ids,
+            "attention_mask": attention_mask,
+        }
+        if token_type_ids is not None:
+            kwargs["token_type_ids"] = token_type_ids
+        out = self.model(**kwargs)
+        cls_vec = out.last_hidden_state[:, 0]
+        mean_vec = mean_pool(out.last_hidden_state, attention_mask)
+        pooled = torch.cat([cls_vec, mean_vec], dim=-1)
+        z = self.trunk(pooled)
+        faith = self.head_faith(z)
+        coh = self.head_coh(z)
+        rel = self.head_rel(z)
+        return torch.cat([faith, coh, rel], dim=1)
+    @classmethod
+    def from_pretrained_local(cls, model_dir: str):
+        model_dir = Path(model_dir)
+        with open(model_dir / "multievalvietsum_config.json", "r", encoding="utf-8") as f:
+            cfg = json.load(f)
+        model = cls(backbone_name=cfg["backbone_name"])
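+        # weights_only=True restricts unpickling to tensors and plain containers,
+        # which is all a state dict needs.
+        state_dict = torch.load(model_dir / "pytorch_model.bin", map_location="cpu", weights_only=True)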
+        model.load_state_dict(state_dict, strict=True)
+        model.eval()
+        return model, cfg
+    @staticmethod
+    def load_tokenizer_local(model_dir: str):
+        return AutoTokenizer.from_pretrained(model_dir, use_fast=True)
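
Note on the pooling in `forward`: the "CLS + mean pooling" step concatenates the first token's vector with a mask-aware mean over all real tokens, which is why the trunk's input width is `hidden * 2`. The sketch below reproduces that computation for a single sequence with plain Python lists, so it can be inspected without PyTorch. The function names and the toy values are hypothetical.

```python
def masked_mean(hidden, mask):
    """Mean of token vectors where mask == 1; padding positions are ignored."""
    dim = len(hidden[0])
    # Same guard as the clamp in mean_pool: avoid division by zero on an empty mask.
    count = max(sum(mask), 1e-6)
    return [sum(vec[d] * m for vec, m in zip(hidden, mask)) / count for d in range(dim)]


def cls_plus_mean(hidden, mask):
    """Concatenate the first token's vector with the masked mean."""
    return list(hidden[0]) + masked_mean(hidden, mask)


# Toy sequence: two real tokens and one padding token (mask 0).
hidden = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = cls_plus_mean(hidden, mask)
```

The padding vector does not affect the mean, and the pooled vector has twice the hidden size.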

multievalvietsum_config.json ADDED

	@@ -0,0 +1,17 @@

+{
+  "model_type": "multievalvietsum",
+  "repo_id": "phuongntc/MultiEvalVietSum",
+  "backbone_name": "jhu-clsp/mmBERT-base",
+  "max_len": 2048,
+  "summary_max_len": 192,
+  "pooling": "cls_plus_mean",
+  "outputs": [
+    "faithfulness",
+    "coherence",
+    "relevance"
+  ],
+  "notes": [
+    "Custom evaluator architecture on top of a Hugging Face backbone",
+    "Use modeling_multievalvietsum.py to load the model correctly"
+  ]
+}

pytorch_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a8b38a5508327b1f79a42074c322d9bb628ee56f60600d612c7481c124fb89d1
+size 1229393435
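
Note on the LFS pointer: the pointer records the SHA-256 digest and byte size of the real weight file, so a downloaded `pytorch_model.bin` can be checked against it. A minimal sketch using only the standard library follows; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of this repository.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at path has the size and digest named in the pointer."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    h = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        # Hash in chunks so a file of more than a gigabyte is never held in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:a8b38a5508327b1f79a42074c322d9bb628ee56f60600d612c7481c124fb89d1
size 1229393435
"""
pointer = parse_lfs_pointer(POINTER_TEXT)
```

After downloading, `matches_pointer("pytorch_model.bin", pointer)` would indicate whether the local file matches the pointer in this commit.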

tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:609d8f4c067cd3950f88594c5a802616cea245823836ef5848ee4fc40aab5b6f
+size 34363188

tokenizer_config.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "backend": "tokenizers",
+  "bos_token": "<bos>",
+  "clean_up_tokenization_spaces": false,
+  "cls_token": "<bos>",
+  "eos_token": "<eos>",
+  "extra_special_tokens": [
+    "<start_of_turn>",
+    "<end_of_turn>"
+  ],
+  "is_local": false,
+  "mask_token": "<mask>",
+  "model_input_names": [
+    "input_ids",
+    "attention_mask"
+  ],
+  "model_max_length": 8192,
+  "pad_token": "<pad>",
+  "padding_side": "right",
+  "sep_token": "<eos>",
+  "spaces_between_special_tokens": false,
+  "tokenizer_class": "TokenizersBackend",
+  "unk_token": "<unk>"
+}

training_summary.json ADDED

	@@ -0,0 +1,50 @@

+{
+  "model_name": "MultiEvalVietSum",
+  "repo_id": "phuongntc/MultiEvalVietSum",
+  "backbone": "jhu-clsp/mmBERT-base",
+  "task": "Vietnamese summary evaluation",
+  "architecture": {
+    "type": "cross-encoder evaluator",
+    "pooling": "CLS + mean pooling",
+    "heads": [
+      "faithfulness",
+      "coherence",
+      "relevance"
+    ],
+    "loss": "MSE regression + pairwise margin ranking loss"
+  },
+  "tokenization": {
+    "max_len": 2048,
+    "summary_max_len": 192,
+    "pair_construction": "summary truncated first; remaining token budget prioritized for document"
+  },
+  "reported_metrics": {
+    "validation": {
+      "val_pearson_faith": null,
+      "val_pearson_coh": null,
+      "val_pearson_rel": null,
+      "val_pearson_mean": null,
+      "val_spearman_faith": null,
+      "val_spearman_coh": null,
+      "val_spearman_rel": null,
+      "val_spearman_mean": null
+    }
+  },
+  "intended_use": [
+    "Evaluate Vietnamese summaries with respect to a source document",
+    "Support research on automatic summary evaluation in Vietnamese",
+    "Provide criterion-specific scores for faithfulness, coherence, and relevance"
+  ],
+  "limitations": [
+    "This model is an automatic evaluator, not a text generator",
+    "Scores are proxy judgments and should not replace careful human evaluation in high-stakes settings",
+    "Performance may degrade on out-of-domain data",
+    "The evaluator only sees the truncated input pair defined by MAX_LEN and SUM_MAX_LEN"
+  ],
+  "transparency_notes": [
+    "The model consumes a document-summary pair and outputs three scalar scores",
+    "Users should report exact preprocessing and truncation settings when reproducing experiments",
+    "For long documents, content beyond the token budget is not visible to the evaluator"
+  ],
+  "citation_bibtex": "@misc{phuong2026multievalvietsum,\n  title={MultiEvalVietSum: A Vietnamese Criterion-Specific Evaluator for Summary Assessment},\n  author={Phuong N. T. and collaborators},\n  year={2026},\n  note={Model card and code release on Hugging Face},\n  howpublished={\\url{https://huggingface.co/phuongntc/MultiEvalVietSum}}\n}"
+}
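
Note on `reported_metrics`: every validation field in this file is null. For readers who want to fill in comparable numbers on their own annotated data, the sketch below shows one way Pearson and Spearman correlations between human and predicted scores could be computed with the standard library. The function names and the toy scores are hypothetical; nothing here reproduces the authors' evaluation script, and the values are not results for this model.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data for illustration only.
human = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.5, 3.2]
rho = spearman(human, predicted)
```

Computing each correlation per criterion and then averaging the three would correspond to the `*_mean` fields, assuming those fields are plain averages.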