---
tags:
- model
- checkpoints
- translation
- latin
- english
- mt5
- mistral
- multilingual
- NLP
language:
- en
- la
license: "cc-by-4.0"
models:
- mistralai/Mistral-7B-Instruct-v0.3
- google/mt5-small
model_type: "mt5-small"
training_epochs: "6 (initial pipeline), 30 (final pipeline with optimizations), 100 (fine-tuning on 4,750 summaries)"
task_categories:
- translation
- summarization
- multilingual-nlp
task_ids:
- en-la-translation
- la-en-translation
- text-generation
pretty_name: "mT5-LatinSummarizerModel"
storage:
- git-lfs
- huggingface-models
size_categories:
- 5GB<n<10GB
---
# **mT5-LatinSummarizerModel: Fine-Tuned Model for Latin NLP**

[GitHub Repository](https://github.com/AxelDlv00/LatinSummarizer)
[Hugging Face Model](https://huggingface.co/LatinNLP/LatinSummarizerModel)
[Hugging Face Dataset](https://huggingface.co/datasets/LatinNLP/LatinSummarizerDataset)

## **Overview**
This repository contains the **trained checkpoints and tokenizer files** for the `mT5-LatinSummarizerModel`, which was fine-tuned to improve **Latin summarization and translation**. It is designed to:
- Translate between **English and Latin**.
- Summarize Latin texts effectively.
- Leverage extractive and abstractive summarization techniques.
- Use **curriculum learning** for improved training.

## **Installation & Usage**
To download and set up the models (mT5-small and Mistral-7B-Instruct), run:
```bash
bash install_large_models.sh
```
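
Once the checkpoints are downloaded, they can be loaded with the `transformers` library. The snippet below is a minimal sketch, assuming the checkpoint directory `final_pipeline/with_stanza` contains both the model weights and tokenizer files; adjust the path to the checkpoint you want, and note that the exact prompt format used during training is not reproduced here.

```python
from transformers import AutoTokenizer, MT5ForConditionalGeneration

# Path to one of the downloaded checkpoint directories (assumed layout).
checkpoint = "final_pipeline/with_stanza"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = MT5ForConditionalGeneration.from_pretrained(checkpoint)

# Summarize a short Latin passage (illustrative input).
text = "Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```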


## **Project Structure**
```
.
├── final_pipeline (trained for 30 light epochs with optimizations, then fine-tuned for 100 epochs on the small high-quality summaries dataset)
│   ├── no_stanza
│   └── with_stanza
├── initial_pipeline (trained for 6 epochs without optimizations)
│   └── mt5-small-en-la-translation-epoch5
├── install_large_models.sh
└── README.md
```


## **Training Methodology**
We fine-tuned **mT5-small** in three phases:
1. **Initial Training Pipeline (6 epochs):** used the full dataset without optimizations.
2. **Final Training Pipeline (30 light epochs):** used **10% of the training data per epoch** for efficiency (see the sketch after this list).
3. **Fine-Tuning (100 epochs):** focused on the **4,750 high-quality summaries** for final optimization.
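
In phase 2, each "light epoch" trains on a fresh random 10% slice of the training set rather than the full data. A minimal PyTorch sketch of that sampling scheme (the function name, batch size, and `train_dataset` are illustrative, not taken from the actual training code):

```python
import random

from torch.utils.data import DataLoader, Subset

def light_epoch_loader(dataset, fraction=0.10, batch_size=8):
    """Build a DataLoader over a fresh random `fraction` of `dataset`."""
    k = max(1, int(len(dataset) * fraction))
    indices = random.sample(range(len(dataset)), k)
    return DataLoader(Subset(dataset, indices), batch_size=batch_size, shuffle=True)

# One fresh 10% slice per light epoch:
# for epoch in range(30):
#     for batch in light_epoch_loader(train_dataset):
#         ...  # forward/backward pass
```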


### **Training Configurations**
- **Hardware:** 16GB-VRAM GPU (lab machines accessed via SSH).
- **Batch Size:** adaptive, due to GPU memory constraints.
- **Gradient Accumulation:** enabled to reach larger effective batch sizes.
- **LoRA-Based Fine-Tuning:** rank 8, scaling factor 32 (see the sketch after this list).
- **Dynamic Sequence Length:** increased progressively during training.
- **Learning Rate:** `5 × 10^-4` with warm-up steps.
- **Checkpointing:** frequent saves to mitigate power outages.
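
The LoRA settings above map to the following `peft` configuration. This is a sketch rather than the actual training script: the target modules (mT5's query/value attention projections) and the dropout value are assumptions not stated in this card.

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import MT5ForConditionalGeneration

model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,                        # LoRA rank (from the configuration above)
    lora_alpha=32,              # scaling factor (from the configuration above)
    lora_dropout=0.05,          # assumed value, not stated in this card
    target_modules=["q", "v"],  # assumed: mT5 attention query/value projections
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```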


## **Evaluation & Results**
We evaluated the model using **ROUGE, BERTScore, and BLEU/chrF** scores (a reproduction sketch follows the results below).

| Metric       | Before Fine-Tuning | After Fine-Tuning |
|--------------|--------------------|-------------------|
| ROUGE-1      | 0.1675             | 0.2541            |
| ROUGE-2      | 0.0427             | 0.0773            |
| ROUGE-L      | 0.1459             | 0.2139            |
| BERTScore-F1 | 0.6573             | 0.7140            |

- **chrF Score (en→la):** 33.60 with Stanza tags, versus a BLEU of 18.03 without Stanza tags.
- **Summarization Density:** maintained at ~6%.
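
Scores of this kind can be computed with the Hugging Face `evaluate` library. A sketch with toy inputs (the evaluation data is not bundled with this repository, and the BERTScore backbone here is an assumption, since Latin has no dedicated default model):

```python
import evaluate

# Toy prediction/reference pair for illustration only.
predictions = ["Gallia in tres partes divisa est."]
references = ["Gallia est omnis divisa in partes tres."]

rouge = evaluate.load("rouge")
chrf = evaluate.load("chrf")
bertscore = evaluate.load("bertscore")

print(rouge.compute(predictions=predictions, references=references))
print(chrf.compute(predictions=predictions, references=[[r] for r in references]))
print(bertscore.compute(
    predictions=predictions,
    references=references,
    model_type="bert-base-multilingual-cased",  # assumption: multilingual backbone for Latin
))
```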


### **Observations**
- Pre-training on **extractive summaries** was crucial.
- The model retains a tendency toward **excessive extraction**, indicating room for further improvement.


## **License**
This model is released under the **CC-BY-4.0** license.

## **Citation**
```bibtex
@misc{LatinSummarizerModel,
  author = {Axel Delaval and Elsa Lubek},
  title  = {Latin-English Summarization Model (mT5)},
  year   = {2025},
  url    = {https://huggingface.co/LatinNLP/LatinSummarizerModel}
}
```