mpnet-use-combined-no-pt
This model is a fine-tuned version of paraphrase-multilingual-mpnet-base-v2, trained on the Ukrainian text corpus UberText 2.0 with combined data augmentation strategies but without pool targets. It is part of the Ukrainian Sentence Embeddings collection, which explores the effect of different training strategies on sentence embedding quality for Ukrainian.
Model Description
The model was fine-tuned using a contrastive objective on UberText 2.0, combining multiple data augmentation techniques to compensate for the skewed distribution of polysemous words in the corpus. Compared to mpnet-use-ubertext-no-pt, this variant applies augmentation strategies during training, which improves sense-level distinctions for underrepresented homonyms while keeping pool targets disabled.
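The card does not say which contrastive loss was used. As a rough illustration only, the sketch below implements one common form, an in-batch InfoNCE loss, in plain Python. The function name `info_nce_loss`, the `temperature` value, and the pairing of `anchors[i]` with `positives[i]` are assumptions made for this example and are not taken from the training code.

```python
import math


def info_nce_loss(anchors, positives, temperature=0.05):
    """Mean in-batch InfoNCE loss (illustrative, not the author's training code).

    anchors[i] is assumed to match positives[i]; every other positive in the
    batch acts as a negative for that anchor.
    """
    def normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec]

    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]

    total = 0.0
    for i, u in enumerate(a):
        # Cosine similarity of this anchor to every positive, scaled by temperature.
        logits = [sum(x * y for x, y in zip(u, v)) / temperature for v in p]
        # Numerically stable log-sum-exp over the batch.
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the matching positive as the correct class.
        total += log_z - logits[i]
    return total / len(a)
```

The loss is small when each anchor is closest to its own positive and grows when another item in the batch is closer, which is the pressure that pulls embeddings of matching pairs together and pushes the rest apart.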
Collection Overview
| Model | Description |
|---|---|
| mpnet-use-ubertext-no-pt | Raw UberText 2.0, no augmentation, no pool targets |
| mpnet-use-combined-no-pt (this model) | Combined augmentation strategies, no pool targets |
| mpnet-use-markov-pt | Markov-based augmentation with pool targets |
Usage
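```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("victormuryn/mpnet-use-combined-no-pt")

# Both sentences contain the homonym "мати".
sentences = [
    # "A mother sees her son off to defend the homeland" ("мати" = mother)
    "Проводжає сина мати захищати рідний край",
    # "He dreamed of having at least a tiny house by the Dnipro" ("мати" = to have)
    "Хоч би малесеньку хатину він мріяв мати над Дніпром",
]

embeddings = model.encode(sentences)
print(embeddings.shape)
```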
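The two example sentences both contain the word "мати", once as the noun "mother" and once as the verb "to have", so their similarity score is a quick way to see how the model separates the two senses. The sketch below computes pairwise cosine similarity in plain Python. The helper names `cosine` and `pairwise_cosine` are hypothetical, and the `encoder` argument is assumed to be any object with an `encode(sentences)` method that returns one vector per sentence, such as a loaded `SentenceTransformer`.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(float(a) * float(b) for a, b in zip(u, v))
    norm_u = math.sqrt(sum(float(a) ** 2 for a in u))
    norm_v = math.sqrt(sum(float(b) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_cosine(encoder, sentences):
    """Encode the sentences and return the cosine-similarity matrix as nested lists."""
    embeddings = encoder.encode(sentences)
    return [[cosine(u, v) for v in embeddings] for u in embeddings]
```

With the model loaded as above, `pairwise_cosine(model, sentences)[0][1]` gives the score for the two "мати" sentences. Recent versions of sentence-transformers also provide `model.similarity(embeddings, embeddings)`, which returns the same kind of matrix as a tensor.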
Training Details
- Base model: paraphrase-multilingual-mpnet-base-v2
- Training corpus: UberText 2.0
- Augmentation: Combined
- Pool targets: No
Citation
To be added
License
Apache 2.0