How to use dostoevskyIdiot/intfloat-e5-large-v2-jaiv-v2 with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("dostoevskyIdiot/intfloat-e5-large-v2-jaiv-v2")
sentences = [
"married woman cheats with a low life creep",
"Real Documentary: The Moment When A Divorce Wanting Wife Cheats categorized as Nymphomaniac, Married Woman",
"Hot Married Woman Athlete categorized as Mature Woman, Married Woman, Sports",
"This Married Woman Was Betrayed By Her Friends And Got Creampie Fucked By A Fucking Low Life Creep Yuko Shiraki categorized as Mature Woman, Married Woman, Cheating Wife"
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([4, 4])

This is a sentence-transformers model finetuned from intfloat/e5-large-v2. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
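This model's pipeline ends in a `Normalize()` module, so every embedding has unit length and cosine similarity reduces to a dot product. Semantic search over precomputed embeddings is then a matrix-vector product followed by a top-k selection. The sketch below assumes the embeddings are already normalized; the toy 3-dimensional vectors and the `top_k_search` helper are illustrative stand-ins, not part of this model or the library (sentence-transformers provides `util.semantic_search` for real corpora).

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors (last axis) to unit length, as Normalize() does."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def top_k_search(query_emb: np.ndarray, corpus_embs: np.ndarray, k: int = 3):
    """Return (indices, scores) of the k corpus rows most similar to the query.

    Both inputs must already be L2-normalized, so the dot product
    equals cosine similarity.
    """
    scores = corpus_embs @ query_emb              # (num_docs,)
    k = min(k, scores.shape[0])
    top = np.argpartition(-scores, k - 1)[:k]     # unordered top-k in O(n)
    top = top[np.argsort(-scores[top])]           # order the k winners by score
    return top, scores[top]


# Toy 3-d vectors standing in for the model's 1024-d embeddings.
corpus = l2_normalize(np.array([[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]))
query = l2_normalize(np.array([1.0, 0.1, 0.0]))
indices, scores = top_k_search(query, corpus, k=2)
print(indices, scores.round(3))
```

`argpartition` avoids sorting the whole corpus, which matters when the corpus is large and k is small.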
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
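The `Pooling` module (with `pooling_mode_mean_tokens: True`) and the `Normalize` module above amount to averaging the transformer's token vectors over non-padding positions and then scaling the result to unit length. A simplified NumPy sketch of that computation, offered as an illustration rather than the library's implementation; the tiny toy shapes stand in for real `(batch, seq_len, 1024)` activations:

```python
import numpy as np


def mean_pool_and_normalize(token_embs: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors over real tokens, then scale each row to unit length.

    token_embs:     (batch, seq_len, dim) transformer outputs
    attention_mask: (batch, seq_len), 1 for real tokens and 0 for padding
    """
    mask = attention_mask[..., None].astype(token_embs.dtype)  # (batch, seq_len, 1)
    summed = (token_embs * mask).sum(axis=1)                    # padding contributes nothing
    counts = np.clip(mask.sum(axis=1), 1e-9, None)              # (batch, 1); guard against /0
    pooled = summed / counts                                    # mean over real tokens
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)                 # unit-length rows


# One sequence of three tokens; the last token is padding and must be ignored.
tokens = np.array([[[3.0, 4.0], [1.0, 0.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
print(mean_pool_and_normalize(tokens, mask))
```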
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("dostoevskyIdiot/intfloat-e5-large-v2-jaiv-v2")
# Run inference
sentences = [
'shizuko yoshinaga first time shots',
"Debut of a MILF AV Actress Document. First Time Shots! Cute Smile and Made-to-Fuck Body on a Mature Woman in her 50's. Shizuko Yoshinaga categorized as Mature Woman, Shaved Pussy, Documentary",
'50 And Filming Her First Creampie Fumie Saito categorized as Mature Woman, Documentary',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.4913, 0.1716],
# [0.4913, 1.0000, 0.4514],
# [0.1716, 0.4514, 1.0000]])
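The matrix is symmetric with ones on the diagonal because each sentence is compared with every sentence, including itself. Paraphrase mining on such a matrix amounts to picking the largest off-diagonal entry. A minimal sketch using the scores printed above; `best_pair` is a hypothetical helper, and for real corpora sentence-transformers provides `util.paraphrase_mining`:

```python
import numpy as np

# Similarity matrix copied from the printed output above.
sims = np.array([
    [1.0000, 0.4913, 0.1716],
    [0.4913, 1.0000, 0.4514],
    [0.1716, 0.4514, 1.0000],
])


def best_pair(sim: np.ndarray):
    """Return (i, j, score) for the most similar pair of distinct sentences."""
    masked = sim.copy()
    np.fill_diagonal(masked, -np.inf)  # ignore each sentence's similarity to itself
    i, j = np.unravel_index(np.argmax(masked), masked.shape)
    return int(i), int(j), float(masked[i, j])


print(best_pair(sims))
```

Here the query `'shizuko yoshinaga first time shots'` and the second sentence form the closest pair, as expected for a query and its matching title.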
Information retrieval on the `test` dataset, evaluated with `InformationRetrievalEvaluator`:

| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.6884 |
| cosine_accuracy@3 | 0.8069 |
| cosine_accuracy@5 | 0.8451 |
| cosine_accuracy@10 | 0.8874 |
| cosine_precision@1 | 0.6884 |
| cosine_precision@3 | 0.269 |
| cosine_precision@5 | 0.169 |
| cosine_precision@10 | 0.0887 |
| cosine_recall@1 | 0.6884 |
| cosine_recall@3 | 0.8069 |
| cosine_recall@5 | 0.8451 |
| cosine_recall@10 | 0.8874 |
| cosine_ndcg@10 | 0.7879 |
| cosine_mrr@10 | 0.756 |
| cosine_map@100 | 0.7597 |
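These numbers are consistent with every test query having exactly one relevant document: recall@k equals accuracy@k, and precision@k is accuracy@k divided by k (for example 0.8069 / 3 ≈ 0.269). Under that assumption, which the card does not state explicitly, the metrics reduce to simple functions of the rank at which the relevant document is retrieved. A sketch; `ir_metrics_single_relevant` is a hypothetical helper, not the evaluator's code:

```python
import math


def ir_metrics_single_relevant(ranks, k_values=(1, 3, 5, 10)):
    """IR metrics when every query has exactly one relevant document.

    ranks: 1-based rank of the relevant document for each query,
           or None if it was not retrieved at all.
    """
    n = len(ranks)
    out = {}
    for k in k_values:
        hits = sum(1 for r in ranks if r is not None and r <= k)
        out[f"accuracy@{k}"] = hits / n
        out[f"recall@{k}"] = hits / n          # one relevant doc, so recall == accuracy
        out[f"precision@{k}"] = hits / (n * k)  # at most 1 of the k slots can be relevant
    out["mrr@10"] = sum(1 / r for r in ranks if r is not None and r <= 10) / n
    # The ideal DCG is 1 with a single relevant doc, so NDCG@10 = 1 / log2(rank + 1).
    out["ndcg@10"] = sum(1 / math.log2(r + 1) for r in ranks if r is not None and r <= 10) / n
    return out


print(ir_metrics_single_relevant([1, 2, None, 4]))
```

The same assumption makes MAP@100 equal to the reciprocal rank with a cutoff of 100 instead of 10, which fits cosine_map@100 (0.7597) sitting slightly above cosine_mrr@10 (0.756).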
Training dataset columns: sentence_0, sentence_1, and sentence_2.

| | sentence_0 | sentence_1 | sentence_2 |
|---|---|---|---|
| type | string | string | string |
Samples:

| sentence_0 | sentence_1 | sentence_2 |
|---|---|---|
| mature masseuse stimulates client's throat and pussy | Remarkable Masseuse's Dick Stimulates Throat And Vagina 12 People 240 Minutes 5 categorized as Mature Woman, Massage | Mature Woman On The Forefront Of The Sex Industry - Surprisingly Successful Mature Woman Massage Specialist's Seductive Technique! categorized as Mature Woman, Massage |
| Threesome featuring Akari Yukino and deep pussy digging | Akari Yukino In Sweaty, Deep, Pussy Digging Sex categorized as Slender, Shemale, Anal Play, Threesome / Foursome, Facial, Daydreamers | Super Fuck-a-thon! Cum-a-thon! 24-Hours Akari Hoshino Total Guerrilla SPECIAL!! categorized as Car Sex, Threesome / Foursome |
| man massages woman at spa with lotion | A Male Esthetician In A Women-Only Massage Parlor... categorized as Massage Parlor, Lotion | New Masseur Came to a Women Only Massage Parlor and Starts Sending Numerous Women To Pleasure Heaven! categorized as Massage Parlor, Voyeur, Massage |
Loss: `MultipleNegativesRankingLoss` with these parameters:

{
"scale": 20.0,
"similarity_fct": "cos_sim",
"gather_across_devices": false,
"directions": [
"query_to_doc"
],
"partition_mode": "joint",
"hardness_mode": null,
"hardness_strength": 0.0
}
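With three columns, sentence-transformers feeds `MultipleNegativesRankingLoss` (anchor, positive, hard negative) triplets: `sentence_0` is the query, `sentence_1` its matching entry, and `sentence_2` a hard negative. For each anchor the loss is a cross-entropy over scaled cosine similarities (scale 20) to every positive and hard negative in the batch, with the anchor's own positive as the target; this is the InfoNCE objective of van den Oord et al. A simplified NumPy sketch of the `query_to_doc` direction only, ignoring `gather_across_devices` and the other direction and partition options (an illustration, not the library's implementation):

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def mnrl_loss(anchors, positives, negatives=None, scale=20.0):
    """Simplified MultipleNegativesRankingLoss (query_to_doc direction).

    anchors, positives, negatives: (batch, dim) embeddings. Row i of
    `positives` is the positive for anchor i; all other positives and all
    hard negatives in the batch act as negatives for anchor i.
    """
    a = l2_normalize(anchors)
    docs = positives if negatives is None else np.concatenate([positives, negatives], axis=0)
    d = l2_normalize(docs)
    logits = scale * (a @ d.T)                            # scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(a.shape[0])
    return float(-log_probs[idx, idx].mean())             # target: column i for anchor i


eye = np.eye(4)  # four anchors, each identical to its own positive
print(mnrl_loss(eye, eye, negatives=-eye))
```

Larger batches give each anchor more in-batch negatives, which is why this loss usually benefits from the biggest batch size that fits in memory.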
Non-default hyperparameters:

- per_device_train_batch_size: 10
- num_train_epochs: 2
- eval_strategy: steps
- per_device_eval_batch_size: 10
- multi_dataset_batch_sampler: round_robin

All hyperparameters:

- per_device_train_batch_size: 10
- num_train_epochs: 2
- max_steps: -1
- learning_rate: 5e-05
- lr_scheduler_type: linear
- lr_scheduler_kwargs: None
- warmup_steps: 0
- optim: adamw_torch_fused
- optim_args: None
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- optim_target_modules: None
- gradient_accumulation_steps: 1
- average_tokens_across_devices: True
- max_grad_norm: 1
- label_smoothing_factor: 0.0
- bf16: False
- fp16: False
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- use_liger_kernel: False
- liger_kernel_config: None
- use_cache: False
- neftune_noise_alpha: None
- torch_empty_cache_steps: None
- auto_find_batch_size: False
- log_on_each_node: True
- logging_nan_inf_filter: True
- include_num_input_tokens_seen: no
- log_level: passive
- log_level_replica: warning
- disable_tqdm: False
- project: huggingface
- trackio_space_id: trackio
- eval_strategy: steps
- per_device_eval_batch_size: 10
- prediction_loss_only: True
- eval_on_start: False
- eval_do_concat_batches: True
- eval_use_gather_object: False
- eval_accumulation_steps: None
- include_for_metrics: []
- batch_eval_metrics: False
- save_only_model: False
- save_on_each_node: False
- enable_jit_checkpoint: False
- push_to_hub: False
- hub_private_repo: None
- hub_model_id: None
- hub_strategy: every_save
- hub_always_push: False
- hub_revision: None
- load_best_model_at_end: False
- ignore_data_skip: False
- restore_callback_states_from_checkpoint: False
- full_determinism: False
- seed: 42
- data_seed: None
- use_cpu: False
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- dataloader_prefetch_factor: None
- remove_unused_columns: True
- label_names: None
- train_sampling_strategy: random
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- ddp_backend: None
- ddp_timeout: 1800
- fsdp: []
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- deepspeed: None
- debug: []
- skip_memory_metrics: True
- do_predict: False
- resume_from_checkpoint: None
- warmup_ratio: None
- local_rank: -1
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
- router_mapping: {}
- learning_rate_mapping: {}

Training logs:

| Epoch | Step | Training Loss | test_cosine_ndcg@10 |
|---|---|---|---|
| 0.0194 | 500 | 0.8527 | 0.7377 |
| 0.0388 | 1000 | 0.2592 | 0.7422 |
| 0.0582 | 1500 | 0.2065 | 0.7625 |
| 0.0775 | 2000 | 0.1876 | 0.7671 |
| 0.0969 | 2500 | 0.1892 | 0.7511 |
| 0.1163 | 3000 | 0.1812 | 0.7567 |
| 0.1357 | 3500 | 0.1890 | 0.7485 |
| 0.1551 | 4000 | 0.1898 | 0.7425 |
| 0.1745 | 4500 | 0.1809 | 0.7477 |
| 0.1939 | 5000 | 0.1920 | 0.7497 |
| 0.2133 | 5500 | 0.1877 | 0.7347 |
| 0.2326 | 6000 | 0.1953 | 0.7270 |
| 0.2520 | 6500 | 0.1907 | 0.7327 |
| 0.2714 | 7000 | 0.1912 | 0.7154 |
| 0.2908 | 7500 | 0.1882 | 0.7113 |
| 0.3102 | 8000 | 0.1912 | 0.7404 |
| 0.3296 | 8500 | 0.1884 | 0.7199 |
| 0.3490 | 9000 | 0.1761 | 0.7309 |
| 0.3684 | 9500 | 0.1864 | 0.7406 |
| 0.3877 | 10000 | 0.1805 | 0.7239 |
| 0.4071 | 10500 | 0.1695 | 0.7383 |
| 0.4265 | 11000 | 0.1814 | 0.7413 |
| 0.4459 | 11500 | 0.1689 | 0.7406 |
| 0.4653 | 12000 | 0.1607 | 0.7475 |
| 0.4847 | 12500 | 0.1654 | 0.7337 |
| 0.5041 | 13000 | 0.1714 | 0.7442 |
| 0.5235 | 13500 | 0.1639 | 0.7409 |
| 0.5428 | 14000 | 0.1560 | 0.7311 |
| 0.5622 | 14500 | 0.1521 | 0.7238 |
| 0.5816 | 15000 | 0.1665 | 0.7395 |
| 0.6010 | 15500 | 0.1686 | 0.7399 |
| 0.6204 | 16000 | 0.1619 | 0.7496 |
| 0.6398 | 16500 | 0.1593 | 0.7337 |
| 0.6592 | 17000 | 0.1604 | 0.7567 |
| 0.6786 | 17500 | 0.1646 | 0.7464 |
| 0.6979 | 18000 | 0.1597 | 0.7482 |
| 0.7173 | 18500 | 0.1590 | 0.7491 |
| 0.7367 | 19000 | 0.1526 | 0.7201 |
| 0.7561 | 19500 | 0.1591 | 0.7542 |
| 0.7755 | 20000 | 0.1465 | 0.7456 |
| 0.7949 | 20500 | 0.1580 | 0.7556 |
| 0.8143 | 21000 | 0.1474 | 0.7511 |
| 0.8337 | 21500 | 0.1443 | 0.7564 |
| 0.8530 | 22000 | 0.1396 | 0.7580 |
| 0.8724 | 22500 | 0.1419 | 0.7555 |
| 0.8918 | 23000 | 0.1414 | 0.7615 |
| 0.9112 | 23500 | 0.1386 | 0.7542 |
| 0.9306 | 24000 | 0.1532 | 0.7540 |
| 0.9500 | 24500 | 0.1469 | 0.7664 |
| 0.9694 | 25000 | 0.1476 | 0.7549 |
| 0.9888 | 25500 | 0.1441 | 0.7629 |
| 1.0 | 25790 | - | 0.7567 |
| 1.0081 | 26000 | 0.1210 | 0.7555 |
| 1.0275 | 26500 | 0.0976 | 0.7619 |
| 1.0469 | 27000 | 0.1011 | 0.7697 |
| 1.0663 | 27500 | 0.0989 | 0.7639 |
| 1.0857 | 28000 | 0.0917 | 0.7632 |
| 1.1051 | 28500 | 0.0971 | 0.7646 |
| 1.1245 | 29000 | 0.0958 | 0.7615 |
| 1.1439 | 29500 | 0.1000 | 0.7619 |
| 1.1632 | 30000 | 0.0932 | 0.7620 |
| 1.1826 | 30500 | 0.0966 | 0.7608 |
| 1.2020 | 31000 | 0.0922 | 0.7505 |
| 1.2214 | 31500 | 0.0903 | 0.7722 |
| 1.2408 | 32000 | 0.0997 | 0.7689 |
| 1.2602 | 32500 | 0.0818 | 0.7684 |
| 1.2796 | 33000 | 0.0926 | 0.7651 |
| 1.2990 | 33500 | 0.1002 | 0.7737 |
| 1.3183 | 34000 | 0.0893 | 0.7684 |
| 1.3377 | 34500 | 0.0945 | 0.7690 |
| 1.3571 | 35000 | 0.0855 | 0.7761 |
| 1.3765 | 35500 | 0.0918 | 0.7725 |
| 1.3959 | 36000 | 0.0982 | 0.7767 |
| 1.4153 | 36500 | 0.0854 | 0.7685 |
| 1.4347 | 37000 | 0.0883 | 0.7718 |
| 1.4541 | 37500 | 0.0921 | 0.7681 |
| 1.4734 | 38000 | 0.0912 | 0.7763 |
| 1.4928 | 38500 | 0.0908 | 0.7716 |
| 1.5122 | 39000 | 0.0891 | 0.7772 |
| 1.5316 | 39500 | 0.0912 | 0.7757 |
| 1.5510 | 40000 | 0.0811 | 0.7762 |
| 1.5704 | 40500 | 0.0833 | 0.7725 |
| 1.5898 | 41000 | 0.0830 | 0.7800 |
| 1.6092 | 41500 | 0.0874 | 0.7787 |
| 1.6285 | 42000 | 0.0890 | 0.7837 |
| 1.6479 | 42500 | 0.0822 | 0.7754 |
| 1.6673 | 43000 | 0.0805 | 0.7800 |
| 1.6867 | 43500 | 0.0799 | 0.7788 |
| 1.7061 | 44000 | 0.0898 | 0.7838 |
| 1.7255 | 44500 | 0.0816 | 0.7813 |
| 1.7449 | 45000 | 0.0817 | 0.7826 |
| 1.7642 | 45500 | 0.0779 | 0.7824 |
| 1.7836 | 46000 | 0.0846 | 0.7815 |
| 1.8030 | 46500 | 0.0804 | 0.7847 |
| 1.8224 | 47000 | 0.0773 | 0.7808 |
| 1.8418 | 47500 | 0.0767 | 0.7813 |
| 1.8612 | 48000 | 0.0898 | 0.7855 |
| 1.8806 | 48500 | 0.0857 | 0.7854 |
| 1.9000 | 49000 | 0.0837 | 0.7844 |
| 1.9193 | 49500 | 0.0834 | 0.7827 |
| 1.9387 | 50000 | 0.0753 | 0.7861 |
| 1.9581 | 50500 | 0.0880 | 0.7867 |
| 1.9775 | 51000 | 0.0847 | 0.7879 |
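The log also pins down the schedule. Epoch 1.0 falls at step 25,790, so with a batch size of 10 the training set holds between 257,891 and 257,900 examples (the last batch may be partial because `dataloader_drop_last` is False), and 2 epochs give 51,580 optimizer steps. This assumes a single device and a single training dataset, neither of which the card states. A sketch of that arithmetic and of the linear decay implied by `lr_scheduler_type: linear` with `warmup_steps: 0`; the constant names are illustrative:

```python
STEPS_PER_EPOCH = 25790   # from the log: epoch 1.0 is reached at step 25790
EPOCHS = 2                # num_train_epochs
BATCH_SIZE = 10           # per_device_train_batch_size (assumes one device)
PEAK_LR = 5e-05           # learning_rate
TOTAL_STEPS = STEPS_PER_EPOCH * EPOCHS  # gradient_accumulation_steps is 1

# With drop_last=False, the number of examples n satisfies
# ceil(n / BATCH_SIZE) == STEPS_PER_EPOCH.
min_samples = (STEPS_PER_EPOCH - 1) * BATCH_SIZE + 1
max_samples = STEPS_PER_EPOCH * BATCH_SIZE


def epoch_at(step: int) -> float:
    """Fractional epoch, as printed in the 'Epoch' column."""
    return step / STEPS_PER_EPOCH


def linear_lr(step: int) -> float:
    """Linear decay from PEAK_LR at step 0 to 0 at TOTAL_STEPS (no warmup)."""
    return PEAK_LR * max(0.0, (TOTAL_STEPS - step) / TOTAL_STEPS)


print(min_samples, max_samples)
print(round(epoch_at(51000), 4))
print(f"{linear_lr(25790):.1e}")
```

Recomputing the Epoch column this way (for example 500 / 25790 ≈ 0.0194) is a quick consistency check on the assumed steps per epoch.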
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{oord2019representationlearningcontrastivepredictive,
title={Representation Learning with Contrastive Predictive Coding},
author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
year={2019},
eprint={1807.03748},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/1807.03748},
}
Base model: intfloat/e5-large-v2