TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning
Paper • 2104.06979 • Published
How to use jameswright/ws-wr-questions-bert-TSDAE-v1 with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("jameswright/ws-wr-questions-bert-TSDAE-v1")
sentences = [
"it got and spoke was amazing and me sequi a month . I the been seen a in, I then started, im last one 4 ive my anxiety has horrible I have had a read of different and part hightened, didnt stright away the only just taised head the last . to it all than was prior to but im just for similar experiences me I want or? taking time read.",
"In anycase it got too much and I spoke to my gp who was amazing and started me on evorel sequi for a 3 month trial. So I would say the first 4 patches have been good, I definitely seen a change in myself, I then started the conti patches, im currently on the last one of those 4, ive been brilliant until 3 days ago my anxiety has come back and its horrible. SO I have had a read of some different forums and it seems that the conti part can cause hightened anxiety. now, this didnt happen stright away and the anxiety has only just taised it head again in the last 3 days. So I dont want to complain too much as it is all better than it was prior to HRT but I guess im just looking for people with similar experiences to me, and I want to know if it got better or what? Thanks for taking the time to read. ",
"Just out of interest too... In an untreated Type 1 experiencing symptoms such as hunger, what happens to blood sugar levels. Do they just rise and rise? For example, given a typical day of carb laden breakfast, lunch and dinner and snacks, would you expect the BG levels to just go out of control over a short time (mmol/l into the teens and beyond? )....",
"Anyone else have post natal anxiety or depression? I’m on the wait list for counselling and my gp has prescribed tablets (although I’m not going to take them). Just wondering if anyone else is in the same boat and how you are coping? I’m so up and down x 😢"
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
This is a sentence-transformers model finetuned from google-bert/bert-base-uncased. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
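The Pooling module above has only `pooling_mode_mean_tokens` enabled, so the sentence vector is the average of the token vectors. The following is a minimal pure-Python sketch of that idea, not the library's implementation; the function name `mean_pool` and the toy 2-dimensional vectors are illustrative assumptions (the real model uses 768 dimensions and batched tensors).

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the embeddings of real tokens, skipping padded positions."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:  # mask value 1 marks a real token, 0 marks padding
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    count = max(count, 1)  # avoid dividing by zero for an all-padding input
    return [s / count for s in summed]


# Toy input: two real tokens and one padding token (hypothetical values).
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

The padding vector is ignored, so only the first two token vectors contribute to the result.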
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("jameswright/ws-wr-questions-bert-TSDAE-v1")
# Run inference
sentences = [
'You have been sick so If questions it just - ’ being sorted, thanks Then assertiveness course and a . overtime isn selfish it just doing ’ s right for Why you?',
'You have been off sick so it makes sense. If anyone questions it just grey rock - it’s being sorted, thanks. Then do an online assertiveness course and ask your GP for a CBT referral. Not doing overtime isn’t selfish - it’s just you doing what’s right for you. Why would you do anything else?',
'Science works by the accumulation of evidence.\xa0 Independent groups work on projects and publish results.\xa0 Those results are examined and tested and examined again and tested again, such that they\'re either confirmed or discarded and further work continues accordingly.\xa0 If a scientist or doctor disagrees with the \'official line\' they\'re asked to present the data, methods and conclusions that have led to that disagreement so that it can by examined by the broader scientific and medical community.\xa0 \xa0 And yes, someone who goes on YouTube or wherever - whether doctor, scientist or layperson - and tells viewers that a vaccine alters DNA structure and destroys the immune system is either a grifter or a fruitcake. 1 minute ago, FIRETHORN1 said: ...Can you not accept that some people can hold an opposite view quite genuinely? To me, a "conspiracy theorist" is someone who believes what they are told, without any evidence to back it up. I can absolutely accept that someone can genuinely believe something without having any evidence at all to support that view.\xa0 I\'m believing that right now, in fact. 1 minute ago, FIRETHORN1 said: There is no evidence whatsoever that the vaccine works That is categorically, absolutely and undeniably false, as the most cursory of research will tell you.\xa0 But then you don\'t actually want\xa0 to believe me, do you?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
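For intuition about the `[3, 3]` shape, here is a small sketch of a pairwise cosine similarity matrix. It assumes the model uses cosine similarity, which to my knowledge is the default for `model.similarity` but is not stated in this card. The helper `cosine_similarity_matrix` and the toy vectors are hypothetical and stand in for real 768-dimensional embeddings.

```python
import math
from typing import List


def cosine_similarity_matrix(vectors: List[List[float]]) -> List[List[float]]:
    """Return an n x n matrix of cosine similarities between all vector pairs."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return [
        [sum(a * b for a, b in zip(u, v)) / (nu * nv) for v, nv in zip(vectors, norms)]
        for u, nu in zip(vectors, norms)
    ]


# Three toy "embeddings" (hypothetical values, 2 dimensions for readability).
toy_embeddings = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]
sims = cosine_similarity_matrix(toy_embeddings)
print(len(sims), len(sims[0]))  # 3 3
```

Each vector has similarity 1 with itself, and vectors pointing in the same direction score 1 regardless of length, which is why the first and third toy vectors match exactly.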
Columns: sentence_0 and sentence_1

| | sentence_0 | sentence_1 |
|---|---|---|
| type | string | string |
| details | | |
Samples:

| sentence_0 | sentence_1 |
|---|---|
| ’ Can really go to the doctors I ’ bored of the ” . Feels more like than a doctor, does sound depression, so seeing GP a first | PetersRabbitt I don’t know? Can I really go to the doctors and say “hey, yes my problem is I’m bored all of the time”. Feels more like a me problem than one a doctor can help with. Yes, absolutely. It does sound like it could be depression, so seeing your GP is a good first step. |
| Ursuladevine Between 11 16, if hasn t, what has been been providing education have LakieLady Yesterday 15:34 My that dismissed offhand son the school have up for assessment . Within years referred diagnosed with PTSD,,, social anxiety and, decided his be. | Ursuladevine · Yesterday 15:42 Between 11 and 16, if he hasn’t been attending school, what has he been doing? Has the LA been providing any education or have you been HE? LakieLady · Yesterday 15:34 My friend tried that, and the GP dismissed it offhand, saying that if her son was neurodivergent, the school would have picked up on it and referred him for assessment. Her DS was eight at the time. Within the next 2-3 years, he got much worse, was referred to CAMHS, diagnosed with significant MH problems (PTSD, GAD, depression, social anxiety disorder) and after a couple of years, CAMHS decided his mother might not be talking bollocks and that he might have ASD. |
| It sounds you were a child then came along realised here was he - and this to it . young, I'd imagine | It sounds like you were hurt by one man when you were a child, then another came along and realised here was someone damaged he could dominate - and added his own abuse. They can sniff this out and are attracted to it. How old were you when he arrived? Very young, I'd imagine. Stepfather? |
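In the sample pairs above, sentence_0 is a corrupted copy of sentence_1 with many tokens deleted, and TSDAE trains the model to reconstruct the original from the corrupted input. The sketch below shows one way such deletion noise can be produced. It is a simplification under stated assumptions: the deletion ratio of 0.6 follows the TSDAE paper's recommendation and is not confirmed by this card, whitespace splitting replaces the tokenizer used during training, and `delete_tokens` is a hypothetical helper name.

```python
import random
from typing import Optional


def delete_tokens(text: str, del_ratio: float = 0.6,
                  rng: Optional[random.Random] = None) -> str:
    """Randomly drop roughly del_ratio of the whitespace-separated words."""
    rng = rng or random.Random()
    words = text.split()
    if not words:
        return text
    # Keep a word when its random draw falls at or above the deletion ratio.
    kept = [w for w in words if rng.random() >= del_ratio]
    if not kept:  # never return an empty input; keep one random word instead
        kept = [rng.choice(words)]
    return " ".join(kept)


original = "Not doing overtime isn't selfish, it's just you doing what's right for you."
noisy = delete_tokens(original, rng=random.Random(0))
print(noisy)
```

The surviving words keep their original order, which matches the look of the sentence_0 column.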
Training loss: DenoisingAutoEncoderLoss

Non-default hyperparameters:

- num_train_epochs: 5
- multi_dataset_batch_sampler: round_robin

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 5
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin

Training logs:

| Epoch | Step | Training Loss |
|---|---|---|
| 0.0301 | 500 | 4.7687 |
| 0.0603 | 1000 | 4.2523 |
| 0.0904 | 1500 | 4.1156 |
| 0.1206 | 2000 | 4.0278 |
| 0.1507 | 2500 | 3.9652 |
| 0.1808 | 3000 | 3.919 |
| 0.2110 | 3500 | 3.8629 |
| 0.2411 | 4000 | 3.7985 |
| 0.2713 | 4500 | 3.7625 |
| 0.3014 | 5000 | 3.7523 |
| 0.3315 | 5500 | 3.7316 |
| 0.3617 | 6000 | 3.6837 |
| 0.3918 | 6500 | 3.669 |
| 0.4220 | 7000 | 3.6394 |
| 0.4521 | 7500 | 3.6017 |
| 0.4822 | 8000 | 3.5693 |
| 0.5124 | 8500 | 3.5821 |
| 0.5425 | 9000 | 3.5488 |
| 0.5727 | 9500 | 3.5139 |
| 0.6028 | 10000 | 3.5119 |
| 0.6329 | 10500 | 3.4988 |
| 0.6631 | 11000 | 3.4741 |
| 0.6932 | 11500 | 3.4719 |
| 0.7234 | 12000 | 3.4501 |
| 0.7535 | 12500 | 3.4353 |
| 0.7837 | 13000 | 3.4107 |
| 0.8138 | 13500 | 3.4023 |
| 0.8439 | 14000 | 3.3902 |
| 0.8741 | 14500 | 3.3697 |
| 0.9042 | 15000 | 3.3731 |
| 0.9344 | 15500 | 3.3603 |
| 0.9645 | 16000 | 3.3284 |
| 0.9946 | 16500 | 3.3339 |
| 1.0248 | 17000 | 3.2793 |
| 1.0549 | 17500 | 3.2098 |
| 1.0851 | 18000 | 3.1994 |
| 1.1152 | 18500 | 3.1801 |
| 1.1453 | 19000 | 3.1634 |
| 1.1755 | 19500 | 3.1566 |
| 1.2056 | 20000 | 3.1205 |
| 1.2358 | 20500 | 3.1064 |
| 1.2659 | 21000 | 3.1028 |
| 1.2960 | 21500 | 3.099 |
| 1.3262 | 22000 | 3.1028 |
| 1.3563 | 22500 | 3.0653 |
| 1.3865 | 23000 | 3.044 |
| 1.4166 | 23500 | 3.0481 |
| 1.4467 | 24000 | 3.0133 |
| 1.4769 | 24500 | 2.9667 |
| 1.5070 | 25000 | 3.0226 |
| 1.5372 | 25500 | 2.991 |
| 1.5673 | 26000 | 2.9593 |
| 1.5974 | 26500 | 2.9598 |
| 1.6276 | 27000 | 2.9572 |
| 1.6577 | 27500 | 2.9579 |
| 1.6879 | 28000 | 2.9303 |
| 1.7180 | 28500 | 2.948 |
| 1.7481 | 29000 | 2.918 |
| 1.7783 | 29500 | 2.9014 |
| 1.8084 | 30000 | 2.8948 |
| 1.8386 | 30500 | 2.8916 |
| 1.8687 | 31000 | 2.8787 |
| 1.8988 | 31500 | 2.8864 |
| 1.9290 | 32000 | 2.8649 |
| 1.9591 | 32500 | 2.8419 |
| 1.9893 | 33000 | 2.8688 |
| 2.0194 | 33500 | 2.8329 |
| 2.0496 | 34000 | 2.7442 |
| 2.0797 | 34500 | 2.7501 |
| 2.1098 | 35000 | 2.7466 |
| 2.1400 | 35500 | 2.7343 |
| 2.1701 | 36000 | 2.7014 |
| 2.2003 | 36500 | 2.6891 |
| 2.2304 | 37000 | 2.6819 |
| 2.2605 | 37500 | 2.6779 |
| 2.2907 | 38000 | 2.6872 |
| 2.3208 | 38500 | 2.6758 |
| 2.3510 | 39000 | 2.6665 |
| 2.3811 | 39500 | 2.6392 |
| 2.4112 | 40000 | 2.6362 |
| 2.4414 | 40500 | 2.6038 |
| 2.4715 | 41000 | 2.5535 |
| 2.5017 | 41500 | 2.6081 |
| 2.5318 | 42000 | 2.6071 |
| 2.5619 | 42500 | 2.5571 |
| 2.5921 | 43000 | 2.5774 |
| 2.6222 | 43500 | 2.5556 |
| 2.6524 | 44000 | 2.5683 |
| 2.6825 | 44500 | 2.5317 |
| 2.7126 | 45000 | 2.5509 |
| 2.7428 | 45500 | 2.5292 |
| 2.7729 | 46000 | 2.52 |
| 2.8031 | 46500 | 2.4818 |
| 2.8332 | 47000 | 2.5258 |
| 2.8633 | 47500 | 2.482 |
| 2.8935 | 48000 | 2.5038 |
| 2.9236 | 48500 | 2.4864 |
| 2.9538 | 49000 | 2.4591 |
| 2.9839 | 49500 | 2.4887 |
| 3.0140 | 50000 | 2.4635 |
| 3.0442 | 50500 | 2.3837 |
| 3.0743 | 51000 | 2.3886 |
| 3.1045 | 51500 | 2.3836 |
| 3.1346 | 52000 | 2.38 |
| 3.1647 | 52500 | 2.3456 |
| 3.1949 | 53000 | 2.3171 |
| 3.2250 | 53500 | 2.3341 |
| 3.2552 | 54000 | 2.3228 |
| 3.2853 | 54500 | 2.3459 |
| 3.3154 | 55000 | 2.3251 |
| 3.3456 | 55500 | 2.3365 |
| 3.3757 | 56000 | 2.2838 |
| 3.4059 | 56500 | 2.3042 |
| 3.4360 | 57000 | 2.2465 |
| 3.4662 | 57500 | 2.2304 |
| 3.4963 | 58000 | 2.251 |
| 3.5264 | 58500 | 2.2727 |
| 3.5566 | 59000 | 2.2324 |
| 3.5867 | 59500 | 2.2325 |
| 3.6169 | 60000 | 2.2246 |
| 3.6470 | 60500 | 2.2287 |
| 3.6771 | 61000 | 2.2067 |
| 3.7073 | 61500 | 2.2206 |
| 3.7374 | 62000 | 2.1882 |
| 3.7676 | 62500 | 2.1889 |
| 3.7977 | 63000 | 2.1559 |
| 3.8278 | 63500 | 2.2021 |
| 3.8580 | 64000 | 2.1643 |
| 3.8881 | 64500 | 2.145 |
| 3.9183 | 65000 | 2.1707 |
| 3.9484 | 65500 | 2.1349 |
| 3.9785 | 66000 | 2.1659 |
| 4.0087 | 66500 | 2.152 |
| 4.0388 | 67000 | 2.0801 |
| 4.0690 | 67500 | 2.0729 |
| 4.0991 | 68000 | 2.0676 |
| 4.1292 | 68500 | 2.0622 |
| 4.1594 | 69000 | 2.0376 |
| 4.1895 | 69500 | 2.027 |
| 4.2197 | 70000 | 2.0227 |
| 4.2498 | 70500 | 2.0146 |
| 4.2799 | 71000 | 2.0334 |
| 4.3101 | 71500 | 2.0428 |
| 4.3402 | 72000 | 2.034 |
| 4.3704 | 72500 | 1.9907 |
| 4.4005 | 73000 | 2.0106 |
| 4.4306 | 73500 | 1.9488 |
| 4.4608 | 74000 | 1.961 |
| 4.4909 | 74500 | 1.9351 |
| 4.5211 | 75000 | 1.9875 |
| 4.5512 | 75500 | 1.9454 |
| 4.5813 | 76000 | 1.9453 |
| 4.6115 | 76500 | 1.9239 |
| 4.6416 | 77000 | 1.9664 |
| 4.6718 | 77500 | 1.906 |
| 4.7019 | 78000 | 1.9256 |
| 4.7321 | 78500 | 1.9071 |
| 4.7622 | 79000 | 1.9117 |
| 4.7923 | 79500 | 1.8817 |
| 4.8225 | 80000 | 1.9101 |
| 4.8526 | 80500 | 1.8872 |
| 4.8828 | 81000 | 1.8634 |
| 4.9129 | 81500 | 1.8791 |
| 4.9430 | 82000 | 1.8801 |
| 4.9732 | 82500 | 1.8586 |
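The Epoch and Step columns allow a rough back-of-the-envelope estimate of the run's size. The sketch below derives approximate steps per epoch from the last logged row, then estimates the training set size and the learning rate under a linear schedule. It assumes a single device, the logged batch size of 8 with no gradient accumulation, and that the Epoch column is rounded to four decimals, so the figures are approximate and are not reported by the card itself. The helper names are hypothetical.

```python
def steps_per_epoch(step: int, epoch: float) -> float:
    """Estimate optimizer steps per epoch from one (epoch, step) log row."""
    return step / epoch


def linear_lr(step: int, total_steps: int, base_lr: float = 5e-05) -> float:
    """Linear decay from base_lr to 0 with no warmup (warmup_steps: 0)."""
    return base_lr * max(0.0, 1 - step / total_steps)


# Last row of the training log above.
final_step, final_epoch = 82500, 4.9732
per_epoch = steps_per_epoch(final_step, final_epoch)

batch_size = 8   # per_device_train_batch_size, assuming one device
num_epochs = 5   # num_train_epochs
approx_examples = per_epoch * batch_size
total_steps = round(per_epoch * num_epochs)

print(f"~{per_epoch:.0f} steps per epoch, ~{approx_examples:.0f} training examples")
print(f"estimated learning rate at step {final_step}: {linear_lr(final_step, total_steps):.2e}")
```

Under these assumptions the run covers roughly 16,600 steps per epoch, which corresponds to about 133,000 training sentences.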
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@inproceedings{wang-2021-TSDAE,
title = "TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning",
author = "Wang, Kexin and Reimers, Nils and Gurevych, Iryna",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
month = nov,
year = "2021",
address = "Punta Cana, Dominican Republic",
publisher = "Association for Computational Linguistics",
pages = "671--688",
url = "https://arxiv.org/abs/2104.06979",
}
Base model
google-bert/bert-base-uncased