TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning
Paper • 2104.06979 • Published
How to use ar9av/tsdae-civic-bert with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("ar9av/tsdae-civic-bert")
sentences = [
". the land and management meeting of 5th is 7:01 the order Um, is roll . we have us, Nicole, and, . Uh, the is the of of October, . Do have,, are there any, All right,, all favor approving your that is Um to participation Is would like see Miss> hear>> Yes,>>,>, so I'm sorry uh echoing? we . We hear little echo, but we you . Uh f first all, record, Diane Uh",
"All right. Good evening and welcome to\nthe land use and building management\ncommittee meeting of November 5th, 2025. It is 7:01 and I am calling the meeting\nto order. Um, first item is roll call. Tonight we have with us council members\nJim Fyer, Nicole Edy, Nicole Ays, and\nmyself, Barbara Smith. Uh, next item on the agenda is the\nacceptance of the minutes of our meeting\nof October 1, 2025. Do I have a motion,\nMiss Edy, um, are there any corrections,\nAll right, seeing none, all in favor of\napproving the minutes, please raise your\nOkay, that is unanimous. Um, moving on to public participation. Is there anyone\nhere who would like to speak? I see Miss\n>> Yes, good. Can you hear me? >> Yes, we can. >> Yes, we can. >> I am in a hallway, so I'm sorry for any\nuh echoing. I hope it's Can you still\nunderstand me? >> Yes, we can. We hear a little echo, but\nwe can still hear you. Uh anyway, f first of all, good evening. For the record, my name is Diane\nLaurisella. Uh",
"I'll now call to order the November 6, 2025 meeting of Mayor and Council. I'm going to call on the Mayor Pro Tem Andy Gibbs to lead us in prayer. Please remain standing for the pledge. Body heads, please. Dear me, Father, just thank you for this evening. Thank you for bringing us all together. Lord, I just ask you that as we look at the different things on our agenda tonight, that you help us to make the best decision for our community to move to move our community forward. Lord, I just ask you, Lord, to continue to put your arms around us for protection. To put your arms around us, your protection. I thank you, Lord, for sending your son to die on the cross for our sins, Lord. I thank you for the things that you've given us in our life, the mercy of the grace that you've extended to us each and every day to allow us to be able to wake up and walk among other people and help us to serve others as you want us to serve, Lord. And I thank you for everything you do for us. Your name, I pra",
"In the NOSERT room today, the link can be found on the Town of Orleans website. And calling the meeting to order. Any agenda changes? None this evening, this afternoon, sorry. Is there anyone for citizens speak? No one additional online. Okay, seeing none, we're going to go into priority business and we are going to start with our assistant superintendent. Wonderful. Thank you very much. Thank you. So I'm here today to share two things, our homeschooling process and then just an update on our grant. We can't hear you. Yeah. So sorry to enter. I was just going to interrupt. Just make sure that you pull this mic up and for everybody. Is this better? Okay, Parker. Sorry about that. Sorry. So I'm here today. Thank you for that. Just to talk about our homeschooling process and our grant process as well. So I'll go over our homeschooling process. I know of detailed information in the memo and I've included some links to our forms so you'll be able to see it and I'll just give a high-level ov"
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
This is a sentence-transformers model finetuned from google-bert/bert-base-uncased. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
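The Pooling module above has only `pooling_mode_mean_tokens` enabled, so a sentence vector is the average of its token vectors. The following is a minimal pure-Python sketch of that idea, not the library's implementation; the function name `mean_pool` and the toy 3-dimensional vectors are illustrative assumptions (the real model uses 768 dimensions).

```python
def mean_pool(token_vectors, attention_mask):
    """Average token vectors over non-padding positions.

    token_vectors:  list of equal-length float lists, one per token
    attention_mask: list of 0/1 flags, 1 for real tokens, 0 for padding
    """
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask must keep at least one token")
    dim = len(kept[0])
    # Component-wise mean over the kept tokens only.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy example: the third token is padding and must not affect the result.
tokens = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [100.0, 100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0, 4.0]
```

Masking matters because batched inputs are padded to a common length; averaging over padding positions would skew the vectors of shorter texts.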
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("ar9av/tsdae-civic-bert")
# Run inference
sentences = [
"right o'clock Oh Jack here . Man brought team 3rd November . going'll to . We Pledge of and . And we got guests Manager Brantley'll of . the . I appeal flag the United States of and to Republic which,, for . Let . God, come before this grateful live wonderful community We serve our community members in our respective capacities morning and give us eyes and full hearts, Lord, as handle business Thank you your, In Christ name amen Thank Price the pledge . First item is",
"All right, it's nine o'clock. Oh, Jack's here too. Man, y'all brought the whole team. 9 o'clock, November 3rd, November already. Gosh. Time keeps on going. We'll call the meeting to order. We'll open with the Pledge of Allegiance and prayer. And since we've got special guests, I'll ask Town Manager Brantley Price if he'll lead us in the Pledge of Allegiance. And I'll say the prayer. I appeal to the flag of the United States of America and to the Republic for which it stands, one nation under God, indivisible, with liberty and justice for all. Let's pray together. God, we come before you this morning with grateful hearts to live in such a wonderful community. We're grateful for the opportunity to serve our community members in our respective capacities here this morning. Be with us and give us clear eyes and full hearts, Lord, as we handle the people's business. Thank you for your many, many blessings. In Christ's name, amen. Amen. Thank you, Manager Price, for the pledge. First item is",
"for Wednesday, November 12th\nat 7:05 p.m. What you want? This meeting is being recorded. report anything. I have\nYeah. agenda\nbut there were other items\n>> that didn't get to this print Right. >> Caught you off guard there. >> I know you did. I'm not. Oh, so it\nworked very well. >> what did what did we end up\nbecause of the day? >> that's all right. You're fine. >> All right. So, we're still going to go\na wedding for\nI think we're gonna\nwant to go. All right. No problem. 1.7. First one is Parker Hill\nWell, it's already been\nConservation Commission. Yeah. And since it's not supposed to be\nwhat snacks,\n>> By the way, I I did autographs. put your\nyour name on your agenda and stuff and\nYou mind just announcing\nthat you've arrived and what time? Did provide? Guess we should wait. kind of that was\n>> right. So what doesn't go into the\ngeneral account instead\nof meeting\nI thought we had a\nCan you read it off? Oh, you\nYou just repeat the number.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.8891, 0.8695],
# [0.8891, 1.0000, 0.8283],
# [0.8695, 0.8283, 1.0000]])
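The scores above have ones on the diagonal and are symmetric, which is what cosine similarity produces; cosine is the default similarity function in Sentence Transformers unless the model is configured otherwise. The sketch below computes the same kind of matrix by hand on toy 2-dimensional vectors. The helper name `cosine_similarity_matrix` and the toy data are assumptions for illustration, not part of the library.

```python
import math


def cosine_similarity_matrix(vectors):
    """Return the pairwise cosine similarity of a list of non-zero vectors."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return [
        [sum(a * b for a, b in zip(u, v)) / (nu * nv) for v, nv in zip(vectors, norms)]
        for u, nu in zip(vectors, norms)
    ]


# Toy stand-ins for embeddings; real ones would come from model.encode(...).
toy = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
sims = cosine_similarity_matrix(toy)
for row in sims:
    print([round(value, 4) for value in row])
```

Because cosine similarity ignores vector length, `[0.0, 2.0]` and `[0.0, 1.0]` would score identically against any other vector.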
Columns: sentence_0 and sentence_1

| | sentence_0 | sentence_1 |
|---|---|---|
| type | string | string |
| details | | |
| sentence_0 | sentence_1 |
|---|---|
| right I call this town council to order on Recognize is a quorum all those who attendance in those are in online . hello Davis are tonight . Thanks, I We having you . and expeditious tonight a big game JMU at . like have just a wo we've had two works who their a of for Our are with them time Ms. we have?? .? Here?? Here . Mr. Hunter? Here Hardy | All right, I'd like to call this town council meeting to order on this November 12, 2025. Recognize there is a quorum present. Welcome all those who are in attendance in person and those who are tuned in online. A special hello to Dr. Davis's local and state government class students who are here tonight. Thanks for coming, although I think you were probably told you had to. We appreciate having you. We'll try and be expeditious in the people's business tonight because I know there's a big game later this evening between Longwood and JMU here at home. I'd also like to have just a moment. I won't mention names, but we've had two of our public works employees who lost their wives here very recently. And so we'll just have a brief moment of silence for them. Thank you. Our prayers and our hearts are with them at this difficult time. Ms. McKay, can we have a roll call, please? Mrs. Amos? Here. Mr. Reed? Here. Mr. Dwyer? Here. Mr. Parrott? Here. Mr. Yoland. Here. Mr. Hunter? Here. Mr. Hardy |
| everyone November the City Columbia Board of Chair for meeting like introduce the members: Harding'm, Davis Whittle, Sidney Bang, Duvall also to introduce the staff the, Andrew, Board, Erica Hyan, Deputy and Madeline Land . is special, appeals . for the record, wishing and come the . No can floor When come the podium state your and speak the because meeting is recorded Applicants the board | Welcome, everyone, to the November meeting of the City of Columbia Board of Zoning Appeals. I am Catherine Fenner, Chair for the Board, and will be serving as the chair for today's meeting. I would like to introduce the other members of the board: Josh Harding, I'm sorry, Davis Whittle, Sidney Lanham, Jonathan Bang, and Sherard Duvall. I would also like to introduce the staff that assists the board, Andrew Livingood, Zoning Board Administrator, Erica Hyan, Deputy Zoning Administrator, and Madeline Bowden, Land Use Board Coordinator. The board is charged with hearing applications for special exceptions, variances, and administrative appeals. All testimony is recorded for the record, and anyone wishing to speak will need to be sworn in and come to the podium to speak. No testimony can be taken from the floor. When you come to the podium, state your name and please speak clearly into the microphone because this meeting is being recorded. Applicants with cases before the board are allotted |
| Corporation 7:30 WEDC Room 250 Highway Texas TO & PLEDGE OF ON ITEMS member the may Board not Agenda of the fill out a form meeting . that comments limited minutes for, six In addition, is not allowed to converse deliberate take on any presented CONSENT AGENDA matters Agenda are be routine by Board will be will not of items . discussion desired, that item will removed from the separately . act upon | Wylie Economic Development Corporation |
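In the samples above, sentence_0 reads like sentence_1 with many tokens removed, which matches the token-deletion noise that TSDAE trains the encoder to undo. The sketch below illustrates that kind of corruption under stated assumptions: plain whitespace tokenization, a hypothetical helper named `delete_tokens`, and a deletion ratio of 0.6 taken from the TSDAE paper. The exact noise settings and tokenizer used for this model are not stated in the card.

```python
import random


def delete_tokens(text, del_ratio=0.6, rng=None):
    """Drop each whitespace token independently with probability del_ratio.

    At least one token is always kept so the noisy text is never empty.
    """
    rng = rng or random.Random()
    tokens = text.split()
    if not tokens:
        return text
    kept = [tok for tok in tokens if rng.random() > del_ratio]
    if not kept:
        kept = [rng.choice(tokens)]
    return " ".join(kept)


clean = "I'd like to call this town council meeting to order"
noisy = delete_tokens(clean, del_ratio=0.6, rng=random.Random(42))
print(noisy)
```

Token order is preserved, so the noisy text is always a subsequence of the clean text; the decoder then has to reconstruct the full sentence from the pooled embedding of the noisy one.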
Loss: DenoisingAutoEncoderLoss

Non-default hyperparameters:
- num_train_epochs: 1
- multi_dataset_batch_sampler: round_robin

All hyperparameters:
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- project: huggingface
- trackio_space_id: trackio
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: no
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: True
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
- router_mapping: {}
- learning_rate_mapping: {}

| Epoch | Step | Training Loss |
|---|---|---|
| 0.4 | 500 | 4.3532 |
| 0.8 | 1000 | 3.4547 |
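The training log can be combined with the listed hyperparameters for a rough sanity check. The sketch below estimates the number of steps per epoch from the logged rows and evaluates a linear decay schedule with no warmup at those steps. It assumes a single training device and treats the rounded epoch values as exact, so the results are estimates rather than reported figures; the helper names are illustrative.

```python
def steps_per_epoch(step, epoch_fraction):
    """Estimate optimizer steps in one epoch from a single logged row."""
    return round(step / epoch_fraction)


def linear_lr(step, total_steps, base_lr=5e-05):
    """Linear decay from base_lr to zero with no warmup (warmup_steps: 0)."""
    return base_lr * max(0.0, 1.0 - step / total_steps)


batch_size = 8  # per_device_train_batch_size, gradient_accumulation_steps: 1
total_steps = steps_per_epoch(500, 0.4)
approx_samples = total_steps * batch_size

print(total_steps)     # 1250
print(approx_samples)  # 10000
for logged_step in (500, 1000):
    print(logged_step, linear_lr(logged_step, total_steps))
```

Under these assumptions the single epoch covers roughly 10,000 sentence pairs, and the learning rate has fallen to about 3e-05 and 1e-05 at the two logged steps.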
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@inproceedings{wang-2021-TSDAE,
title = "TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning",
author = "Wang, Kexin and Reimers, Nils and Gurevych, Iryna",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
month = nov,
year = "2021",
address = "Punta Cana, Dominican Republic",
publisher = "Association for Computational Linguistics",
pages = "671--688",
url = "https://arxiv.org/abs/2104.06979",
}
Base model
google-bert/bert-base-uncased