Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders
Paper • 2104.08027 • Published
How to use cambridgeltl/mirror-roberta-base-sentence-drophead with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("feature-extraction", model="cambridgeltl/mirror-roberta-base-sentence-drophead")

# Load model directly
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("cambridgeltl/mirror-roberta-base-sentence-drophead")
model = AutoModel.from_pretrained("cambridgeltl/mirror-roberta-base-sentence-drophead")
An unsupervised sentence encoder proposed by Liu et al. (2021) that uses drophead instead of dropout as feature-space augmentation. The model is trained on unlabelled raw sentences, with roberta-base as the base model. Please use the [CLS] hidden state (before the pooler) as the representation of the input.
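The [CLS]-before-pooler representation can be sketched as follows. This is a minimal example, not the authors' reference code: it assumes the first position of `last_hidden_state` is the [CLS] slot (for RoBERTa tokenizers this is the `<s>` token), and the helper names `cls_before_pooler` and `encode`, the example sentences, and the use of cosine similarity for comparison are illustrative choices.

```python
import torch
import torch.nn.functional as F


def cls_before_pooler(last_hidden_state: torch.Tensor) -> torch.Tensor:
    """Return the first-token hidden state of each sequence.

    Input shape: (batch, seq_len, hidden). Output shape: (batch, hidden).
    This reads the encoder output directly, i.e. before any pooler layer.
    """
    return last_hidden_state[:, 0, :]


def encode(sentences, tokenizer, model) -> torch.Tensor:
    """Embed a list of sentences (hypothetical helper, not part of the repo)."""
    inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    return cls_before_pooler(outputs.last_hidden_state)


if __name__ == "__main__":
    from transformers import AutoModel, AutoTokenizer

    name = "cambridgeltl/mirror-roberta-base-sentence-drophead"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name).eval()  # eval() disables dropout

    # Example sentences are made up for illustration.
    sentences = ["A man is playing a guitar.", "Someone plays an instrument."]
    embeddings = encode(sentences, tokenizer, model)

    # Cosine similarity is one common way to compare sentence embeddings.
    similarity = F.cosine_similarity(embeddings[0], embeddings[1], dim=0)
    print(f"cosine similarity: {similarity.item():.4f}")
```

Note that `outputs.pooler_output` is deliberately not used here, since it is the representation after the pooler.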
Note that this model does not reproduce the exact numbers in the paper, because the numbers reported there are averaged over three runs.
@inproceedings{
liu2021fast,
title={Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders},
author={Liu, Fangyu and Vuli{\'c}, Ivan and Korhonen, Anna and Collier, Nigel},
booktitle={EMNLP 2021},
year={2021}
}