Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/2021.emnlp-main.109

Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders

Abstract: Previous work has indicated that pretrained Masked Language Models (MLMs) are not effective as universal lexical and sentence encoders off-the-shelf, i.e., without further task-specific fine-tuning on NLI, sentence similarity, or paraphrasing tasks using annotated task data. In this work, we demonstrate that it is possible to turn MLMs into effective lexical and sentence encoders even without any additional data, relying simply on self-supervision. We propose an extremely simple, fast, and effective contrastive…
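The calibration idea described in the abstract can be sketched in a few lines: encode each input string twice with dropout active, treat the two passes as positive views of one another, and train with an in-batch InfoNCE loss. The snippet below is a minimal, assumed reconstruction of that general recipe, not the released implementation; the base model, mean pooling, temperature, and learning rate are all illustrative choices.

```python
# Minimal sketch of dropout-based contrastive calibration of an MLM
# (an assumed reconstruction of the general recipe, not the authors' code).
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"              # assumed base MLM
tok = AutoTokenizer.from_pretrained(model_name)
enc = AutoModel.from_pretrained(model_name)
enc.train()                                   # keep dropout on: two passes give two "views"
opt = torch.optim.AdamW(enc.parameters(), lr=2e-5)

def embed(texts):
    """Mean-pool the last hidden layer over non-padding tokens."""
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = enc(**batch).last_hidden_state             # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()
    return (hidden * mask).sum(1) / mask.sum(1)

def info_nce(z1, z2, temperature=0.05):
    """In-batch contrastive loss: positives sit on the diagonal."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature                   # (B, B) similarity matrix
    labels = torch.arange(z1.size(0))
    return F.cross_entropy(logits, labels)

texts = ["bank", "river bank", "financial institution", "money"]  # toy batch
z1, z2 = embed(texts), embed(texts)   # identical strings, different dropout masks
loss = info_nce(z1, z2)
loss.backward()
opt.step()
```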

Cited by 51 publications (49 citation statements). References 54 publications.
“…While SEMB and AOC variants exhibit similar performance, ISO variants perform much worse. The direct comparison between ISO and AOC demonstrates the importance of contextual information and seemingly limited usability of off-the-shelf multilingual encoders as word encoders, if no context is available, and if they are not further specialized to encode word-level information (Liu et al. 2021). Similarity-specialized multilingual encoders, which rely on pretraining with parallel data, yield mixed results.…”
Section: Document-level CLIR Results
Mentioning confidence: 99%
“…Inference Configuration. For similarity-based inference, as in previous work (Liu et al., 2021a), the Mirror-BERT procedure relies on the 10k most frequent English words for contrastive learning. For constrained beam search, used with the LP task, we set the hyperparameter K to 50.…”
Section: Methods
Mentioning confidence: 99%
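As a rough illustration of where such a frequent-word training set might come from, the snippet below pulls the 10k most frequent English words from the wordfreq package; the word list actually used in the cited setup is not specified here, so this source is an assumption.

```python
# Hypothetical way to assemble a 10k-word self-supervised training set;
# the exact word list used in the cited setup is an assumption here.
from wordfreq import top_n_list

train_words = top_n_list("en", 10000)   # 10k most frequent English words
# Each word acts as its own positive pair in the contrastive loop sketched above,
# e.g. z1, z2 = embed(batch_words), embed(batch_words)
print(train_words[:5])
```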
“…Based on prior findings concerning multilingual PLMs (Liu et al., 2021b) and our own preliminary experiments, out-of-the-box Prix-LM produces entity embeddings of insufficient quality. However, we can transform them into entity encoders via a simple and efficient unsupervised Mirror-BERT procedure (Liu et al., 2021a). In short, Mirror-BERT is a contrastive learning method that calibrates PLMs and converts them into strong universal lexical or sentence encoders.…”
Section: Inference
Mentioning confidence: 99%
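To make similarity-based inference with a calibrated encoder concrete, the sketch below ranks a toy list of entity names against a query by cosine similarity; enc and embed refer to the hypothetical helpers from the earlier sketch, and the entity strings are made up for illustration, not taken from Prix-LM.

```python
# Rough illustration of similarity-based inference with the calibrated encoder;
# `enc` and `embed` come from the sketch above, and the entities are toy data.
import torch
import torch.nn.functional as F

entities = ["aspirin", "acetylsalicylic acid", "ibuprofen", "Paris"]
query = "ASA"

enc.eval()                                   # dropout off at inference time
with torch.no_grad():
    E = F.normalize(embed(entities), dim=-1)
    q = F.normalize(embed([query]), dim=-1)

scores = (q @ E.t()).squeeze(0)              # cosine similarities to the query
best = int(scores.argmax())
print(entities[best], float(scores[best]))
```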
“…Unlike prior work, which conducted large-scale conversational pretraining from scratch using large datasets, we demonstrate that full pretraining is not needed to obtain universal conversational encoders. By leveraging the general semantic knowledge already stored in pretrained LMs, we can expose (i.e., 'rewire') that knowledge (Gao et al., 2021b; Liu et al., 2021b) via much cheaper and quicker adaptive fine-tuning on a tiny fraction of the full Reddit data (e.g., even using < 0.01% of the Reddit corpus). Further, the task-oriented S2 CONVFIT-ing transforms pretrained LMs into task-specialised sentence encoders.…”
Section: Introduction
Mentioning confidence: 99%