2020
DOI: 10.48550/arxiv.2005.12766
Preprint

CERT: Contrastive Self-supervised Learning for Language Understanding

Abstract: Pretrained language models such as BERT and GPT have shown great effectiveness in language understanding. The auxiliary predictive tasks in existing pretraining approaches are mostly defined on tokens and thus may not capture sentence-level semantics very well. To address this issue, we propose CERT: Contrastive self-supervised Encoder Representations from Transformers, which pretrains language representation models using contrastive self-supervised learning at the sentence level. CERT creates augmentation…
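
To make the sentence-level objective concrete, the sketch below shows the general recipe of contrastive pretraining over two augmented views of each sentence, in the spirit of CERT (which, per the paper and the citation statements below, derives its views via back-translation and trains with a MoCo-style objective on top of BERT). The toy encoder, the random token ids standing in for augmented sentences, and all sizes are illustrative placeholders, not the authors' implementation.

```python
# Minimal sketch of sentence-level contrastive pretraining in the spirit of CERT.
# The encoder and "augmentations" are placeholders; CERT itself augments sentences
# via back-translation and trains a MoCo-style objective on a pretrained BERT encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToySentenceEncoder(nn.Module):
    """Stand-in for a pretrained Transformer; maps token ids to one sentence vector."""
    def __init__(self, vocab_size=1000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, token_ids):                   # token_ids: (batch, seq_len)
        pooled = self.embed(token_ids).mean(dim=1)  # mean-pool token embeddings
        return F.normalize(self.proj(pooled), dim=-1)

def info_nce(z1, z2, temperature=0.07):
    """InfoNCE loss: the i-th row of z1 should match the i-th row of z2."""
    logits = z1 @ z2.t() / temperature              # (batch, batch) similarity matrix
    targets = torch.arange(z1.size(0))              # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Two "augmented views" of the same batch of sentences (random ids here; in CERT
# each view would come from back-translating the original sentence).
encoder = ToySentenceEncoder()
view_a = torch.randint(0, 1000, (8, 16))
view_b = torch.randint(0, 1000, (8, 16))
loss = info_nce(encoder(view_a), encoder(view_b))
loss.backward()
```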

Cited by 100 publications (122 citation statements)
References 28 publications

“…Common approaches consider sentences within the same context as semantically similar samples (Kiros et al., 2015; Logeswaran & Lee, 2018). To create positive training pairs with augmented samples, a diverse set of text augmentation operations have been explored, including lexicon-based distortion (Wei & Zou, 2019), synonym replacement (Kobayashi, 2018), back-translation (Fang & Xie, 2020), cut-off (Shen et al., 2020) and dropout (Gao et al., 2021). However, unsupervised sentence embedding models still perform notably worse than supervised sentence encoders.…”
Section: Related Work (mentioning)
confidence: 99%
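
As a concrete illustration of one of the augmentation strategies listed in this statement, the sketch below uses dropout as the augmentation, in the manner described for SimCSE (Gao et al., 2021): encoding the same sentence twice with dropout active yields two slightly different embeddings that can serve as a positive pair. The encoder and token ids are hypothetical stand-ins, not code from any of the cited papers.

```python
# Illustrative sketch of dropout-as-augmentation (Gao et al., 2021): the same batch is
# encoded twice in training mode, and the two stochastic outputs form positive pairs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DropoutEncoder(nn.Module):
    def __init__(self, vocab_size=1000, dim=128, p=0.1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.dropout = nn.Dropout(p)
        self.proj = nn.Linear(dim, dim)

    def forward(self, token_ids):
        pooled = self.dropout(self.embed(token_ids)).mean(dim=1)
        return F.normalize(self.proj(pooled), dim=-1)

encoder = DropoutEncoder().train()              # dropout must stay active
sentences = torch.randint(0, 1000, (4, 12))     # a small batch of token ids
z1, z2 = encoder(sentences), encoder(sentences) # two stochastic views of the same batch
print(F.cosine_similarity(z1, z2).mean())       # high but below 1.0: a "free" positive pair
```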
“…Contrastive learning was introduced in computer vision by Wu et al. (2018), followed by several modifications to improve the training (He et al., 2020; Caron et al., 2020). In the context of natural language processing, Fang et al. (2020) proposed to apply MoCo where positive pairs of sentences are obtained using back-translation. Different works augmented the masked language modeling objective with a contrastive loss (Giorgi et al., 2020; Meng et al., 2021).…”
Section: Related Work (mentioning)
confidence: 99%
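
The MoCo mechanics mentioned in this statement, namely a momentum-updated key encoder and a queue of negatives from previous batches, can be sketched as follows. The encoder architecture, queue size and hyperparameters are illustrative assumptions, not the actual setup of He et al. (2020) or Fang et al. (2020).

```python
# Minimal sketch of MoCo-style contrastive training applied to sentence features:
# a momentum ("key") encoder plus a queue of negatives accumulated across batches.
import copy
import torch
import torch.nn.functional as F

dim, queue_size, momentum, temperature = 128, 4096, 0.999, 0.07
query_encoder = torch.nn.Sequential(torch.nn.Linear(dim, dim))
key_encoder = copy.deepcopy(query_encoder)                 # momentum-updated copy
queue = F.normalize(torch.randn(queue_size, dim), dim=-1)  # negatives from past batches

def moco_step(query_inputs, key_inputs):
    global queue
    q = F.normalize(query_encoder(query_inputs), dim=-1)
    with torch.no_grad():
        # momentum update of the key encoder's parameters
        for p_q, p_k in zip(query_encoder.parameters(), key_encoder.parameters()):
            p_k.data = momentum * p_k.data + (1 - momentum) * p_q.data
        k = F.normalize(key_encoder(key_inputs), dim=-1)
    l_pos = (q * k).sum(dim=-1, keepdim=True)   # similarity to the matching key
    l_neg = q @ queue.t()                       # similarity to queued negatives
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    loss = F.cross_entropy(logits, torch.zeros(q.size(0), dtype=torch.long))
    queue = torch.cat([k, queue])[:queue_size]  # enqueue new keys, drop oldest
    return loss

# e.g. query_inputs = features of a sentence, key_inputs = features of its back-translation
loss = moco_step(torch.randn(8, dim), torch.randn(8, dim))
loss.backward()
```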
“…For the case of the image modality, such tasks include predicting artificial rotations [13], colourisation [41,42] and feature clustering [4]. Recently, Contrastive Learning [16] has become increasingly popular for learning visual [9,19], audio [6,28] and natural language [11] representations. The method is to push positive pairs' embeddings closer together while pulling negative pairs' embeddings further apart.…”
Section: Related Work (mentioning)
confidence: 99%
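
The "push positives closer, pull negatives apart" behaviour described in this last statement is usually formalised with an InfoNCE-style loss; a standard form, written in generic notation rather than taken from any single cited work, is:

\[
\ell_i = -\log
\frac{\exp\!\big(\mathrm{sim}(z_i, z_i^{+})/\tau\big)}
     {\exp\!\big(\mathrm{sim}(z_i, z_i^{+})/\tau\big) + \sum_{j \in \mathcal{N}(i)} \exp\!\big(\mathrm{sim}(z_i, z_j)/\tau\big)}
\]

where $z_i$ and $z_i^{+}$ are the embeddings of a positive pair, $\mathcal{N}(i)$ indexes the negative samples, $\mathrm{sim}(\cdot,\cdot)$ is cosine similarity and $\tau$ is a temperature; minimising $\ell_i$ raises similarity to the positive while lowering it to the negatives.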