2020
DOI: 10.48550/arxiv.2010.08670
Preprint

CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding

Abstract: Data augmentation has been demonstrated as an effective strategy for improving model generalization and data efficiency. However, due to the discrete nature of natural language, designing label-preserving transformations for text data tends to be more challenging. In this paper, we propose a novel data augmentation framework dubbed CoDA, which synthesizes diverse and informative augmented examples by integrating multiple transformations organically. Moreover, a contrastive regularization objective is introduce…
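The abstract pairs the augmentation framework with a contrastive regularization objective. As a rough illustration only, the sketch below shows one common way such a regularizer can be written: an InfoNCE-style loss between original and augmented representations. The function name, temperature value, and tensor shapes are assumptions, not CoDA's released implementation.

```python
# Illustrative sketch only: an InfoNCE-style contrastive regularizer between
# original examples and their augmented counterparts. Names, shapes, and the
# temperature value are assumptions, not CoDA's actual code.
import torch
import torch.nn.functional as F

def contrastive_regularizer(z_orig: torch.Tensor,
                            z_aug: torch.Tensor,
                            temperature: float = 0.1) -> torch.Tensor:
    """z_orig, z_aug: (batch, dim) sentence representations.

    Each original embedding is pulled toward its own augmentation and pushed
    away from the other augmented examples in the batch.
    """
    z_orig = F.normalize(z_orig, dim=-1)
    z_aug = F.normalize(z_aug, dim=-1)
    logits = z_orig @ z_aug.t() / temperature           # (batch, batch) similarities
    targets = torch.arange(z_orig.size(0), device=z_orig.device)
    return F.cross_entropy(logits, targets)
```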

Cited by 16 publications (8 citation statements)
References 32 publications
“…Because changing a single word can invert the meaning of the whole sentence, the challenge is compounded when a second round of data augmentation is applied to short text data. However, we will hopefully cross this hurdle soon, as more effective approaches keep being developed by the NLP community (Qu et al., 2020; Giorgi et al., 2020; Meng et al., 2021).…”
Section: Discussion
Mentioning; confidence: 99%
“…As the first step to leverage contrastive learning in sequence-level pretraining, we keep everything straightforward: simple cropping is used as the data augmentation, and the default temperature of 1 is used in the softmax. Advanced data transformations (Qu et al., 2020) and hyperparameter explorations (Oord et al., 2018; Chen et al., 2020) may further improve COCO-LM but are reserved for future work.…”
Section: Sequence Contrastive Learning
Mentioning; confidence: 99%
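The COCO-LM excerpt mentions simple cropping as the augmentation and a default softmax temperature of 1. A minimal sketch of that idea, assuming a plain random crop over token lists; the helper name and crop ratio are illustrative, not the paper's code.

```python
# Minimal sketch: two random crops of the same token sequence form a positive
# pair for contrastive learning. With the default temperature of 1, the
# similarity logits would simply be left unscaled before the softmax.
# random_crop and crop_ratio are illustrative names, not from COCO-LM.
import random

def random_crop(tokens, crop_ratio: float = 0.9):
    """Return a random contiguous span covering roughly crop_ratio of the tokens."""
    span_len = max(1, int(len(tokens) * crop_ratio))
    start = random.randint(0, len(tokens) - span_len)
    return tokens[start:start + span_len]

tokens = "contrastive learning keeps the augmentation straightforward".split()
view_a = random_crop(tokens)   # first view of the sequence
view_b = random_crop(tokens)   # second view: positive pair for view_a
```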
“…Fang et al. (2020) proposed to use back-translation to construct positive sentence pairs in their contrastive learning framework. Qu et al. (2020) proposed multiple sentence-level augmentation strategies for sentence contrastive learning. Most of these works still focus on either local token-level tasks or short sentence-level tasks.…”
Section: Related Work
Mentioning; confidence: 99%
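The back-translation approach cited above builds a paraphrase of each sentence by translating it to a pivot language and back. A hedged sketch of that recipe, using a purely hypothetical translate(text, src, tgt) helper in place of a real MT system:

```python
# Hypothetical sketch of back-translation positive pairs for contrastive
# learning. translate() is a placeholder, not a real library call; any MT
# model or service could stand in for it.
from typing import Callable, List, Tuple

def back_translate_pairs(sentences: List[str],
                         translate: Callable[[str, str, str], str],
                         pivot: str = "de") -> List[Tuple[str, str]]:
    """Pair each sentence with its round-trip translation (en -> pivot -> en)."""
    pairs = []
    for sent in sentences:
        paraphrase = translate(translate(sent, "en", pivot), pivot, "en")
        pairs.append((sent, paraphrase))   # positive pair for a contrastive loss
    return pairs
```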
“…Contrastive learning offers a simple method to learn disentangled representations that encode invariance to small and local changes in the input data without using any labeled data. In the NLP domain, contrastive learning has been employed to learn sentence representations (Qu et al., 2020) under either self-supervised or supervised settings.…”
Section: Introduction
Mentioning; confidence: 99%