Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)
DOI: 10.18653/v1/2020.emnlp-main.726
Data Boost: Text Data Augmentation Through Reinforcement Learning Guided Conditional Generation

Abstract: Data augmentation has proven effective in many NLU tasks, especially those suffering from data scarcity. In this paper, we present a powerful and easy-to-deploy text augmentation framework, Data Boost, which augments data through reinforcement learning guided conditional generation. We evaluate Data Boost on three diverse text classification tasks under five different classifier architectures. The results show that Data Boost can boost the performance of classifiers, especially in low-resource data sce…

Cited by 52 publications (27 citation statements)
References 29 publications
“…Different from this work, we leverage a small labeled dataset using a distillation method to augment training data. Liu et al (2020) proposed a text augmentation method using reinforcement learning to guide conditional text generation. For classification of tweets, Sharifirad et al (2018) conducted data augmentation by using a combination of knowledge graphs to add the related concepts to the original tweets.…”
Section: Related Work
confidence: 99%
“…To make matters worse, in some cases the label for a specific task even changed after EDA was applied [19]. There have also been cases where the sentence became illegible because the word order of the original text was changed [23].…”
Section: A Text Data Augmentation
confidence: 99%
“…However, both methods entail the risk of the aforementioned problems, as the transformation is applied to the original text data at the token level. EDA has been found to change the semantic elements of the original input text [22], whereas back-translation has been reported to generate inaccurate text depending on the performance of the translator [17], or to reduce the vocabulary of the augmented text as specific tokens are repeated [23].…”
Section: Introduction
confidence: 99%
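The token-level transformations criticized in the statements above can be illustrated with a minimal sketch of EDA's random-swap operation (in the spirit of Wei and Zou, 2019); the function name, defaults, and example sentence are our own, not taken from the cited papers:

```python
import random

def eda_random_swap(text: str, n_swaps: int = 1, seed: int = 0) -> str:
    """EDA-style random swap: exchange the positions of two random words.

    Illustrative sketch only; a full EDA implementation also includes
    synonym replacement, random insertion, and random deletion.
    """
    rng = random.Random(seed)
    words = text.split()
    for _ in range(n_swaps):
        # pick two distinct positions and swap the words at them
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return " ".join(words)

print(eda_random_swap("the movie was not good at all", n_swaps=2))
```

Note that the swap preserves the bag of words but can scramble syntax (and, for negation-sensitive labels, flip the meaning), which is exactly the illegibility and label-drift risk raised in the quoted critiques.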
“…Augmentation in NLP Data augmentation for NLP has been studied extensively in the past (Jia and Liang, 2016; Silfverberg et al., 2017; Fürstenau and Lapata, 2009). Common methods include those that alter the surface form text (Wei and Zou, 2019) or perturb a latent embedding space (Wang and Yang, 2015; Fadaee et al., 2017; Liu et al., 2020), as well as those that perform paraphrasing (Zhang et al., 2019). Alternatively, masked language models generate new examples by proposing context-aware replacements for the masked token (Kobayashi, 2018; Wu et al., 2019).…”
Section: Data Augmentation
confidence: 99%
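The masked-token replacement idea mentioned above can be sketched as follows. A real system would query a masked language model (e.g. BERT) for context-aware candidates; this toy version substitutes a hand-made candidate table, and all names and data here are illustrative, not from the cited papers:

```python
import random

# Stand-in for a masked language model's top predictions at a masked position.
# A real augmenter would score candidates in context with an actual MLM.
CANDIDATES = {
    "good": ["great", "fine", "decent"],
    "movie": ["film", "picture"],
}

def mask_and_replace(text: str, seed: int = 0) -> str:
    """Pick one replaceable token and substitute a sampled candidate."""
    rng = random.Random(seed)
    words = text.split()
    maskable = [i for i, w in enumerate(words) if w in CANDIDATES]
    if not maskable:
        return text  # nothing to augment
    i = rng.choice(maskable)
    words[i] = rng.choice(CANDIDATES[words[i]])
    return " ".join(words)

print(mask_and_replace("the movie was good"))
```

Because replacements are drawn per-position rather than by editing word order, this family of methods sidesteps the illegibility problem of random swaps, at the cost of depending on the quality of the language model's proposals.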