2021
DOI: 10.1186/s40537-021-00492-0

Text Data Augmentation for Deep Learning

Abstract: Natural Language Processing (NLP) is one of the most captivating applications of Deep Learning. In this survey, we consider how the Data Augmentation training strategy can aid in its development. We begin with the major motifs of Data Augmentation summarized into strengthening local decision boundaries, brute force training, causality and counterfactual examples, and the distinction between meaning and form. We follow these motifs with a concrete list of augmentation frameworks that have been developed for tex…

Cited by 823 publications (153 citation statements)
References 77 publications
“…There are 31,342 data points, with 3,082 security-related ones and 28,260 nonsecurity-related data points. Therefore, data augmentation techniques such as back translation [ 22 ] and easy data augmentation [ 23 ] are used to balance the data set. Since some of the deep learning approaches are explored, basic text cleaning methods are only applied.…”
Section: Data
confidence: 99%
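The easy data augmentation operations referenced in the quotation above (random swap, random deletion, and similar token-level perturbations) can be sketched in pure Python; the function names and the swap/deletion parameters here are illustrative assumptions, not taken from the surveyed papers.

```python
import random

def random_swap(tokens, n_swaps=1):
    """Return a copy of `tokens` with n_swaps random position swaps."""
    out = tokens[:]
    for _ in range(n_swaps):
        if len(out) < 2:
            break
        i, j = random.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

def random_deletion(tokens, p=0.1):
    """Drop each token with probability p, keeping at least one token."""
    kept = [t for t in tokens if random.random() > p]
    return kept if kept else [random.choice(tokens)]

def augment(sentence, n_copies=4):
    """Generate n_copies perturbed variants of a sentence."""
    tokens = sentence.split()
    variants = []
    for _ in range(n_copies):
        op = random.choice([random_swap, random_deletion])
        variants.append(" ".join(op(tokens)))
    return variants

random.seed(0)
print(augment("the patch fixes a buffer overflow in the parser"))
```

In a class-imbalance setting like the one quoted (3,082 security-related vs. 28,260 non-security-related points), such variants would typically be generated only for the minority class until the two classes are roughly balanced.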
“…Third, using text augmentation increases the accuracy of message classification (model 2, acc = 0.8620 < model 3, acc = 0.8775). Data augmentation is used in most implementations of deep learning algorithms, most commonly as a regularisation strategy to prevent overfitting ( 44 ). In NLP, learning of high-frequency numeric patterns (e.g., token embeddings) or memorisation of particular forms of language prevent generalisation.…”
Section: Results
confidence: 99%
“…Data augmentation is commonly used to enrich the training dataset such that the trained models are robust and produce improved performance for deep learning models, and the technique has been widely used in computer and speech processing [14,15], with interests in textual data augmentation increasing over the last few years [14,36]. As textual communications are inherently more complex (i.e., syntax and semantic constraints), several data augmentation techniques have been proposed by scholars including translations [17,37,38], question answering [39], synonym replacement [16], etc.…”
Section: Data Augmentation
confidence: 99%
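Synonym replacement, one of the techniques the quotation above lists, can be sketched as follows. The tiny synonym table is a hypothetical stand-in; practical systems draw synonyms from WordNet or learned embeddings instead.

```python
import random

# Hypothetical synonym table for illustration only;
# real pipelines use WordNet or embedding neighbours.
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "error": ["fault", "bug"],
    "large": ["big", "huge"],
}

def synonym_replace(sentence, n=1, rng=random):
    """Replace up to n tokens that have entries in SYNONYMS."""
    tokens = sentence.split()
    candidates = [i for i, t in enumerate(tokens) if t in SYNONYMS]
    for i in rng.sample(candidates, min(n, len(candidates))):
        tokens[i] = rng.choice(SYNONYMS[tokens[i]])
    return " ".join(tokens)

random.seed(1)
print(synonym_replace("a quick fix for a large error", n=2))
```

Because replacement preserves token count and only touches words with known synonyms, the label of the original sentence is usually assumed to carry over to the augmented one — the semantic-preservation constraint the quotation describes as hard to guarantee for text.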
“…2.2, studies exploring deep learning algorithms, particularly those exploring and comparing various embedding techniques are lacking, both for English and non-English datasets [10,12,13]. Moreover, recent reviews show studies exploring data augmentation techniques in supervised deep learning algorithms to improve prediction improvements [14]. The technique, which is generally a regularization technique that synthesizes new data from existing data has been widely used in computing vision [14,15]; however, works relating to textual data is limited due to the difficulty of establishing standard rules for automatic transformations of textual data while conserving the quality of the annotations [14,16,17], except for a few.…”
Section: Introduction
confidence: 99%