2022
DOI: 10.1007/s13042-022-01553-3
Data augmentation in natural language processing: a novel text generation approach for long and short text classifiers

Abstract: In many cases of machine learning, research suggests that the development of training data might have a higher relevance than the choice and modelling of classifiers themselves. Thus, data augmentation methods have been developed to improve classifiers by artificially created training data. In NLP, there is the challenge of establishing universal rules for text transformations which provide new linguistic patterns. In this paper, we present and evaluate a text generation method suitable to increase the perform…

Cited by 76 publications (35 citation statements)
References 49 publications
“…The challenge with using these models is to make the generations truly label preserving. This is, for example, done by Anaby-Tavor et al (2020), Queiroz Abonizio and Barbon Junior (2020) and Bayer et al (2021). The models are conditioned by fine-tuning on the label-induced training data (or just the class data) and are then tasked to complete a text given the label-conditioned beginning (prompt).…”
Section: Data Augmentation
confidence: 99%
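The label-conditioned completion scheme described in the statement above can be sketched in a few lines. The prompt format, the `serialize_example`/`generation_prompt` helpers, and the stub `complete` function are illustrative assumptions, not the cited authors' implementation:

```python
# Sketch of label-conditioned data augmentation via prompt completion.
# A language model is fine-tuned on examples serialized together with
# their label, then asked to complete a prompt that contains only the
# label-conditioned beginning.

def serialize_example(label: str, text: str) -> str:
    """Format a fine-tuning example so the label conditions the text."""
    return f"label: {label}\ntext: {text}"

def generation_prompt(label: str) -> str:
    """Prompt the fine-tuned model with only the label prefix."""
    return f"label: {label}\ntext:"

def complete(prompt: str) -> str:
    """Hypothetical stand-in for a fine-tuned LM's completion call."""
    return prompt + " <generated continuation in the style of the class>"

# Fine-tune on serialized records, then generate new class examples.
train_record = serialize_example("positive", "Great phone, battery lasts days.")
augmented = complete(generation_prompt("positive"))
```

In practice `complete` would be a call to a fine-tuned generative model; the point of the sketch is only that the label appears in both the training serialization and the generation prompt, which is what makes the completion (approximately) label preserving.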
“…Our data augmentation strategy is the first to explore the generation capabilities of large language models while constraining them through filtering mechanisms. We combine the works of Yoo et al (2021) and Bayer et al (2021) by using GPT-3 with a human-in-the-loop filtering mechanism. We extend the few-shot learning research by proposing a multi-level fine-tuning approach.…”
Section: Research Gap
confidence: 99%
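A human-in-the-loop filtering mechanism of the kind mentioned above can be sketched as a triage step: an automatic score keeps clearly on-label generations, discards clearly off-label ones, and routes borderline candidates to a human reviewer. The `triage` function, its thresholds, and the toy scoring function are hypothetical illustrations, not the cited method:

```python
# Sketch of human-in-the-loop filtering for generated augmentation
# candidates: auto-keep high-confidence texts, auto-drop low-confidence
# ones, and queue the borderline middle for human review.

from typing import Callable, List, Tuple

def triage(candidates: List[str],
           score: Callable[[str], float],
           auto_keep: float = 0.9,
           auto_drop: float = 0.3) -> Tuple[List[str], List[str]]:
    """Split candidates into auto-kept texts and texts needing review."""
    kept, review = [], []
    for text in candidates:
        s = score(text)
        if s >= auto_keep:
            kept.append(text)
        elif s > auto_drop:
            review.append(text)
        # scores <= auto_drop are discarded outright
    return kept, review

def toy_score(text: str) -> float:
    """Toy label-confidence score: fraction of known positive words."""
    positive = {"great", "good", "excellent"}
    words = text.lower().split()
    return sum(w in positive for w in words) / max(len(words), 1)

kept, review = triage(["great great", "good phone somewhat", "bad"], toy_score)
# kept -> ["great great"]; review -> ["good phone somewhat"]; "bad" dropped
```

In a real pipeline the score would come from a classifier trained on the original labeled data, and the review queue would be shown to annotators; the thresholds trade annotation cost against augmentation quality.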
“…The problem of extrapolating without additional data has been targeted recently by preliminary approaches such as continual learning for non-stationary data or transfer learning, neither of which has offered fully satisfactory solutions (Mitra, 2021). Some successful systems appear to perform as if they had learnt complex and abstract concepts and could potentially transfer learning (e.g., text-to-image generation (Ramesh et al, 2021), text generation (Bayer et al, 2022; Yang et al, 2021), etc.). This illusion is nurtured by the assumption that human-like performance implies human-like strategies, that is, by the belief that behaving in a manner akin to humans presupposes similar underlying cognitive traits.…”
Section: Introduction
confidence: 99%