Proceedings of the 4th Workshop on NLP for Conversational AI 2022
DOI: 10.18653/v1/2022.nlp4convai-1.5

Data Augmentation for Intent Classification with Off-the-shelf Large Language Models

Abstract: Data augmentation is a widely employed technique to alleviate the problem of data scarcity. In this work, we propose a prompting-based approach to generate labelled training data for intent classification with off-the-shelf language models (LMs) such as GPT-3. An advantage of this method is that no task-specific LM-fine-tuning for data generation is required; hence the method requires no hyper-parameter tuning and is applicable even when the available training data is very scarce. We evaluate the proposed meth…

Cited by 27 publications (11 citation statements)
References 38 publications
“…Previous work has used language models to generate synthetic data to increase the amount of available data using pretrained models (Kumar et al, 2020). Some examples of downstream tasks are text classification, intent classification (Sahu et al, 2022), toxic language detection (Hartvigsen et al, 2022), text mining (Tang et al, 2023), or mathematical reasoning (Liu et al, 2023b), inter alia. Synthetic data is also used to pretrain and distill language models.…”
Section: Natural Language Annotation and Data Generation Using LLMs
Type: mentioning
confidence: 99%
“…In the field of intent detection, previous work has proposed using data augmentation techniques to generate synthetic training data (Sahu et al, 2022). Sahu et al (2022) also used PLMs to generate augmented examples, but they require human effort for labeling. This is a challenging task since it is expensive to annotate large amounts of data.…”
Section: Related Work
Type: mentioning
confidence: 99%
“…Following Sahu et al (2022), we wanted to see if it is effective to use the available data to train an intent classifier and then use it to relabel the synthetic data. Intuitively, such a method would correct mistakes in the generation process.…”
Section: Data Relabelling
Type: mentioning
confidence: 99%
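
The relabelling idea in the statement above (train an intent classifier on the available data, then use it to relabel the synthetic data) can be sketched roughly as follows. This is a minimal illustration only: the token-overlap classifier, the function names `train`, `predict`, and `relabel`, and the example utterances are all assumptions made for this sketch, not the classifier or data used by Sahu et al. (2022) or by the citing paper.

```python
from collections import Counter, defaultdict


def train(examples):
    """Collect per-intent token counts from (utterance, intent) pairs."""
    counts = defaultdict(Counter)
    for text, intent in examples:
        counts[intent].update(text.lower().split())
    return counts


def predict(counts, text):
    """Pick the intent whose training tokens overlap most with the utterance."""
    tokens = text.lower().split()

    def score(intent):
        total = sum(counts[intent].values())
        return sum(counts[intent][tok] for tok in tokens) / total

    return max(counts, key=score)


def relabel(counts, synthetic):
    """Replace each generated label with the classifier's prediction."""
    return [(text, predict(counts, text)) for text, _ in synthetic]


# Hypothetical seed data (the small set of real labelled examples).
seed = [
    ("what is my account balance", "check_balance"),
    ("show me my balance", "check_balance"),
    ("transfer money to savings", "transfer"),
    ("send money to my friend", "transfer"),
]

# Hypothetical generated data; the first label is wrong on purpose.
synthetic = [
    ("how much balance is in my account", "transfer"),
    ("please send money to john", "transfer"),
]

model = train(seed)
relabelled = relabel(model, synthetic)
for text, intent in relabelled:
    print(f"{intent}: {text}")
```

In this toy run the first generated utterance is moved from `transfer` to `check_balance`, while the second keeps its label. A real setup would substitute a properly trained intent classifier for the token-overlap scorer.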
“…Moreover, data augmentation techniques were utilized in [20] to enhance the robustness of NLP models. Furthermore, [36] employs ChatGPT to generate new data, showcasing another innovative application of data augmentation techniques in NLP.…”
Section: Related Work
Type: mentioning
confidence: 99%