Data Augmentation for Intent Classification

Chen, Derek; Yin, Claire

doi:10.48550/arxiv.2206.05790

Cited by 2 publications

(2 citation statements)

References 13 publications


“…However, OOD cases are by definition areas the network has not seen, leading to poor performance. Data augmentation and other robustness methods may serve as a strong tool to cover the unknown space by maximizing the diversity of the examples (Ng et al., 2020; Chen and Yin, 2022).…”

Section: Results (mentioning)

confidence: 99%

Sources of Noise in Dialogue and How to Deal with Them

Chen, 2023

Proceedings of the 24th Meeting of the Special Interest Group on Discourse and Dialogue


Training dialogue systems often entails dealing with noisy training examples and unexpected user inputs. Despite their prevalence, there is currently no accurate survey of dialogue noise, nor is there a clear sense of the impact of each noise type on task performance. This paper addresses this gap by first constructing a taxonomy of noise encountered by dialogue systems. In addition, we run a series of experiments to show how different models behave when subjected to varying levels and types of noise. Our results reveal that models are quite robust to label errors commonly tackled by existing denoising algorithms, but that performance suffers from dialogue-specific noise. Driven by these observations, we design a data cleaning algorithm specialized for conversational settings and apply it as a proof-of-concept for targeted dialogue denoising.


“…Alternatively, and perform data augmentation with prompting, but their prompts are not compositional since their task setups are focused on single-aspect class prediction. Data Augmentation is a common technique in NLP for counteracting the limited data available with few-shot learning (Feng et al., 2021; Chen and Yin, 2022). Flavors of data augmentation include surface form alteration (Wei and Zou, 2019), latent perturbation (Sennrich et al., 2016; Fabius et al., 2015) or auxiliary supervision (Chen and Yu, 2021).…”

Section: Topv2 (mentioning)

confidence: 99%

Mixture of Soft Prompts for Controllable Data Generation

Chen, Lee, et al. 2023

Findings of the Association for Computational Linguistics: EMNLP 2023


Large language models (LLMs) effectively generate fluent text when the target output follows natural language patterns. However, structured prediction tasks confine the output format to a limited ontology, causing even very large models to struggle since they were never trained with such restrictions in mind. The difficulty of using LLMs for direct prediction is exacerbated in few-shot learning scenarios, which commonly arise due to domain shift and resource limitations. We flip the problem on its head by leveraging the LLM as a tool for data augmentation rather than a model for direct prediction. Our proposed Mixture of Soft Prompts (MSP) serves as a parameter-efficient procedure for generating multi-attribute data in a controlled manner. Denoising mechanisms are further applied to improve the quality of synthesized data. Automatic metrics show our method is capable of producing diverse and natural text, while preserving label semantics. Moreover, MSP achieves state-of-the-art results on three benchmarks when compared against strong baselines. Our method offers an alternate data-centric approach for applying LLMs to complex prediction tasks.
