Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/2021.emnlp-main.142

Self-training Improves Pre-training for Few-shot Learning in Task-oriented Dialog Systems

Abstract: As the labeling cost for different modules in task-oriented dialog (ToD) systems is expensive, a major challenge is to train different modules with the least amount of labeled data. Recently, large-scale pre-trained language models have shown promising results for few-shot learning in ToD. In this paper, we devise a self-training approach to utilize the abundant unlabeled dialog data to further improve state-of-the-art pre-trained models in few-shot learning scenarios for ToD systems. Specifically, we propose a…
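
The abstract describes the general self-training recipe of pseudo-labeling abundant unlabeled dialog data to augment the few labeled examples. The sketch below is a minimal illustration of that idea only, not the paper's actual method: a TF-IDF plus logistic-regression classifier is a hypothetical stand-in for the pre-trained language model, and the self_train, rounds, and threshold names are illustrative assumptions.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def self_train(labeled_texts, labels, unlabeled_texts, rounds=3, threshold=0.9):
    """Iteratively pseudo-label confident unlabeled dialogs and retrain."""
    # Featurizer + classifier stand in for the paper's pre-trained language model.
    vec = TfidfVectorizer().fit(list(labeled_texts) + list(unlabeled_texts))
    train_texts, train_labels = list(labeled_texts), list(labels)
    pool = list(unlabeled_texts)

    model = LogisticRegression(max_iter=1000)
    model.fit(vec.transform(train_texts), train_labels)

    for _ in range(rounds):
        if not pool:
            break
        probs = model.predict_proba(vec.transform(pool))
        confident = probs.max(axis=1) >= threshold   # trust only high-confidence pseudo-labels
        if not confident.any():
            break
        preds = model.classes_[probs.argmax(axis=1)]
        train_texts += [t for t, c in zip(pool, confident) if c]
        train_labels += [p for p, c in zip(preds, confident) if c]
        pool = [t for t, c in zip(pool, confident) if not c]
        model.fit(vec.transform(train_texts), train_labels)  # retrain on the augmented set
    return model


# Toy usage: two labeled intents plus a handful of unlabeled utterances.
labeled = ["book a table for two", "what's the weather tomorrow"]
intents = ["restaurant_booking", "weather_query"]
unlabeled = ["reserve a table tonight", "will it rain on friday", "table for four at 7 pm"]
clf = self_train(labeled, intents, unlabeled)

A confidence threshold of this kind is the usual guard against a model reinforcing its own mistakes; the paper's actual data-selection and augmentation strategy may differ from this simplification.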

Cited by 15 publications (10 citation statements) · References 36 publications
“…Our reliance on large-scale generative pre-training is similar to prior work in unsupervised NMT which uses large-scale language modeling tasks on internet data as part of the bootstrap (Conneau & Lample, 2019). The role of few-shot prompting and distillation in our method is related to recent work on unsupervised data augmentation using language models (Anaby-Tavor et al., 2020; Kumar et al., 2020; Papanikolaou & Pierleoni, 2020) and is also in the same spirit as recent work on self-training and noisy-student training (Mi et al., 2021; Vu et al., 2021; Xie et al., 2020). Recent work on scaling laws for neural machine translation has shown that transformer decoders exhibit more favorable scaling than encoders (Ghorbani et al., 2021).…”
Section: Background and Related Work
confidence: 73%
“…Recent years have witnessed remarkable success in textual dialog systems, which can be roughly divided into two categories: open-domain conversations with casual chit-chat (Song et al., 2020; Gangal et al., 2021; Chan et al., 2021; Yang et al., 2021) and task-oriented dialog systems (Pei et al., 2021; Santra et al., 2021; Mi et al., 2021; Madotto et al., 2021; Gou et al., 2021; Raghu et al., 2021), which are designed to help users achieve specific goals. Early efforts mainly adopt a sequence-to-sequence (Seq2Seq) architecture, but do not work well for KB retrieval and reasoning.…”
Section: Unimodal Dialog Systems
confidence: 99%
“…Several papers have adapted various resources for pretraining models to enhance their performance on few-shot learning, such as pretraining on hypertext (Aghajanyan et al., 2021b), question-infused pretraining (Jia et al., 2021), and self-training (Vu et al., 2021; Wang et al., 2021b). Pretraining approaches have targeted specific tasks, such as task-oriented dialog (Mi et al., 2021), intent detection, and data-to-text generation (Chen et al., 2020). Our work differs as we use plain text as opposed to (naturally-occurring) human-annotated resources.…”
Section: Related Work
confidence: 99%