Self-training Improves Pre-training for Few-shot Learning in Task-oriented Dialog Systems

Mi, Fei; Zhou, Wanhao; Cai, Fengyu; Kong, Long; Huang, Minlie; Faltings, Boi

doi:10.18653/v1/2021.emnlp-main.142

Cited by 15 publications

(10 citation statements)

References 36 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Our reliance on large-scale generative pre-training is similar to prior work in unsupervised NMT which uses large-scale language modeling tasks on internet data as part of the bootstrap (Conneau & Lample, 2019;. The role of few-shot prompting and distillation in our method is related to recent work on unsupervised data augmentation using language models (Anaby-Tavor et al, 2020;Kumar et al, 2020;Papanikolaou & Pierleoni, 2020; and is also in the same spirit as recent work on selftraining and noisy-student training (Mi et al, 2021;Vu et al, 2021;Xie et al, 2020). Recent work on scaling laws for neural machine translation has shown that transformer decoders exhibit more favorable scaling than encoders (Ghorbani et al, 2021).…”

Section: Background and Related Workmentioning

confidence: 73%

Unsupervised Neural Machine Translation with Generative Language Models Only

Han,

Babuschkin,

Edwards

et al. 2021

Preprint

View full text Add to dashboard Cite

We show how to derive state-of-the-art unsupervised neural machine translation systems from generatively pre-trained language models. Our method consists of three steps: few-shot amplification, distillation, and backtranslation. We first use the zero-shot translation ability of large pre-trained language models to generate translations for a small set of unlabeled sentences. We then amplify these zeroshot translations by using them as few-shot demonstrations for sampling a larger synthetic dataset. This dataset is distilled by discarding the few-shot demonstrations and then fine-tuning. During backtranslation, we repeatedly generate translations for a set of inputs and then fine-tune a single language model on both directions of the translation task at once, ensuring cycle-consistency by swapping the roles of gold monotext and generated translations when fine-tuning. By using our method to leverage GPT-3's zero-shot translation capability, we achieve a new state-of-the-art in unsupervised translation on the WMT14 English-French benchmark, attaining a BLEU score of 42.1.

show abstract

Section: Background and Related Workmentioning

confidence: 73%

Unsupervised Neural Machine Translation with Generative Language Models Only

Han,

Babuschkin,

Edwards

et al. 2021

Preprint

View full text Add to dashboard Cite

show abstract

“…Recent years has witnessed the remarkable success in textual dialog systems, which can be roughly divided into two categories: open-domain conversations with casual chi-chat (Song et al, 2020;Gangal et al, 2021;Chan et al, 2021;Yang et al, 2021) and task-oriented dialog systems (Pei et al, 2021;Santra et al, 2021;Mi et al, 2021;Madotto et al, 2021;Gou et al, 2021;Raghu et al, 2021), which are designed to help users achieve specific goals. Early efforts mainly adopt a sequence-to-sequence (Seq2Seq) architecture, but cannot work well in KB retrieval and reasoning.…”

Section: Unimodal Dialog Systemsmentioning

confidence: 99%

UniTranSeR: A Unified Transformer Semantic Representation Framework for Multimodal Task-Oriented Dialog System

Ma¹,

Li²,

Li³

et al. 2022

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

View full text Add to dashboard Cite

As a more natural and intelligent interaction manner, multimodal task-oriented dialog system recently has received great attention and many remarkable progresses have been achieved. Nevertheless, almost all existing studies follow the pipeline to first learn intra-modal features separately and then conduct simple feature concatenation or attention-based feature fusion to generate responses, which hampers them from learning inter-modal interactions and conducting crossmodal feature alignment for generating more intention-aware responses. To address these issues, we propose UniTranSeR, a Unified Transformer Semantic Representation framework with feature alignment and intention reasoning for multimodal dialog systems. Specifically, we first embed the multimodal features into a unified Transformer semantic space to prompt inter-modal interactions, and then devise a feature alignment and intention reasoning (FAIR) layer to perform cross-modal entity alignment and fine-grained key-value reasoning, so as to effectively identify user's intention for generating more accurate responses. Experimental results verify the effectiveness of UniTranSeR, showing that it significantly outperforms state-of-the-art approaches on the representative MMD dataset.

show abstract

“…Several papers have adapted various resources for pretraining models to enhance their performances on fewshot learning, such as pretraining on hypertext (Aghajanyan et al, 2021b), question-infused pretraining (Jia et al, 2021), and self-training Vu et al, 2021;Wang et al, 2021b). Pretraining approaches have targeted specific tasks, such as task-oriented dialog (Mi et al, 2021), intent detection , and data-to-text generation (Chen et al, 2020). Our work differs as we use plain text as opposed to (naturally-occurring) human-annotated resources.…”

Section: Related Workmentioning

confidence: 99%

Improving In-Context Few-Shot Learning via Self-Supervised Training

Chen¹,

Du²,

Pasunuru³

et al. 2022

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Langua

View full text Add to dashboard Cite

Self-supervised pretraining has made few-shot learning possible for many NLP tasks. But the pretraining objectives are not typically adapted specifically for in-context few-shot learning. In this paper, we propose to use selfsupervision in an intermediate training stage between pretraining and downstream few-shot usage with the goal to teach the model to perform in-context few shot learning. We propose and evaluate four self-supervised objectives on two benchmarks. We find that the intermediate self-supervision stage produces models that outperform strong baselines. Ablation study shows that several factors affect the downstream performance, such as the amount of training data and the diversity of the self-supervised objectives. Human-annotated cross-task supervision and self-supervision are complementary. Qualitative analysis suggests that the self-supervised-trained models are better at following task requirements.

show abstract

Self-training Improves Pre-training for Few-shot Learning in Task-oriented Dialog Systems

Cited by 15 publications

References 36 publications

Unsupervised Neural Machine Translation with Generative Language Models Only

Unsupervised Neural Machine Translation with Generative Language Models Only

UniTranSeR: A Unified Transformer Semantic Representation Framework for Multimodal Task-Oriented Dialog System

Improving In-Context Few-Shot Learning via Self-Supervised Training

Contact Info

Product

Resources

About