Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI 2021
DOI: 10.18653/v1/2021.nlp4convai-1.19
AuGPT: Auxiliary Tasks and Data Augmentation for End-To-End Dialogue with Pre-Trained Language Models

Abstract: Attention-based pre-trained language models such as GPT-2 brought considerable progress to end-to-end dialogue modelling. However, they also present considerable risks for task-oriented dialogue, such as lack of knowledge grounding or diversity. To address these issues, we introduce modified training objectives for language model finetuning, and we employ massive data augmentation via back-translation to increase the diversity of the training data. We further examine the possibilities of combining data from mult…
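The back-translation augmentation mentioned in the abstract can be illustrated with a minimal sketch: each training utterance is round-tripped through a pivot language, and paraphrases that differ from the original are added to the training set. The `translate_*` functions below are hypothetical stand-ins (a real pipeline would call a trained MT model, e.g. English → French → English); only the overall pipeline shape reflects the described technique.

```python
# Sketch of back-translation data augmentation (assumed pipeline shape).
# The translate functions are placeholders standing in for real MT models.

def translate_en_to_fr(text: str) -> str:
    # Placeholder: a real system would invoke an English->French MT model.
    lookup = {"i need a cheap hotel": "il me faut un hotel pas cher"}
    return lookup.get(text, text)

def translate_fr_to_en(text: str) -> str:
    # Placeholder: a real system would invoke a French->English MT model.
    lookup = {"il me faut un hotel pas cher": "i need an inexpensive hotel"}
    return lookup.get(text, text)

def back_translate(utterance: str) -> str:
    """Round-trip an utterance through a pivot language to obtain a paraphrase."""
    return translate_fr_to_en(translate_en_to_fr(utterance))

def augment(dataset: list[str]) -> list[str]:
    """Append paraphrases that actually differ from the original utterance."""
    augmented = list(dataset)
    for utt in dataset:
        paraphrase = back_translate(utt)
        if paraphrase != utt:  # trivial dedup; real systems filter more carefully
            augmented.append(paraphrase)
    return augmented

print(augment(["i need a cheap hotel"]))
```

In practice the paraphrases keep the original dialogue-state annotation, since back-translation is assumed to preserve meaning while varying surface form.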

Cited by 27 publications (26 citation statements) | References 26 publications
“…One idea that has gained particular attention is transfer learning: specifically, finding ways to leverage knowledge learned by pre-trained large language models (PLMs) for new tasks. PLMs have demonstrated impressive emergent conversational capabilities, enabling large performance improvements in various dialogue tasks (Brown et al., 2020; Shuster et al., 2022; Peng et al., 2022; Kulhánek et al., 2021). In particular, PLMs have been prompted to augment existing conversational data (Chen et al., 2022; Mehri et al., 2022; Sahu et al., 2022).…”
Section: Triadic Conversations, Dyadic Conversations
confidence: 99%
“…However, prompt-based augmentation strategies are uncontrolled forms of generation, which may result in generation mistakes for labeled datasets (Sahu et al., 2022; Chen et al., 2022; Meng et al., 2022). In contrast, other recent studies have instead proposed language augmentation strategies that use complex, highly controlled frameworks that often involve fine-tuning generators (Papangelis et al., 2021; Kulhánek et al., 2021). Such complex augmentation frameworks require larger amounts of seed data to maintain a ground-truth language distribution (Rosenbaum et al., 2022b; Kim et al., 2021b), and are more costly than prompting PLMs (Chen et al., 2022).…”
Section: Related Work
confidence: 99%
“…We denote several versions of the models provided in their paper as SeKnow-S2S(Single), SeKnow-S2S(Multiple), and SeKnow-GPT2. AuGPT (Kulhánek et al., 2021) uses modified training objectives and employs data augmentation to increase the diversity of generated utterances. HyKnow (Gao et al., 2021) first extends the dialogue state to query the database, and then uses all the information to generate a system response.…”
Section: End-to-End Model
confidence: 99%
“…Baselines Our framework is compared to other end-to-end unified TODS approaches that perform TODS through a single generalized text-to-text paradigm: (Su et al., 2022), Soloist (Peng et al., 2021), UBAR (Yang et al., 2021), AuGPT (Kulhánek et al., 2021), and Galaxy. In addition, as described in §3, we add the naive version of the LLM response generation approach, fed with the full KB (RG-naive), as an additional baseline to better evaluate the effectiveness of our framework.…”
Section: Experiment Settings
confidence: 99%