Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume 2021
DOI: 10.18653/v1/2021.eacl-main.95

Analyzing the Forgetting Problem in Pretrain-Finetuning of Open-domain Dialogue Response Models

Abstract: In this work, we study how the finetuning stage in the pretrain-finetune framework changes the behavior of a pretrained neural language generator. We focus on the transformer encoder-decoder model for the open-domain dialogue response generation task. Our major finding is that after standard finetuning, the model forgets some of the important language generation skills acquired during large-scale pretraining. We demonstrate the forgetting phenomenon through a set of detailed behavioral analyses from the perspecti…
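A minimal sketch of how the forgetting effect described in the abstract could be surfaced: compare the perplexity of the pretrained and fine-tuned checkpoints on held-out text from the pretraining domain. This is an illustration, not the paper's own evaluation protocol; it uses a causal LM ("gpt2") as a stand-in for the paper's encoder-decoder model, and the fine-tuned checkpoint path is a placeholder.

```python
# Hypothetical sketch: measure forgetting as a perplexity gap on
# pretraining-domain text. Checkpoint names are placeholders, not the
# authors' models.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model, tokenizer, text: str) -> float:
    """Token-level perplexity of `text` under `model`."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # labels=input_ids makes the model return the mean
        # cross-entropy loss over the predicted tokens.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())

tok = AutoTokenizer.from_pretrained("gpt2")                 # stand-in pretrained LM
pretrained = AutoModelForCausalLM.from_pretrained("gpt2").eval()
finetuned = AutoModelForCausalLM.from_pretrained("./dialogue-finetuned").eval()  # placeholder path

held_out = "Text drawn from the pretraining distribution ..."
print("pretrained:", perplexity(pretrained, tok, held_out))
print("fine-tuned:", perplexity(finetuned, tok, held_out))
# A large perplexity increase for the fine-tuned model on
# pretraining-domain text is one symptom of forgetting.
```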

Cited by 27 publications (21 citation statements) · References 22 publications
“…While fine-tuning pre-trained representations usually provides strong empirical performance (Wang et al., 2018; Talmor et al., 2020), how fine-tuning manages to do so has remained an open question. Moreover, the instability (Mosbach et al., 2020a; Dodge et al., 2020; Zhang et al., 2020) and forgetting problems (He et al., 2021) make it harder to analyze fine-tuned representations. Despite these difficulties, previous work (Merchant et al., 2020; Mosbach et al., 2020b; Hao et al., 2020) draws valuable conclusions about fine-tuning.…”
Section: Related Work (mentioning)
confidence: 99%
“…While fine-tuning pre-trained representations usually provides strong empirical performance (Wang et al., 2018; Talmor et al., 2020), how fine-tuning changes the representation to do so has remained an open question. Moreover, the instability (Mosbach et al., 2020a; Dodge et al., 2020; Zhang et al., 2020) and forgetting problems (He et al., 2021) of fine-tuning make it harder to analyze fine-tuned representations. Despite these difficulties, previous work (Merchant et al., 2020; Mosbach et al., 2020b; Hao et al., 2020) draws valuable conclusions about fine-tuning.…”
Section: Related Work (mentioning)
confidence: 99%
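One concrete way to "analyze fine-tuned representations," as the excerpt above puts it, is to compare layer-wise hidden states of the pretrained and fine-tuned checkpoints on the same inputs. The sketch below is not taken from the cited papers; the fine-tuned checkpoint path and the probe sentences are illustrative placeholders.

```python
# Hedged sketch: layer-wise representation drift between a pretrained
# encoder and a fine-tuned copy of it, measured by cosine similarity.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
base = AutoModel.from_pretrained("bert-base-uncased").eval()
tuned = AutoModel.from_pretrained("./finetuned-checkpoint").eval()  # placeholder

batch = tok(["a probe sentence", "another probe sentence"],
            padding=True, return_tensors="pt")
with torch.no_grad():
    h_base = base(**batch, output_hidden_states=True).hidden_states
    h_tuned = tuned(**batch, output_hidden_states=True).hidden_states

for layer, (a, b) in enumerate(zip(h_base, h_tuned)):
    # mean cosine similarity over all token positions in the batch
    sim = torch.nn.functional.cosine_similarity(a, b, dim=-1).mean()
    print(f"layer {layer:2d}: mean cosine similarity {sim:.3f}")
# Layers whose similarity drops sharply after fine-tuning changed the most;
# the instability the excerpt mentions can be probed the same way by
# comparing two fine-tuning runs with different random seeds.
```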
“…Such models can respond well even in challenging dialogue tasks (Adiwardana et al., 2020; Huang et al., 2020). Due to the hardware and data requirements of such models, fine-tuning pre-trained models is a popular approach for obtaining well-performing language generation models (Howard and Ruder, 2018; Wolf et al., 2019a; He et al., 2021). Lack of consistency is one of the major issues in neural dialogue generation; it has been tackled by methods such as including a persona or situation description to improve the consistency between generated sentences across multiple turns of dialogue.…”
Section: Related Work (mentioning)
confidence: 99%
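The excerpt above refers to the standard pretrain-finetune recipe: fine-tune a pretrained encoder-decoder on (context, response) pairs, optionally prepending a persona string to the context for consistency. Below is a minimal, hedged sketch of that recipe; the model name ("facebook/bart-base"), learning rate, and toy data are illustrative assumptions, not the setup of the cited work.

```python
# Hypothetical sketch of standard fine-tuning for dialogue response
# generation with a pretrained seq2seq model.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("facebook/bart-base")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")
optim = torch.optim.AdamW(model.parameters(), lr=3e-5)

pairs = [  # toy (persona + context, response) examples
    ("persona: i love hiking. context: any weekend plans?",
     "heading to the mountains for a hike!"),
]

model.train()
for context, response in pairs:
    inputs = tok(context, return_tensors="pt", truncation=True)
    labels = tok(response, return_tensors="pt", truncation=True).input_ids
    loss = model(**inputs, labels=labels).loss  # standard MLE fine-tuning loss
    loss.backward()
    optim.step()
    optim.zero_grad()
# Note: the paper this report tracks argues that exactly this standard
# fine-tuning loop is where generation skills acquired during large-scale
# pretraining can be forgotten.
```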