2019 | Preprint | DOI: 10.48550/arxiv.1901.08149

TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents

Abstract: We introduce a new approach to generative data-driven dialogue systems (e.g. chatbots) called TransferTransfo, which is a combination of a transfer learning based training scheme and a high-capacity Transformer model. Fine-tuning is performed by using a multi-task objective which combines several unsupervised prediction tasks. The resulting fine-tuned model shows strong improvements over the current state-of-the-art end-to-end conversational models like memory-augmented seq2seq and information-retrieval models. …
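
To make the multi-task fine-tuning objective described in the abstract concrete, the sketch below pairs a language-modeling head with a next-utterance classification head and sums their losses. This is a minimal illustration, not the authors' released code: the class name TransformerWithTwoHeads, the loss coefficients lm_coef and mc_coef, and the omission of causal masking are simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransformerWithTwoHeads(nn.Module):
    """Toy Transformer with a language-modeling head and a next-utterance
    classification head, mirroring the double-headed setup the abstract describes.
    Causal masking is omitted to keep the sketch short."""
    def __init__(self, vocab_size: int, hidden: int = 768):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=12, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(hidden, vocab_size)  # next-token prediction
        self.cls_head = nn.Linear(hidden, 1)          # "is this the real next utterance?"

    def forward(self, input_ids):
        h = self.backbone(self.embed(input_ids))
        return self.lm_head(h), self.cls_head(h[:, -1])  # classify from the last position

def multitask_loss(model, input_ids, lm_labels, mc_labels, lm_coef=2.0, mc_coef=1.0):
    """Weighted sum of the LM loss and the classification loss (weights assumed)."""
    lm_logits, mc_logits = model(input_ids)
    lm_loss = F.cross_entropy(lm_logits.view(-1, lm_logits.size(-1)),
                              lm_labels.view(-1), ignore_index=-100)
    mc_loss = F.binary_cross_entropy_with_logits(mc_logits.view(-1), mc_labels.float())
    return lm_coef * lm_loss + mc_coef * mc_loss

# Example call with random data:
model = TransformerWithTwoHeads(vocab_size=1000)
ids = torch.randint(0, 1000, (4, 16))
loss = multitask_loss(model, ids, lm_labels=ids, mc_labels=torch.ones(4))
```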

Cited by 134 publications (225 citation statements) | References 15 publications

“…Output: Dialogue Response. Persona [Li et al. 2016b; Mazaré et al. 2018; Siddique et al. 2017; Wolf et al. 2019; Song et al. 2020b; Zhang et al. 2018b; Zhong et al. 2020; Zeng and Nie 2021; Zheng et al. 2020]; Politeness [Niu and Bansal 2018],…”
Section: Transformer-based Pre-trained Language Models (mentioning; confidence: 99%)
“…Full Objectives. The entire loss function aims to minimize the negative log-likelihood of language modeling and subtasks, as in (Radford et al. 2018; Wolf et al. 2019). The full training objectives are defined as follows:…”
Section: Dialog Module (mentioning; confidence: 99%)
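
For readers skimming this excerpt, the general shape of such a combined objective can be written out; the weighting terms λ_k below are illustrative, since the excerpt does not give them:

```latex
\mathcal{L}_{\text{total}}
  = \underbrace{-\sum_{t}\log p_\theta\!\left(x_t \mid x_{<t}\right)}_{\text{language modeling NLL}}
  \;+\; \sum_{k}\lambda_k\,\mathcal{L}_{\text{subtask}_k}
```
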
“…Therefore, with the success of pretrained language models (PrLMs) like BERT (Devlin et al. 2019) and RoBERTa (Liu et al. 2019), a series of neural approaches based on cross-encoders have been proposed (Vig and Ramea 2019; Wolf et al. 2019). Although they achieve a satisfying retrieval rate, their retrieval latency is often hard to tolerate in practical use.…”
Section: Open-domain Passage Retrieval (mentioning; confidence: 99%)
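
Since this excerpt points to cross-encoder retrieval built on pretrained LMs and its latency cost, a brief sketch may help: each (query, passage) pair is scored with a joint forward pass, so cost grows with the number of candidates. The model name bert-base-uncased and the sequence-classification head are assumptions for illustration, not the cited systems' exact configuration.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=1)
model.eval()

def cross_encoder_scores(query: str, passages: list[str]) -> torch.Tensor:
    """Score each (query, passage) pair jointly; one forward pass per candidate."""
    batch = tokenizer([query] * len(passages), passages,
                      padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits  # shape: (num_passages, 1)
    return logits.squeeze(-1)

# Usage: rank a handful of candidate passages for one query.
scores = cross_encoder_scores(
    "who proposed TransferTransfo?",
    ["TransferTransfo combines transfer learning with a Transformer.",
     "Open-domain retrieval selects passages from a large corpus."])
print(scores.argsort(descending=True))
```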