“…However, previous work (Rashkin et al., 2018; Wolf et al., 2019) shows that directly fine-tuning on conversational corpora leads to deficient performance. One possible reason is the intrinsic difference in linguistic patterns between human conversations and written text, which results in a large gap between the data distributions (Bao et al., 2019). Therefore, pre-training dialogue language models on chit-chat corpora from social media, such as Twitter or Reddit, has recently been investigated, especially for dialogue response generation (Zhang et al., 2019) and retrieval (Henderson et al., 2019b).…”