An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models

Chronopoulou, Alexandra; Baziotis, Christos; Potamianos, Alexandros

doi:10.18653/v1/n19-1213

Cited by 90 publications

(71 citation statements)

References 27 publications


“…Interest in learning general-purpose representations for natural language through unsupervised, multi-task and transfer learning has been skyrocketing lately (Radford et al, 2018; McCann et al, 2018; Chronopoulou et al, 2019; Phang et al, 2018). In parallel to our work, studies that focus on generalization have appeared on publication servers, empirically studying generalization to multiple tasks (Yogatama et al, 2019; Liu et al, 2019).…”

Section: Related Work (mentioning)

confidence: 74%

MultiQA: An Empirical Investigation of Generalization and Transfer in Reading Comprehension

Talmor, Berant

2019

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics



A large number of reading comprehension (RC) datasets have been created recently, but little analysis has been done on whether they generalize to one another, and the extent to which existing datasets can be leveraged for improving performance on new ones. In this paper, we conduct such an investigation over ten RC datasets, training on one or more source RC datasets, and evaluating generalization, as well as transfer to a target RC dataset. We analyze the factors that contribute to generalization, and show that training on a source RC dataset and transferring to a target dataset substantially improves performance, even in the presence of powerful contextual representations from BERT (Devlin et al., 2019). We also find that training on multiple source RC datasets leads to robust generalization and transfer, and can reduce the cost of example collection for a new RC dataset. Following our analysis, we propose MULTIQA, a BERT-based model, trained on multiple RC datasets, which leads to state-of-the-art performance on five RC datasets. We share our infrastructure for the benefit of the research community.


“…Howard and Ruder (2018) proposed to fine-tune the pre-trained LM with sentences from the downstream dataset and showed that it boosts the performance of the downstream task. Chronopoulou et al (2019) also demonstrated the effectiveness of the fine-tuning method.…”

Section: Language Model Fine-tuning (mentioning)

confidence: 84%

Learning ASR-Robust Contextualized Embeddings for Spoken Language Understanding

Huang, Chen

2020

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


Employing pre-trained language models (LM) to extract contextualized word representations has achieved state-of-the-art performance on various NLP tasks. However, applying this technique to noisy transcripts generated by an automatic speech recognizer (ASR) remains a concern. Therefore, this paper focuses on making contextualized representations more ASR-robust. We propose a novel confusion-aware fine-tuning method to mitigate the impact of ASR errors on pre-trained LMs. Specifically, we fine-tune LMs to produce similar representations for acoustically confusable words that are obtained from word confusion networks (WCNs) produced by ASR. Experiments on the benchmark ATIS dataset show that the proposed method significantly improves the performance of spoken language understanding on ASR transcripts.
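The fine-tuning idea in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the objective penalizes the cosine distance between the representations of acoustically confusable word pairs; the abstract does not state the actual loss, and the names `cosine_distance`, `confusion_loss`, and the dictionary of word vectors are hypothetical.

```python
import math
from typing import Dict, Iterable, Sequence, Tuple


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cos(u, v); 0 means the two vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def confusion_loss(
    representations: Dict[str, Sequence[float]],
    confusable_pairs: Iterable[Tuple[str, str]],
) -> float:
    """Mean cosine distance over confusable word pairs (assumed loss form).

    `representations` maps each word to its contextualized vector, and
    `confusable_pairs` lists word pairs that an ASR system tends to confuse.
    Minimizing the returned value pulls each pair's vectors together.
    """
    distances = [
        cosine_distance(representations[a], representations[b])
        for a, b in confusable_pairs
    ]
    # With no pairs there is nothing to penalize.
    return sum(distances) / len(distances) if distances else 0.0
```

In a setup like this, the term would presumably be added to the language modeling loss during fine-tuning, with the pairs drawn from word confusion networks.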


“…Specifically, we have the following auxiliary tasks: Masked Language Model: Since the pretraining is usually performed on the corpus with restricted domains, it is expected that further pretraining on more diverse domains may improve the generalization capability. Hence, we add an auxiliary task, masked language model (Chronopoulou et al, 2019), in the fine-tuning stage, along with the MRC task. Moreover, we use three corpora with different domains as the input for masked language model: (1) the passages in MRQA in-domain datasets that include wikipedia, news and search snippets; (2) the search snippets from Bing.…”

Section: Fine-tuning MRC Models With Multi-task Learning (mentioning)

confidence: 99%

“…Hence, we incorporate masked language model by using corpus from various domains as an auxiliary task in the fine-tuning phase, along with MRC. The side effect of adding a language modeling objective to MRC is that it can avoid catastrophic forgetting and keep the most useful features learned from pretraining task (Chronopoulou et al, 2019). Additionally, we explore multi-task learning (Liu et al, 2019) by incorporating the supervised dataset from other NLP tasks (e.g.…”

Section: Introduction (mentioning)

confidence: 99%

D-NET: A Pre-Training and Fine-Tuning Framework for Improving the Generalization of Machine Reading Comprehension

Xi-yuan, Liu et al. 2019

Proceedings of the 2nd Workshop on Machine Reading for Question Answering


In this paper, we introduce a simple system that Baidu submitted to the MRQA (Machine Reading for Question Answering) 2019 Shared Task, which focused on the generalization of machine reading comprehension (MRC) models. Our system is built on a framework of pretraining and fine-tuning, namely D-NET. The techniques of pre-trained language models and multi-task learning are explored to improve the generalization of MRC models, and we conduct experiments to examine the effectiveness of these strategies. Our system ranked first among all participants in terms of averaged F1 score. Our code and models will be released at PaddleNLP.
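The citation statements above describe adding a language modeling objective to the MRC objective during fine-tuning. A minimal sketch of such a joint loss follows, assuming a simple weighted sum with a hypothetical weight `gamma`; the function names and the pure-Python cross-entropy are illustrative and are not taken from D-NET or from Chronopoulou et al.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-likelihood of `target` under softmax(logits)."""
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[target]


def joint_loss(
    task_logits: Sequence[float],
    task_label: int,
    lm_logits: Sequence[Sequence[float]],
    lm_targets: Sequence[int],
    gamma: float = 0.1,
) -> float:
    """Task loss plus a weighted auxiliary language modeling loss.

    `lm_logits` holds one row of vocabulary logits per predicted token and
    `lm_targets` the matching token ids. `gamma` (an assumed hyperparameter)
    controls how strongly the auxiliary objective influences fine-tuning.
    """
    if len(lm_logits) != len(lm_targets) or not lm_targets:
        raise ValueError("need one target per row of LM logits, and at least one")
    task = cross_entropy(task_logits, task_label)
    lm = sum(
        cross_entropy(row, tgt) for row, tgt in zip(lm_logits, lm_targets)
    ) / len(lm_targets)
    return task + gamma * lm
```

Setting `gamma` to zero recovers plain task fine-tuning; a positive value keeps the language modeling signal active, which is the mechanism the statements credit with limiting catastrophic forgetting.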
