Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1485
MultiQA: An Empirical Investigation of Generalization and Transfer in Reading Comprehension

Abstract: A large number of reading comprehension (RC) datasets has been created recently, but little analysis has been done on whether they generalize to one another, and the extent to which existing datasets can be leveraged for improving performance on new ones. In this paper, we conduct such an investigation over ten RC datasets, training on one or more source RC datasets, and evaluating generalization, as well as transfer to a target RC dataset. We analyze the factors that contribute to generalization, and show tha…

Cited by 156 publications (146 citation statements)
References 41 publications
“…Within these restrictions, we encouraged participants to explore how to best utilize the provided data. Inspired by Talmor and Berant (2019), two submissions (Su et al., 2019; Longpre et al., 2019) analyzed similarities between datasets. Unsurprisingly, the performance improved significantly when fine-tuned on the training dataset most similar to the evaluation dataset of interest.…”
Section: Summary of Findings
confidence: 99%
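The excerpt above describes picking the source training dataset most similar to the evaluation dataset. One simple way to operationalize dataset similarity is vocabulary overlap; the sketch below uses Jaccard overlap between vocabularies. This is an illustrative baseline, not the method of any cited submission, and all function names here are hypothetical:

```python
def vocab_jaccard(texts_a, texts_b):
    """Jaccard overlap between the vocabularies of two text collections."""
    vocab_a = {w for t in texts_a for w in t.lower().split()}
    vocab_b = {w for t in texts_b for w in t.lower().split()}
    if not vocab_a and not vocab_b:
        return 0.0
    return len(vocab_a & vocab_b) / len(vocab_a | vocab_b)

def most_similar_source(target_texts, source_datasets):
    """Pick the source dataset (name -> list of texts) whose vocabulary
    overlaps most with the target dataset's texts."""
    return max(source_datasets,
               key=lambda name: vocab_jaccard(target_texts, source_datasets[name]))
```

In practice one would compare on richer signals (answer types, question styles, embedding similarity), but even a lexical measure like this can rank candidate source datasets for fine-tuning.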
“…Nevertheless, not much work has been done for this particular learning challenge. Most work on RC focuses on the model architecture and simply chooses the first span or a random span from the document (Joshi et al., 2017; Tay et al., 2018; Talmor and Berant, 2019), rather than modeling this uncertainty as a latent choice. Others maximize the sum of the likelihood of multiple spans (Kadlec et al., 2016; Swayamdipta et al., 2018; Clark and Gardner, 2018; …), but it is unclear if it gives a meaningful improvement.…”
Section: Related Work
confidence: 99%
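The contrast drawn in this excerpt — training on a single designated gold span versus maximizing the summed likelihood over all matching spans — can be made concrete numerically. Below is a minimal numpy sketch of the two losses for an extractive-QA model that outputs independent start and end logits; it is an illustration of the general technique, not code from any cited paper:

```python
import numpy as np

def log_softmax(x):
    """Numerically stable log-softmax over a 1-D logit vector."""
    x = x - x.max()
    return x - np.log(np.exp(x).sum())

def first_span_nll(start_logits, end_logits, gold_spans):
    """Common simplification: treat only the first annotated span as correct."""
    s, e = gold_spans[0]
    return -(log_softmax(start_logits)[s] + log_softmax(end_logits)[e])

def marginal_nll(start_logits, end_logits, gold_spans):
    """Marginal likelihood: sum probability mass over all matching spans,
    then take the negative log (log-sum-exp for stability)."""
    log_s = log_softmax(start_logits)
    log_e = log_softmax(end_logits)
    span_logps = [log_s[s] + log_e[e] for s, e in gold_spans]
    m = max(span_logps)
    return -(m + np.log(sum(np.exp(lp - m) for lp in span_logps)))
```

With a single gold span the two losses coincide; with multiple candidate spans the marginal loss is never larger, since it credits probability mass placed on any of the matching spans.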
“…Language models pre-trained on large-scale text corpora achieve state-of-the-art performance in various natural language processing (NLP) tasks when fine-tuned on a given task [4,13,15]. Language models have been shown to be highly effective in question answering (QA), and many current state-of-the-art QA models often rely on pre-trained language models [20]. However, as language models are mostly pre-trained on general domain corpora, they cannot be generalized to biomedical corpora [1,2,8,29].…”
Section: Introduction
confidence: 99%
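This excerpt notes that state-of-the-art QA models fine-tune pre-trained language models. For extractive QA, the fine-tuning head is typically just a linear projection from each token's hidden state to start/end logits, with the best span chosen under the constraint end ≥ start. A toy numpy sketch of that decoding step, with random arrays standing in for a real encoder's hidden states (all dimensions and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, hidden = 8, 16               # toy sequence length and hidden size

# Stand-in for the output of a pre-trained encoder (e.g., a BERT-like model).
hidden_states = rng.normal(size=(seq_len, hidden))

# The extractive-QA head: one linear layer producing 2 logits per token.
w = rng.normal(size=(hidden, 2))
logits = hidden_states @ w            # shape (seq_len, 2)
start_logits, end_logits = logits[:, 0], logits[:, 1]

# Score every (start, end) pair; mask out spans with end < start.
pair_scores = start_logits[:, None] + end_logits[None, :]
valid = np.triu(np.ones((seq_len, seq_len), dtype=bool))
pair_scores = np.where(valid, pair_scores, -np.inf)

# The predicted answer span is the highest-scoring valid pair.
start, end = np.unravel_index(pair_scores.argmax(), pair_scores.shape)
```

Fine-tuning then consists of training the encoder and this head jointly with a cross-entropy loss on the gold start and end positions; domain mismatch (e.g., biomedical text) motivates continued pre-training on in-domain corpora, as the excerpt observes.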