Proceedings of the 2nd Workshop on Machine Reading for Question Answering 2019
DOI: 10.18653/v1/d19-5801
MRQA 2019 Shared Task: Evaluating Generalization in Reading Comprehension

Abstract: We present the results of the Machine Reading for Question Answering (MRQA) 2019 shared task on evaluating the generalization capabilities of reading comprehension systems. In this task, we adapted and unified 18 distinct question answering datasets into the same format. Among them, six datasets were made available for training, six datasets were made available for development, and the final six were hidden for final evaluation. Ten teams submitted systems, which explored various ideas including data sampling…

Cited by 201 publications (248 citation statements) | References 34 publications
“…Figure 2 quantitatively shows the discrepancy between predicting the correct answer text versus predicting the correct answer span. Using BERT trained on curated NaturalQuestions (Fisch et al., 2019), we show the results of the extractive QA task under exact match (EM) and Span-EM. EM only requires the predicted text to match the ground-truth answer, whereas Span-EM additionally requires the predicted span to be the same as the ground-truth answer span.…”
Section: Introduction (mentioning)
confidence: 99%
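The EM versus Span-EM distinction quoted above can be made concrete with a short sketch. This is a minimal illustration, not the evaluation code of the cited work: the SQuAD-style normalization is an assumption, and the function names (exact_match, span_exact_match) are hypothetical.

```python
import re
import string

def normalize(text):
    # Common SQuAD-style normalization (assumption): lowercase, drop
    # punctuation and articles, collapse whitespace.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred_text, gold_texts):
    # EM: the predicted answer string matches any gold answer string.
    return float(any(normalize(pred_text) == normalize(g) for g in gold_texts))

def span_exact_match(pred_span, gold_spans):
    # Span-EM: the predicted (start, end) character offsets must also
    # coincide with a gold answer span in the passage.
    return float(tuple(pred_span) in {tuple(s) for s in gold_spans})

# The same surface string can occur at several positions in a passage,
# so EM can be 1.0 while Span-EM is 0.0.
gold_texts = ["BERT"]
gold_spans = [(120, 124)]                      # gold answer occupies chars 120..124
print(exact_match("BERT", gold_texts))         # 1.0  (text matches)
print(span_exact_match((57, 61), gold_spans))  # 0.0  (different occurrence)
```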
“…With this derived dataset, they test the model's capability for unsupervised domain adaptation. Fisch et al. [12] presented the Machine Reading for Question Answering (MRQA) 2019 shared task, which tested extractive MRC models on their ability to generalize to data distributions different from the training distribution. They unified 18 distinct question answering datasets into a uniform format.…”
Section: Derived (mentioning)
confidence: 99%
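As a rough illustration of what a "uniform format" means in practice, the sketch below reads one of the unified MRQA files. It assumes the layout of the shared-task release (gzipped JSONL with a header line, then one record per context holding its questions, answer strings, and detected character spans); the helper name read_mrqa and the exact field names are assumptions, not a verified API.

```python
import gzip
import json

def read_mrqa(path):
    # Assumed layout: first line is a header record, e.g.
    # {"header": {"dataset": "SQuAD", "split": "train"}}; each following
    # line is one context together with its questions and detected spans.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        header = json.loads(f.readline())  # dataset/split metadata (unused here)
        for line in f:
            example = json.loads(line)
            for qa in example["qas"]:
                yield {
                    "qid": qa["qid"],
                    "question": qa["question"],
                    "context": example["context"],
                    "answers": qa["answers"],        # gold answer strings
                    "char_spans": [d["char_spans"]   # gold character spans
                                   for d in qa["detected_answers"]],
                }

# Usage (hypothetical file name):
# for ex in read_mrqa("SQuAD.jsonl.gz"):
#     print(ex["question"], "->", ex["answers"][0])
```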
“…Evaluation of CDA tests the overall performance on all the domains that the MRC model has encountered. As is well known, a large variety of MRC tasks [12] have been proposed in the literature. However, all of those tasks assume a stationary learning scenario, i.e., a fixed data distribution.…”
Section: Introduction (mentioning)
confidence: 99%
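To make the evaluation setup quoted above concrete, here is a minimal sketch that scores a model on every domain it has encountered so far. The unweighted average and the domain names are illustrative assumptions, not the protocol of the cited work.

```python
from statistics import mean

def cda_score(per_domain_f1):
    # Score over *all* domains seen so far, not only the most recent one.
    # Assumption: a simple unweighted average of per-domain F1.
    return mean(per_domain_f1.values())

# Hypothetical F1 after sequentially adapting to three domains.
seen = {"Wikipedia": 88.4, "News": 66.1, "Web search": 71.9}
print(f"CDA score over {len(seen)} seen domains: {cda_score(seen):.1f}")
```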
“…For these slots, both extractive and categorical DST models can be applied, as shown in Table 1. The MRQA shared task (Fisch et al., 2019) focused on extractive question answering. MRQA contains six distinct datasets across different domains: SQuAD, NewsQA, TriviaQA, SearchQA, HotpotQA, and NaturalQuestions.…”
Section: Multiple-Choice Reading Comprehension to Categorical Dialogue (mentioning)
confidence: 99%