2021
DOI: 10.48550/arxiv.2107.12708
Preprint

QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension

Abstract: Alongside huge volumes of research on deep learning models in NLP in recent years, there has also been much work on the benchmark datasets needed to track modeling progress. Question answering and reading comprehension have been particularly prolific in this regard, with over 80 new datasets appearing in the past two years. This study is the largest survey of the field to date. We provide an overview of the various formats and domains of the current resources, highlighting the current lacunae for future work. …

Cited by 17 publications (17 citation statements)
References 205 publications (329 reference statements)
“…Keyword Matching Without Structural Concern: Aligning with the insight that retrieval often emphasizes content matching rather than complex reasoning (Rogers et al, 2021), we find that 71 out of the 100 samples only require simple keyword matching, where 18 questions fully match with table titles (Figure 2 (a)) and the other 53 questions further match with table headers (Figure 2 (b)).…”
Section: NQ-Table Analysis: How Much (mentioning)
confidence: 81%
“…SQuAD is available under the CC BY-SA license. SQuAD has become a de facto standard and inspired creation of analogous resources in other languages (Rogers et al, 2021).…”
Section: Question Answering (mentioning)
confidence: 99%
“…The rather high age of participants (see Fig. 4) may have induced significant demographic bias [56] regarding negative attitudes towards artificial intelligence and, thus, ACA [17]. No person below 18 years participated due to legal constraints by the platform.…”
Section: Remote User Experience Survey (GUI Prototype 2) (mentioning)
confidence: 99%