Proceedings of the 2nd Workshop on Representation Learning for NLP 2017
DOI: 10.18653/v1/w17-2623

NewsQA: A Machine Comprehension Dataset

Abstract: We present NewsQA, a challenging machine comprehension dataset of over 100,000 human-generated question-answer pairs. Crowdworkers supply questions and answers based on a set of over 10,000 news articles from CNN, with answers consisting of spans of text in the articles. We collect this dataset through a four-stage process designed to solicit exploratory questions that require reasoning. Analysis confirms that NewsQA demands abilities beyond simple word matching and recognizing textual entailment. We measure h…

Citations: cited by 618 publications (590 citation statements)
References: 19 publications
“…NewsQA (Trischler et al, 2016): we randomly chose questions that satisfied the following conditions:…”
Section: A Sampling Methods For Questions (mentioning)
confidence: 99%
“…NewsQA The NewsQA dataset (Trischler et al, 2017) 3 contains 100k answerable questions from a total of 120k questions. The dataset is built from CNN news stories that were originally collected by Hermann et al (2015).…”
Section: Methods (mentioning)
confidence: 99%
“…A thorough analysis by Chen et al (2016), however, revealed that the DailyMail/CNN was too easy and still quite noisy. New datasets were constructed to eliminate these problems including SQuAD (Rajpurkar et al, 2016), NewsQA (Trischler et al, 2017) and MsMARCO (Nguyen et al, 2016).…”
Section: Related Work (mentioning)
confidence: 99%
“…Datasets with natural language questions include MCTest (Richardson et al, 2013), SQuAD (Rajpurkar et al, 2016), and NewsQA (Trischler et al, 2016). MCTest is limited in scale with only 2640 multiple choice questions.…”
Section: Reading Comprehension (mentioning)
confidence: 99%