Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d16-1264

SQuAD: 100,000+ Questions for Machine Comprehension of Text

Abstract: We present the Stanford Question Answering Dataset (SQuAD), a new reading comprehension dataset consisting of 100,000+ questions posed by crowdworkers on a set of Wikipedia articles, where the answer to each question is a segment of text from the corresponding reading passage. We analyze the dataset to understand the types of reasoning required to answer the questions, leaning heavily on dependency and constituency trees. We build a strong logistic regression model, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%). However, human performance (86.8%) is much higher, indicating that the dataset presents a good challenge problem for future research.
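
Since each answer is a span of the passage located by a character offset, a SQuAD record can be consumed with a few lines of code. A minimal sketch in Python, assuming the public SQuAD v1.1 JSON layout; "train-v1.1.json" is the standard release file name, so adjust the path to your setup:

```python
import json

# Each SQuAD answer is a span of the passage, located by a character
# offset ("answer_start"), so the annotated text can be recovered by
# slicing the context.
with open("train-v1.1.json") as f:
    squad = json.load(f)

paragraph = squad["data"][0]["paragraphs"][0]  # first passage of first article
context = paragraph["context"]

for qa in paragraph["qas"]:
    answer = qa["answers"][0]
    start = answer["answer_start"]
    span = context[start:start + len(answer["text"])]
    assert span == answer["text"]  # offsets reproduce the span exactly
    print(qa["question"], "->", span)
```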

Cited by 4,931 publications (4,830 citation statements)
References 22 publications
“…Single-model results (EM / F1) on the SQuAD dev and test sets:

Single model                                        Dev EM / F1   Test EM / F1
LR Baseline (Rajpurkar et al., 2016)                40.0 / 51.0   40.4 / 51.0
Dynamic Chunk Reader (Yu et al., 2016)              62.5 / 71.2   62.5 / 71.0
Match-LSTM with Ans-Ptr (Wang and Jiang, 2016b)     64.1 / 73.9   64.7 / 73.7
Dynamic Coattention Networks (Xiong et al., 2016)   65.4 / 75.6   66.2 / 75.9
RaSoR (Lee et al., 2016)                            66.4 / 74.9   - / -
BiDAF (Seo et al., 2016)                            68.0 / 77.3   68.0 / 77.3
jNet (Zhang et al., 2017)                           - / -         68.7 / 77.4
Multi-Perspective Matching                          - / -         68.9 / 77.8
FastQA (Weissenborn et al., 2017)                   …             …
Human performance (Rajpurkar et al., 2016)          80.3 / 90.5   77.0 / 86.8

Table 2: The performance of our gated self-matching networks (R-NET) and competing approaches.…”
Section: Dev Set / Test Set (mentioning)
confidence: 99%
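
For context, the EM and F1 figures quoted in tables like the one above follow the standard SQuAD evaluation: exact match after answer normalization, and token-level F1, each taken as the maximum over the available gold answers. A minimal sketch of that logic, paraphrasing rather than reproducing the official evaluation script:

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    # Lowercase, drop punctuation and articles, collapse whitespace:
    # the normalization applied before scoring in SQuAD evaluation.
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def token_f1(prediction: str, gold: str) -> float:
    pred, ref = normalize(prediction).split(), normalize(gold).split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def score(prediction: str, golds: list[str]) -> tuple[float, float]:
    # Each question can have several gold answers; both metrics take
    # the best match over them.
    em = max(float(normalize(prediction) == normalize(g)) for g in golds)
    f1 = max(token_f1(prediction, g) for g in golds)
    return em, f1

print(score("the Broncos", ["Denver Broncos", "Broncos"]))  # (1.0, 1.0)
```
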
“…Moreover, SQuAD requires different forms of logical reasoning to infer the answer (Rajpurkar et al., 2016). Rapid progress has been made since the release of the SQuAD dataset.…”
Section: Introduction (mentioning)
confidence: 99%
“…There exist two big challenges: 1) matching explicit information in the given context; 2) incorporating implicit commonsense knowledge into a human-like reasoning process. Previous machine comprehension tasks (Richardson et al., 2013; Rajpurkar et al., 2016) mainly focus on the first challenge, leading their solutions to focus on semantic matching between texts (Weston et al., 2014; Kumar et al., 2015; Narasimhan and Barzilay, 2015; Smith et al., 2015; Sukhbaatar et al., 2015; Hill et al., 2015; Wang et al., 2015; Cui et al., 2016; Trischler et al., 2016a,b; Kadlec et al., 2016; Kobayashi et al., 2016; Wang and Jiang, 2016b), while ignoring the second. One notable task is SNLI (Bowman et al., 2015), which considers entailment between two sentences.…”
Section: Related Work (mentioning)
confidence: 99%
“…Following the recent progress on end-to-end supervised question answering (Hermann et al., 2015; Rajpurkar et al., 2016), we consider the general problem of predicting an answer A given a query-document pair (Q, D). We do not make the assumption that the answer should be present verbatim in the document.…”
Section: Problem Description (mentioning)
confidence: 99%
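
The contrast this excerpt draws, free-form answer prediction versus span labeling, can be made concrete as two interfaces. A minimal sketch; the protocol names and signatures below are illustrative and come from neither paper:

```python
from typing import Protocol

class AnswerModel(Protocol):
    """General QA formulation from the excerpt: map a (query, document)
    pair to an answer string, with no guarantee that the answer appears
    verbatim in the document."""
    def predict(self, query: str, document: str) -> str: ...

class SpanModel(Protocol):
    """SQuAD-style span labeling: return (start, end) character offsets
    into the document, so the answer is always a verbatim span."""
    def predict(self, query: str, document: str) -> tuple[int, int]: ...
```
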
“…The dataset contains 18.58M instances divided into training, validation, and test with an 85/10/5 split. The answer is present verbatim in the document only 47.1% of the time, severely limiting models that label document spans, such as those developed for the popular SQuAD dataset (Rajpurkar et al., 2016).…”
Section: Supervised Version (mentioning)
confidence: 99%
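
A statistic like the 47.1% verbatim-answer rate quoted above can be estimated with a simple containment check over (document, answer) pairs. A sketch over hypothetical toy data; the measured rate depends on normalization choices such as case-folding:

```python
def verbatim_rate(pairs: list[tuple[str, str]]) -> float:
    """Fraction of (document, answer) pairs whose answer string occurs
    verbatim in the document; case-folding is one of several
    normalization choices that move the measured rate."""
    hits = sum(1 for doc, ans in pairs if ans.lower() in doc.lower())
    return hits / len(pairs)

# Hypothetical toy pairs: a span-labeling model can only ever get the
# first one right, which is the limitation the excerpt points at.
pairs = [
    ("SQuAD was released in 2016 by Stanford.", "2016"),
    ("The new model improved results substantially.", "a large margin"),
]
print(f"verbatim-answer rate: {verbatim_rate(pairs):.1%}")  # 50.0%
```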