2019
DOI: 10.1162/tacl_a_00276

Natural Questions: A Benchmark for Question Answering Research

Abstract: We present the Natural Questions corpus, a question answering data set. Questions consist of real anonymized, aggregated queries issued to the Google search engine. An annotator is presented with a question along with a Wikipedia page from the top 5 search results, and annotates a long answer (typically a paragraph) and a short answer (one or more entities) if present on the page, or marks null if no long/short answer is present. The public release consists of 307,373 training examples with single annotations;…
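
The annotation scheme described in the abstract can be pictured as one record per example. The following is a minimal, hypothetical Python sketch; the field and class names are illustrative only and do not reflect the official Natural Questions release format.

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """An answer span within the presented Wikipedia page (illustrative)."""
    start: int
    end: int


@dataclass
class NQExample:
    """One training example, following the description in the abstract.

    Hypothetical schema: consult the official NQ release for the real one.
    """
    question: str                       # real, anonymized, aggregated Google query
    wikipedia_page: str                 # page drawn from the top-5 search results
    long_answer: Optional[Span] = None  # typically a paragraph; None if no long answer
    short_answers: List[Span] = field(default_factory=list)  # one or more entities; empty if absent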

Cited by 1,298 publications (1,172 citation statements)
References 21 publications
“…We describe the process of estimating the correctness of collected QDMR annotations. Similar to previous works (Yu et al., 2018; Kwiatkowski et al., 2019), we use expert judgements, where the experts had prepared the guidelines for the annotation task. Given a question and its annotated QDMR, (q, s), the expert determines the correctness of s using one of the following categories:…”
Section: Quality Analysis
Citation type: mentioning, confidence: 99%
“…Second, the information that supports predicting the answer from the source is often fully observed: the source is static, sufficient, and presented in its entirety. This does not match the information-seeking procedure that arises in answering many natural questions (Kwiatkowski et al, 2019), nor can it model the way humans observe and interact with the world to acquire knowledge.…”
Section: Game
Citation type: mentioning, confidence: 99%
“…[Table excerpt, F1 of prior work: DecAtt + Doc Reader (Parikh et al., 2016): 31.4; BERT (Devlin et al., 2018): 50.2; BERT w/ SQuAD 1.1: value truncated] NQ is preferred for evaluating production systems since the questions were "naturally" generated and it does not suffer from the observational bias inherent in SQuAD's data collection approach (Kwiatkowski et al., 2019). When reporting results with the SQuAD dataset, we use the methodology (and evaluation script) made available with (Rajpurkar et al., 2018).…”
Section: F1 Prior Work
Citation type: mentioning, confidence: 99%