Natural Questions: A Benchmark for Question Answering Research

Kwiatkowski, Tom; Palomaki, Jennimaria; Redfield, Olivia; Collins, Michael; Parikh, Ankur P.; Alberti, Chris; Epstein, Danielle; Polosukhin, Illia; Devlin, Jacob; Lee, Kenton; Toutanova, Kristina; Jones, Llion; Kelcey, Matthew; Chang, Ming‐Wei; Dai, Andrew M.; Uszkoreit, Jakob; Le, Quoc V.; Petrov, Slav

doi:10.1162/tacl_a_00276

Cited by 1,298 publications

(1,172 citation statements)

References 21 publications

Mentioning: 1,168


“…We describe the process of estimating the correctness of collected QDMR annotations. Similar to previous works (Yu et al, 2018;Kwiatkowski et al, 2019) we use expert judgements, where the experts had prepared the guidelines for the annotation task. Given a question and its annotated QDMR, (q, s) the expert determines the correctness of s using one of the following categories:…”

Section: Quality Analysis (mentioning)

confidence: 99%

Break It Down: A Question Understanding Benchmark

Wolfson, Geva, Gupta, et al. 2020

Transactions of the Association for Computational Linguistics


Understanding natural language questions entails the ability to break down a question into the requisite steps for computing its answer. In this work, we introduce a Question Decomposition Meaning Representation (QDMR) for questions. QDMR constitutes the ordered list of steps, expressed through natural language, that are necessary for answering a question. We develop a crowdsourcing pipeline, showing that quality QDMRs can be annotated at scale, and release the BREAK dataset, containing over 83K pairs of questions and their QDMRs. We demonstrate the utility of QDMR by showing that (a) it can be used to improve open-domain question answering on the HOTPOTQA dataset, and (b) it can be deterministically converted to a pseudo-SQL formal language, which can alleviate annotation in semantic parsing applications. Last, we use BREAK to train a sequence-to-sequence model with copying that parses questions into QDMR structures, and show that it substantially outperforms several natural baselines.

[Figure residue: example QDMR decompositions and their operator sequences]
Decomposition: 1. Shayne Graham 2. field goals of #1 3. yards of #2 4. number of #2 for each #3 5. #3 where #4 is two
Operators: select[Shayne Graham] project[field goals] project[yards] group[count] comparative[= ,two]
Decomposition: 1. papers 2. #1 in ACL 3. keywords of #2 4. number of #2 for each #3 5. #3 where #4 is more than 100
Operators: select[papers] filter[ACL] project[keywords] group[count] comparative[> ,100]
Operators (decomposition not recovered): select[objects] project[colors] group[count] superlative[max]


“…Second, the information that supports predicting the answer from the source is often fully observed: the source is static, sufficient, and presented in its entirety. This does not match the information-seeking procedure that arises in answering many natural questions (Kwiatkowski et al, 2019), nor can it model the way humans observe and interact with the world to acquire knowledge.…”

Section: Game (mentioning)

confidence: 99%

Interactive Language Learning by Question Answering

Yuan, Côté, Fu, et al. 2019

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conferen


Humans observe and interact with the world to acquire knowledge. However, most existing machine reading comprehension (MRC) tasks miss the interactive, information-seeking component of comprehension. Such tasks present models with static documents that contain all necessary information, usually concentrated in a single short substring. Thus, models can achieve strong performance through simple word- and phrase-based pattern matching. We address this problem by formulating a novel text-based question answering task: Question Answering with Interactive Text (QAit). In QAit, an agent must interact with a partially observable text-based environment to gather information required to answer questions. QAit poses questions about the existence, location, and attributes of objects found in the environment. The data is built using a text-based game generator that defines the underlying dynamics of interaction with the environment. We propose and evaluate a set of baseline models for the QAit task that includes deep reinforcement learning agents. Experiments show that the task presents a major challenge for machine reading systems, while humans solve it with relative ease.


“…DecAtt + Doc Reader (Parikh et al, 2016) 31.4 BERT (Devlin et al, 2018) 50.2 BERT w/ SQuAD 1.1 for context on these data sets). NQ is preferred for evaluating production systems since the questions were "naturally" generated and does not suffer from the observational bias inherent in SQuAD's data collection approach (Kwiatkowski et al, 2019). When reporting results with the SQuAD dataset, we use the methodology (and evaluation script) made available with (Rajpurkar et al, 2018).…”

Section: F1 Prior Work (mentioning)

confidence: 99%

CFO: A Framework for Building Production NLP Systems

Chakravarti, Pendus, Sakrajda, et al. 2019

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conferen


This paper introduces a novel orchestration framework, called CFO (COMPUTATION FLOW ORCHESTRATOR), for building, experimenting with, and deploying interactive NLP (Natural Language Processing) and IR (Information Retrieval) systems to production environments. We then demonstrate a question answering system built using this framework which incorporates state-of-the-art BERT based MRC (Machine Reading Comprehension) with IR components to enable end-to-end answer retrieval. Results from the demo system are shown to be high quality in both academic and industry domain specific settings. Finally, we discuss best practices when (pre-)training BERT based MRC models for production systems.
