Answering Open-Domain Questions of Varying Reasoning Steps from Text

Qi, Peng; Lee, Haejun; Sido, Oghenetegiri Tg; Manning, Christopher D.

doi:10.18653/v1/2021.emnlp-main.292

Cited by 23 publications

(15 citation statements)

References 29 publications

“…Step Execution (EX) Model. Similar to prior work (Talmor and Berant, 2018; Min et al., 2019b; Qi et al., 2021), this model performs explicit, step-by-step multihop reasoning, by first decomposing the question Q into a DAG G_Q of single-hop questions, and then calling a single-hop model repeatedly to execute this decomposition. The decomposer is trained with gold decompositions, and is implemented with BART-large.…”

Section: Multihop Models (mentioning)

confidence: 99%

♫ MuSiQue: Multihop Questions via Single-hop Question Composition

Trivedi, Balasubramanian, Khot et al. 2022

Transactions of the Association for Computational Linguistics

Multihop reasoning remains an elusive goal as existing multihop benchmarks are known to be largely solvable via shortcuts. Can we create a question answering (QA) dataset that, by construction, requires proper multihop reasoning? To this end, we introduce a bottom–up approach that systematically selects composable pairs of single-hop questions that are connected, that is, where one reasoning step critically relies on information from another. This bottom–up methodology lets us explore a vast space of questions and add stringent filters as well as other mechanisms targeting connected reasoning. It provides fine-grained control over the construction process and the properties of the resulting k-hop questions. We use this methodology to create MuSiQue-Ans, a new multihop QA dataset with 25K 2–4 hop questions. Relative to existing datasets, MuSiQue-Ans is more difficult overall (3× increase in human–machine gap), and harder to cheat via disconnected reasoning (e.g., a single-hop model has a 30-point drop in F1). We further add unanswerable contrast questions to produce a more stringent dataset, MuSiQue-Full. We hope our datasets will help the NLP community develop models that perform genuine multihop reasoning.
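The step-execution idea in the citation statement above (decompose a question into a DAG of single-hop questions, then call a single-hop model once per node) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the cited papers: the `#k` placeholder convention, the names `STEPS`, `DEPENDS_ON`, `single_hop`, and `execute`, and the lookup table that stands in for a trained single-hop model.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical decomposition of a 2-hop question into single-hop steps.
# "#1" is an assumed placeholder meaning "the answer to step 1".
STEPS = {
    1: "Who directed Inception?",
    2: "Where was #1 born?",
}
# Edges of the DAG: each step maps to the set of steps it depends on.
DEPENDS_ON = {1: set(), 2: {1}}

# Toy lookup table standing in for a trained single-hop QA model.
TOY_ANSWERS = {
    "Who directed Inception?": "Christopher Nolan",
    "Where was Christopher Nolan born?": "London",
}


def single_hop(question: str) -> str:
    """Answer one single-hop question (toy stand-in, not a real model)."""
    return TOY_ANSWERS[question]


def execute(steps, depends_on, answer_fn):
    """Run the steps in dependency order, filling placeholders as we go."""
    answers = {}
    for step in TopologicalSorter(depends_on).static_order():
        question = steps[step]
        # Substitute larger step numbers first so "#1" never clobbers "#10".
        for dep in sorted(depends_on[step], reverse=True):
            question = question.replace(f"#{dep}", answers[dep])
        answers[step] = answer_fn(question)
    return answers


answers = execute(STEPS, DEPENDS_ON, single_hop)
final_answer = answers[max(answers)]  # answer of the last step
```

In a real system the decomposition and the single-hop answers would come from trained models; the sketch only shows the control flow of executing a decomposition in topological order.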

“…Given a question q, the retriever finds sequences of supporting documents (paths) of length n that can be used to answer the question. At each step of the retrieval process we use an inexpensive retrieval method to identify a small set of promising candidates to narrow down the search space, as commonly done in the literature (Qi et al., 2021; Asai et al., 2020). We use an LM to more accurately rerank n-hop chains of documents based on their relevance to the question (described in Section 2.2).…”

Section: Overview (mentioning)

confidence: 99%

“…Asai et al. (2020) combined TF-IDF retriever with a recurrent graph retriever and used the reader module to re-rank paths based on the answer confidence. Qi et al. (2021) used a single transformer model to perform retrieval, reranking, and reading in an iterative fashion. However, the good performance of previous work comes mainly from training on a large number of examples and are likely to fail in low-data settings.…”

Section: Related Work (mentioning)

confidence: 99%

“…Formally, given a multi-hop question and a large corpus of documents, an MQA system will first use a retriever to identify multiple documents to support the reader to produce the final answer. Existing MQA systems (Asai et al., 2020; Qi et al., 2021; Xiong et al., 2021) are designed under the assumption that abundant labeled examples are available for training both modules, yet this may not be realistic. First, to train the retriever module, one needs examples with questions paired with the corresponding supporting documents, which is laborious to construct (Izacard and Grave, 2021a).…”

Section: Introduction (mentioning)

confidence: 99%

Few-shot Reranking for Multi-hop QA via Language Model Prompting

Khalifa, Logeswaran, Lee et al. 2022

Preprint

We study unsupervised multi-hop reranking for multi-hop QA (MQA) with open-domain questions. Since MQA requires piecing information from multiple documents, the main challenge thus resides in retrieving and reranking chains of passages that support the reasoning process. Our approach relies on LargE models with Prompt-Utilizing reranking Strategy (LEPUS): we construct an instruction-like prompt based on a candidate document path and compute a relevance score of the path as the probability of generating a given question, according to a pre-trained language model. Though unsupervised, LEPUS yields competitive reranking performance against state-of-the-art methods that are trained on thousands of examples. Adding a small number of samples (e.g., 2), we demonstrate further performance gain using in-context learning. Finally, we show that when integrated with a reader module, LEPUS can obtain competitive multi-hop QA performance, e.g., outperforming fully-supervised QA systems.
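The scoring rule described in this abstract, ranking a candidate document path by the probability of generating the question from it, can be sketched as follows. This is not the LEPUS implementation: the sketch swaps the pre-trained language model and instruction-like prompt for an add-one-smoothed unigram model estimated from the path's own text, purely so that the example runs without external dependencies. The names `tokenize`, `question_log_likelihood`, and `rerank`, and the example paths, are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def question_log_likelihood(question, path, vocab_size):
    """log P(question | path) under an add-one-smoothed unigram model.

    The unigram model is a stand-in for a pre-trained LM; it is estimated
    from the concatenated documents of the candidate path.
    """
    counts = Counter(tok for doc in path for tok in tokenize(doc))
    total = sum(counts.values())
    return sum(
        math.log((counts[tok] + 1) / (total + vocab_size))
        for tok in tokenize(question)
    )


def rerank(question, paths):
    """Sort candidate paths from most to least likely to generate the question."""
    vocab = {tok for path in paths for doc in path for tok in tokenize(doc)}
    vocab |= set(tokenize(question))
    return sorted(
        paths,
        key=lambda path: question_log_likelihood(question, path, len(vocab)),
        reverse=True,
    )


question = "where was the director of inception born"
relevant = [
    "inception was directed by christopher nolan",
    "christopher nolan was born in london",
]
distractor = [
    "titanic was directed by james cameron",
    "james cameron was born in kapuskasing",
]
ranked = rerank(question, [distractor, relevant])
```

With a real language model the same structure would apply, with `question_log_likelihood` replaced by the model's log-probability of the question tokens conditioned on a prompt built from the path.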

“…(Harman, 1993), SQuAD (Rajpurkar et al., 2018), NewsQA (Trischler et al., 2017), SearchQA (Dunn et al., 2017), and QuAC (Choi et al., 2018), and intensive efforts were made to build new models that surpass the human performance on these datasets, including the pre-trained language models (Devlin et al., 2019; Yang et al., 2019a) or the ensemble models that outperform the human, in particular on SQuAD (Lan et al., 2020; Yamada et al., 2020). More challenging datasets are also introduced, which require several reasoning steps to answer (Yang et al., 2018; Qi et al., 2021), the understanding of a much larger context (Kočiský et al., 2018) or the understanding of the adversarial content and numeric reasoning (Dua et al., 2019).…”

Section: Introduction (mentioning)

confidence: 99%

Tracing Origins: Coreference-aware Machine Reading Comprehension

Huang, Zhang, Zhang 2022

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Machine reading comprehension is a heavily studied research and test field for evaluating new pre-trained language models (PrLMs) and fine-tuning strategies, and recent studies have enriched the pre-trained language models with syntactic, semantic and other linguistic information to improve the performance of the models. In this paper, we imitate the human reading process in connecting the anaphoric expressions and explicitly leverage the coreference information of the entities to enhance the word embeddings from the pretrained language model, in order to highlight the coreference mentions of the entities that must be identified for coreference-intensive question answering in QUOREF, a relatively new dataset that is specifically designed to evaluate the coreference-related performance of a model. We use two strategies to finetune a pre-trained language model, namely, placing an additional encoder layer after a pre-trained language model to focus on the coreference mentions or constructing a relational graph convolutional network to model the coreference relations. We demonstrate that the explicit incorporation of coreference information in the fine-tuning stage performs better than the incorporation of the coreference information in pre-training a language model.
