Learning to Explain: Datasets and Models for Identifying Valid Reasoning Chains in Multihop Question-Answering

Jhamtani, Harsh; Clark, Peter E.

doi:10.18653/v1/2020.emnlp-main.10

Cited by 34 publications

(48 citation statements)

References 19 publications


“…Abduction (C, Q → fm): Given C and an unprovable fact Q, identify a new fact fm that, when added to C, would make Q true. […] generate human-style justifications, which again are typically supporting evidence rather than a fully-formed line of reasoning, and without explicit reasoning rules (Camburu et al., 2018; Jhamtani and Clark, 2020; Inoue et al., 2020). In contrast, ProofWriter produces a deductive chain of reasoning from what is known to what is concluded, using a transformer retrained to reason systematically.…”

Section: Related Work (mentioning)

confidence: 99%

ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language

Tafjord, Dalvi, Clark, 2021

Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Self Cite


Transformers have been shown to emulate logical deduction over natural language theories (logical rules expressed in natural language), reliably assigning true/false labels to candidate implications. However, their ability to generate implications of a theory has not yet been demonstrated, and methods for reconstructing proofs of answers are imperfect. In this work we show that a generative model, called ProofWriter, can reliably generate both implications of a theory and the natural language proofs that support them. In particular, iterating a 1-step implication generator results in proofs that are highly reliable, and represent actual model decisions (rather than post-hoc rationalizations). On the RuleTaker dataset, the accuracy of ProofWriter's proofs exceeds previous methods by +9% absolute, and in a way that generalizes to proof depths unseen in training and on out-of-domain problems. We also show that generative techniques can perform a type of abduction with high precision: Given a theory and an unprovable conclusion, identify a missing fact that allows the conclusion to be proved, along with a proof. These results significantly improve the viability of neural methods for systematically reasoning over natural language.
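The two ideas in this abstract, iterating a 1-step implication generator and abducing a missing fact, can be illustrated with a purely symbolic toy. This is only a sketch under stated assumptions: ProofWriter itself is a trained transformer, whereas the code below swaps in exact set-based rule matching, and the theory, the candidate list, and the function names (`one_step`, `closure`, `abduce`) are all hypothetical.

```python
# Symbolic stand-in for the ideas in the abstract (NOT the ProofWriter model).
# A theory is a set of fact strings plus rules of the form (premises, conclusion).

def one_step(facts, rules):
    """Facts derivable by exactly one rule application from `facts`."""
    return {concl for premises, concl in rules
            if premises <= facts and concl not in facts}

def closure(facts, rules):
    """Iterate the 1-step generator until no new implication appears."""
    known = set(facts)
    while True:
        new = one_step(known, rules)
        if not new:
            return known
        known |= new

def abduce(facts, rules, goal, candidates):
    """Candidate facts that, when added alone, make `goal` provable."""
    if goal in closure(facts, rules):
        return []  # already provable, nothing is missing
    return [c for c in candidates
            if c != goal and c not in facts
            and goal in closure(set(facts) | {c}, rules)]

# Hypothetical toy theory, written in the style of the RuleTaker examples.
FACTS = {"Bob is blue"}
RULES = [
    (frozenset({"Bob is blue"}), "Bob is red"),
    (frozenset({"Bob is red", "Bob is furry"}), "Bob is young"),
]
CANDIDATES = ["Bob is furry", "Bob is white"]

implied = closure(FACTS, RULES)
missing = abduce(FACTS, RULES, "Bob is young", CANDIDATES)
print(sorted(implied))
print(missing)
```

Here "Bob is young" is unprovable from the theory alone, and the search reports the single candidate whose addition closes the gap. A neural system would generate the missing fact rather than pick it from a fixed candidate list.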


“…Structured Explanations: There is useful previous work on developing interpretable and explainable models (Doshi-Velez and Kim, 2017; Rudin, 2019; Hase and Bansal, 2020; Jacovi and Goldberg, 2020) for NLP. Explanations in NLP take three major forms: (1) extractive rationales or highlights (Zaidan et al., 2007; Lei et al., 2016; Yu et al., 2019; DeYoung et al., 2020) where a subset of the input text explains a prediction, (2) free-form or natural language explanations (Camburu et al., 2018; Rajani et al., 2019; Zhang et al., 2020; Kumar and Talukdar, 2020) that are not constrained to the input, and (3) structured explanations that range from semi-structured text (Ye et al., 2020) to chain of facts (Khot et al., 2020; Jhamtani and Clark, 2020; Gontier et al., 2020) to explanation graphs (based on edges between chains of facts) (Jansen et al., 2018; Jansen and Ustalov, 2019; Xie et al., 2020).…”

Section: Related Work (mentioning)

confidence: 99%

multiPRover: Generating Multiple Proofs for Improved Interpretability in Rule Reasoning

Saha, Yadav, Bansal, 2021

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Langua


We focus on a type of linguistic formal reasoning where the goal is to reason over explicit knowledge in the form of natural language facts and rules. A recent work, named PROVER (Saha et al., 2020), performs such reasoning by answering a question and also generating a proof graph that explains the answer. However, compositional reasoning is not always unique and there may be multiple ways of reaching the correct answer. Thus, in our work, we address a new and challenging problem of generating multiple proof graphs for reasoning over natural language rule-bases. Each proof provides a different rationale for the answer, thereby improving the interpretability of such reasoning systems. In order to jointly learn from all proof graphs and exploit the correlations between multiple proofs for a question, we pose this task as a set generation problem over structured output spaces where each proof is represented as a directed graph. We propose two variants of a proof-set generation model, MULTIPROVER. Our first model, Multilabel-MULTIPROVER, generates a set of proofs via multi-label classification and implicit conditioning between the proofs; while the second model, Iterative-MULTIPROVER, generates proofs iteratively by explicitly conditioning on the previously generated proofs. Experiments on multiple synthetic, zero-shot, and human-paraphrased datasets reveal that both MULTIPROVER models significantly outperform PROVER on datasets containing multiple gold proofs. Iterative-MULTIPROVER obtains state-of-the-art proof F1 in zero-shot scenarios where all examples have single correct proofs. It also generalizes better to questions requiring higher depths of reasoning where multiple proofs are more frequent.

Example rule-base (text from a figure in the paper):

Facts: F1: Bob is big. F2: Bob is blue. F3: Bob is furry. F4: Bob is young. F5: Dave is red. F6: Fiona is white. F7: Harry is big. F8: Harry is red. F9: Harry is round. F10: Harry is white.

Rules: R1: White, round things are furry. R2: All blue, young things are big. R3: If something is white and young, then it is blue. R4: If Dave is round then Dave is white. R5: If something is blue and white then it is round. R6: If Harry is big and Harry is white then Harry is red. R7: All furry, red things are young. R8: Red things are round. R9: If something is blue then it is red.
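The example facts and rules quoted above are small enough to run through a plain forward-chaining loop, which makes the notion of a proof over a rule-base concrete. The following is a minimal symbolic sketch, not the PROVER or MULTIPROVER models (those are neural). The `Rule` type, the (entity, attribute) encoding, and the helper names are assumptions made for illustration, and the loop records only the first proof it finds for each derived fact, whereas the paper is concerned with all of them.

```python
from typing import NamedTuple, Optional

class Rule(NamedTuple):
    name: str
    premises: frozenset           # attributes that must already hold
    conclusion: str               # attribute that gets derived
    entity: Optional[str] = None  # None: the rule applies to any entity

# F1-F10 from the quoted example, encoded as (entity, attribute) pairs.
FACTS = {
    ("Bob", "big"), ("Bob", "blue"), ("Bob", "furry"), ("Bob", "young"),
    ("Dave", "red"), ("Fiona", "white"),
    ("Harry", "big"), ("Harry", "red"), ("Harry", "round"), ("Harry", "white"),
}

# R1-R9 from the quoted example; R4 and R6 name a specific entity.
RULES = [
    Rule("R1", frozenset({"white", "round"}), "furry"),
    Rule("R2", frozenset({"blue", "young"}), "big"),
    Rule("R3", frozenset({"white", "young"}), "blue"),
    Rule("R4", frozenset({"round"}), "white", entity="Dave"),
    Rule("R5", frozenset({"blue", "white"}), "round"),
    Rule("R6", frozenset({"big", "white"}), "red", entity="Harry"),
    Rule("R7", frozenset({"furry", "red"}), "young"),
    Rule("R8", frozenset({"red"}), "round"),
    Rule("R9", frozenset({"blue"}), "red"),
]

def forward_chain(facts, rules):
    """Apply rules until a fixed point; keep the first proof of each new fact."""
    known = set(facts)
    proofs = {}  # derived fact -> (rule name, sorted list of premise facts)
    entities = sorted({ent for ent, _ in known})
    changed = True
    while changed:
        changed = False
        for rule in rules:
            targets = [rule.entity] if rule.entity else entities
            for ent in targets:
                needed = {(ent, attr) for attr in rule.premises}
                goal = (ent, rule.conclusion)
                if needed <= known and goal not in known:
                    known.add(goal)
                    proofs[goal] = (rule.name, sorted(needed))
                    changed = True
    return known, proofs

def explain(goal, proofs):
    """Render the recorded proof of `goal`; given facts are the leaves."""
    if goal not in proofs:
        return f"{goal[0]} is {goal[1]}"
    rule, premises = proofs[goal]
    inner = " & ".join(explain(p, proofs) for p in premises)
    return f"({inner} -[{rule}]-> {goal[0]} is {goal[1]})"

known, proofs = forward_chain(FACTS, RULES)
print(explain(("Dave", "white"), proofs))
```

Several derived facts in this rule-base can be reached through more than one rule path, which is the situation the abstract describes; extending `proofs` to store a list of alternatives per fact would be one way to surface them.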


“…There is a recent explosion of explanation-centred datasets for multi-hop question answering (Jhamtani and Clark, 2020; Xie et al., 2020; Jansen et al., 2018; Yang et al., 2018; Thayaparan et al., 2020; Wiegreffe and Marasović, 2021). However, most of these datasets require the aggregation of only two sentences or paragraphs, making it hard to evaluate the robustness of the models in terms of semantic drift.…”

Section: Many-hop Multi-hop Training Data (mentioning)

confidence: 99%

TextGraphs 2021 Shared Task on Multi-Hop Inference for Explanation Regeneration

Jansen, Thayaparan, Valentino, et al., 2021

Proceedings of the Fifteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-15)


The Shared Task on Multi-Hop Inference for Explanation Regeneration asks participants to compose large multi-hop explanations to questions by assembling large chains of facts from a supporting knowledge base. While previous editions of this shared task aimed to evaluate explanatory completeness (finding a set of facts that form a complete inference chain, without gaps, to arrive from question to correct answer), this 2021 instantiation concentrates on the subtask of determining relevance in large multi-hop explanations. To this end, this edition of the shared task makes use of a large set of approximately 250k manual explanatory relevancy ratings that augment the 2020 shared task data. In this summary paper, we describe the details of the explanation regeneration task, the evaluation data, and the participating systems. Additionally, we perform a detailed analysis of participating systems, evaluating various aspects involved in the multi-hop inference process. The best performing system achieved an NDCG of 0.82 on this challenging task, substantially increasing performance over baseline methods by 32%, while also leaving significant room for future improvement.
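The NDCG figure reported above scores a ranked list of facts against graded relevancy ratings. As a rough illustration of how such a score is computed, here is a minimal sketch that assumes the common formulation with linear gain and a log2 rank discount; the shared task's official scorer may differ in gain function or cutoff, and the fact identifiers and ratings below are invented for the example.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: rating divided by log2(rank + 1), ranks from 1."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(ranked_facts, ratings):
    """DCG of the system ranking divided by the DCG of the ideal ranking.

    `ratings` maps fact id -> graded relevance; unrated facts count as 0.
    """
    gains = [ratings.get(fact, 0) for fact in ranked_facts]
    ideal_dcg = dcg(sorted(ratings.values(), reverse=True))
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical relevancy ratings for four knowledge-base facts.
RATINGS = {"f1": 6, "f2": 4, "f3": 0, "f4": 2}

perfect = ndcg(["f1", "f2", "f4", "f3"], RATINGS)   # ideal order
degraded = ndcg(["f3", "f1", "f2", "f4"], RATINGS)  # irrelevant fact ranked first
print(perfect, degraded)
```

Ranking an irrelevant fact first pushes every relevant fact down one position, so the second score is strictly lower than the first even though the same facts were retrieved.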
