We focus on a type of linguistic formal reasoning where the goal is to reason over explicit knowledge in the form of natural language facts and rules. A recent work, named PROVER (Saha et al., 2020), performs such reasoning by answering a question and also generating a proof graph that explains the answer. However, compositional reasoning is not always unique and there may be multiple ways of reaching the correct answer. Thus, in our work, we address a new and challenging problem of generating multiple proof graphs for reasoning over natural language rule-bases. Each proof provides a different rationale for the answer, thereby improving the interpretability of such reasoning systems. In order to jointly learn from all proof graphs and exploit the correlations between multiple proofs for a question, we pose this task as a set generation problem over structured output spaces where each proof is represented as a directed graph. We propose two variants of a proof-set generation model, MULTIPROVER. Our first model, Multilabel-MULTIPROVER, generates a set of proofs via multi-label classification and implicit conditioning between the proofs; while the second model, Iterative-MULTIPROVER, generates proofs iteratively by explicitly conditioning on the previously generated proofs. Experiments on multiple synthetic, zero-shot, and human-paraphrased datasets reveal that both MULTIPROVER models significantly outperform PROVER on datasets containing multiple gold proofs. Iterative-MULTIPROVER obtains state-of-the-art proof F1 in zero-shot scenarios where all examples have single correct proofs. It also generalizes better to questions requiring higher depths of reasoning where multiple proofs are more frequent.
Facts: F1: Bob is big. F2: Bob is blue. F3: Bob is furry. F4: Bob is young. F5: Dave is red. F6: Fiona is white. F7: Harry is big. F8: Harry is red. F9: Harry is round. F10: Harry is white.
Rules: R1: White, round things are furry. R2: All blue, young things are big. R3: If something is white and young, then it is blue. R4: If Dave is round then Dave is white. R5: If something is blue and white then it is round. R6: If Harry is big and Harry is white then Harry is red. R7: All furry, red things are young. R8: Red things are round. R9: If something is blue then it is red.
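To make the notion of multiple proofs concrete, the following minimal sketch enumerates every proof of a statement over the rule-base above by symbolic backward chaining. It is purely illustrative and is not the MULTIPROVER model, which predicts proof graphs with a learned neural model. The hand-written encoding of facts and rules, the function names `enumerate_proofs` and `proof_edges`, the choice of query ("Bob is big"), and the edge direction (premise to rule) are all assumptions made for this example.

```python
from itertools import product

# Facts from the rule-base above, encoded by hand as (entity, attribute).
FACTS = {
    "F1": ("Bob", "big"), "F2": ("Bob", "blue"), "F3": ("Bob", "furry"),
    "F4": ("Bob", "young"), "F5": ("Dave", "red"), "F6": ("Fiona", "white"),
    "F7": ("Harry", "big"), "F8": ("Harry", "red"), "F9": ("Harry", "round"),
    "F10": ("Harry", "white"),
}

# Rules as (premise attributes, concluded attribute, scope).
# scope=None means the rule applies to any entity; otherwise it names
# the single entity the rule mentions (R4 and R6).
RULES = {
    "R1": (("white", "round"), "furry", None),
    "R2": (("blue", "young"), "big", None),
    "R3": (("white", "young"), "blue", None),
    "R4": (("round",), "white", "Dave"),
    "R5": (("blue", "white"), "round", None),
    "R6": (("big", "white"), "red", "Harry"),
    "R7": (("furry", "red"), "young", None),
    "R8": (("red",), "round", None),
    "R9": (("blue",), "red", None),
}


def enumerate_proofs(entity, attribute, on_path=frozenset()):
    """Return all proofs of `entity is attribute` as nested trees.

    A proof is either a fact id (str) or a pair (rule_id, sub_proofs).
    Goals already on the current derivation path are skipped so that
    circular derivations do not recurse forever.
    """
    goal = (entity, attribute)
    if goal in on_path:
        return []
    on_path = on_path | {goal}

    # Proofs that consist of a single matching fact.
    found = [fact_id for fact_id, fact in FACTS.items() if fact == goal]

    # Proofs that end in a rule whose conclusion matches the goal.
    for rule_id, (premises, conclusion, scope) in RULES.items():
        if conclusion != attribute or scope not in (None, entity):
            continue
        per_premise = [enumerate_proofs(entity, p, on_path) for p in premises]
        # One proof per combination of sub-proofs; if any premise has no
        # proof, the product is empty and the rule contributes nothing.
        for combo in product(*per_premise):
            found.append((rule_id, combo))
    return found


def proof_edges(proof):
    """Flatten a proof tree into a set of directed edges (premise -> rule)."""
    if isinstance(proof, str):
        return set()  # a lone fact is a single-node proof with no edges
    rule_id, children = proof
    edges = set()
    for child in children:
        child_root = child if isinstance(child, str) else child[0]
        edges.add((child_root, rule_id))
        edges |= proof_edges(child)
    return edges


if __name__ == "__main__":
    for proof in enumerate_proofs("Bob", "big"):
        print(proof, sorted(proof_edges(proof)))
```

Under this encoding, "Bob is big" has three distinct proofs: the fact F1 alone; F2 and F4 combined by R2; and F2 combined by R2 with a derivation of "Bob is young" through R7 (from F3 and R9 applied to F2). Each proof corresponds to a different directed graph over the same facts and rules, which is the sense in which a single answer can have several valid rationales.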