Finding Generalizable Evidence by Learning to Convince Q&A Models

Perez, Ethan; Karamcheti, Siddharth; Fergus, Rob; Weston, Jason; Kiela, Douwe; Cho, Kyunghyun

doi:10.18653/v1/d19-1244

Cited by 26 publications

(31 citation statements)

References 28 publications

“…The BERT-QA baseline scores surprisingly low. A possible explanation is that, in the original setting, Perez et al (2019)'s model learned to spot a (usually) single relevant sentence among a passage of irrelevant sentences. In our setting, though, all the chains are partially relevant, making it harder for the model to distinguish just one as central.…”

Section: Results: Performance On eQASC (mentioning)

confidence: 99%

“…In the context of QA, there are multiple notions of explanation/justification, including showing an authoritative, answer-bearing sentence (Perez et al, 2019), a collection of text snippets supporting an answer (DeYoung et al, 2020), an attention map over a passage (Seo et al, 2016), a synthesized phrase connecting question and answer (Rajani et al, 2019), or the syntactic pattern used to locate the answer (Ye et al, 2020; Hancock et al, 2018). These methods are primarily designed for answers to "lookup" questions, to explain where and how an answer was found in a corpus.…”

Section: Related Work (mentioning)

confidence: 99%

“…We also consider a baseline, BERT-QA, by adapting the approach of Perez et al (2019) to our task. In the original work, given a passage of text and a multiple choice question, the system identifies the sentence(s) S that are the most convincing evidence for a given answer option a_i.…”

Section: Baselines (mentioning)

confidence: 99%

Learning to Explain: Datasets and Models for Identifying Valid Reasoning Chains in Multihop Question-Answering

Jhamtani

Clark

2020

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Despite the rapid progress in multihop question-answering (QA), models still have trouble explaining why an answer is correct, with limited explanation training data available to learn from. To address this, we introduce three explanation datasets in which explanations formed from corpus facts are annotated. Our first dataset, eQASC, contains over 98K explanation annotations for the multihop question answering dataset QASC, and is the first that annotates multiple candidate explanations for each answer. The second dataset eQASC-perturbed is constructed by crowd-sourcing perturbations (while preserving their validity) of a subset of explanations in QASC, to test consistency and generalization of explanation prediction models. The third dataset eOBQA is constructed by adding explanation annotations to the OBQA dataset to test generalization of models trained on eQASC. We show that this data can be used to significantly improve explanation quality (+14% absolute F1 over a strong retrieval baseline) using a BERT-based classifier, but still behind the upper bound, offering a new challenge for future research. We also explore a delexicalized chain representation in which repeated noun phrases are replaced by variables, thus turning them into generalized reasoning chains (for example: "X is a Y" AND "Y has Z" IMPLIES "X has Z"). We find that generalized chains maintain performance while also being more robust to certain perturbations.

“…One way we believe that can improve the distant supervision signals is by iteratively updating the ranker and reader like in Hard-EM (Min et al, 2019; Guu et al, 2020). Another possible direction is to extend the idea of inferring evidence on training data with game-theoretic approaches (Perez et al, 2019; Feng et al, 2020), then use the inferred evidence paragraph as labels to train the ranker.…”

Section: Discussion Of Future Improvement (mentioning)

confidence: 99%

Frustratingly Hard Evidence Retrieval for QA Over Books

Mou

Yang

et al. 2020

Proceedings of the First Joint Workshop on Narrative Understanding, Storylines, and Events

A lot of progress has been made to improve question answering (QA) in recent years, but the special problem of QA over narrative book stories has not been explored in-depth. We formulate BookQA as an open-domain QA task given its similar dependency on evidence retrieval. We further investigate how state-of-the-art open-domain QA approaches can help BookQA. Besides achieving state-of-the-art on the NarrativeQA benchmark, our study also reveals the difficulty of evidence retrieval in books with a wealth of experiments and analysis, which necessitates future effort on novel solutions for evidence retrieval in BookQA.

“…Our work is the first to recover reasoning chains in a more general unsupervised setting, thus falling into the direction of denoising over distant supervised signals. From this perspective, the most relevant studies in the NLP field includes Wang, Yu, Guo, Wang, Klinger, Zhang, Chang, Tesauro, Zhou, and Jiang [21] and Min, Chen, Hajishirzi, and Zettlemoyer [22] for evidence identification in open-domain QA and Lei, Barzilay, and Jaakkola [5] and Perez, Karamcheti, Fergus, Weston, Kiela, and Cho [23] for rationale recovery.…”

Section: Related Work (mentioning)

confidence: 99%

Learning to Recover Reasoning Chains for Multi-Hop Question Answering via Cooperative Games

Feng

Yu

Xiong

et al. 2021

Proceedings of the Canadian Conference on Artificial Intelligence

We extend the formats of explanations in interpretable NLP with the proposed entity-centric reasoning chains for multi-hop question answering. We also propose a cooperative game approach to learn to recover such explanations from weakly supervised signals, i.e., the question-answer pairs. We evaluate our task and method via newly created benchmarks based on two multi-hop datasets, HotpotQA and MedHop; and hand-labeled reasoning chains for the latter. The experiments demonstrate the effectiveness of our approach.
