Open-domain question answering relies on efficient passage retrieval to select candidate contexts, where traditional sparse vector space models, such as TF-IDF or BM25, are the de facto method. In this work, we show that retrieval can be practically implemented using dense representations alone, where embeddings are learned from a small number of questions and passages by a simple dual-encoder framework. When evaluated on a wide range of open-domain QA datasets, our dense retriever outperforms a strong Lucene-BM25 system by a large margin, 9%-19% absolute in terms of top-20 passage retrieval accuracy, and helps our end-to-end QA system establish new state-of-the-art results on multiple open-domain QA benchmarks.
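The core mechanism here is a dual encoder: questions and passages are embedded into the same vector space, and retrieval reduces to a nearest-neighbor search by inner product. The sketch below shows that retrieval loop only; the `encode` function is a deliberately crude stand-in (a hashed bag-of-words) so the example runs, not the learned neural encoders the paper trains, and all names are illustrative.

```python
import numpy as np

def encode(text: str, dim: int = 128) -> np.ndarray:
    # Toy stand-in encoder: hashed bag-of-words, L2-normalized.
    # A real dense retriever uses two learned neural encoders,
    # one for questions and one for passages.
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-8)

def build_index(passages: list[str]) -> np.ndarray:
    # Passages are encoded once, offline; only the question is
    # encoded at query time.
    return np.stack([encode(p) for p in passages])

def retrieve(question: str, passages: list[str], index: np.ndarray, k: int = 20):
    # Relevance is the inner product between question and passage vectors.
    scores = index @ encode(question)
    top = np.argsort(-scores)[:k]
    return [(passages[i], float(scores[i])) for i in top]
```

At scale, the precomputed passage index is typically searched with an approximate nearest-neighbor library rather than the exhaustive dot product shown here.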
We describe the WIKIQA dataset, a new publicly available set of question and sentence pairs, collected and annotated for research on open-domain question answering. Most previous work on answer sentence selection focuses on a dataset created using the TREC-QA data, which includes editor-generated questions and candidate answer sentences selected by matching content words in the question. WIKIQA is constructed using a more natural process and is more than an order of magnitude larger than the previous dataset. In addition, the WIKIQA dataset also includes questions for which there are no correct sentences, enabling researchers to work on answer triggering, a critical component in any QA system. We compare several systems on the task of answer sentence selection on both datasets and also describe the performance of a system on the problem of answer triggering using the WIKIQA dataset.
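Answer triggering, as described here, requires a system not only to rank candidate sentences but to abstain when none of them answers the question. A minimal sketch of one common approach, assuming an upstream model has already produced a relevance score per candidate sentence, is to threshold the best score; the threshold value below is an illustrative tunable parameter, not one taken from the paper.

```python
def trigger_answer(scores: list[float], threshold: float):
    """Return the index of the best-scoring candidate sentence, or None
    if no candidate clears the threshold (i.e., the model abstains
    because the question likely has no correct answer sentence)."""
    if not scores:
        return None
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best if scores[best] >= threshold else None

# Example: three candidate sentences, none convincing enough to answer.
assert trigger_answer([0.21, 0.34, 0.18], threshold=0.5) is None
```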
We consider learning representations of entities and relations in KBs using the neural-embedding approach. We show that most existing models, including NTN (Socher et al., 2013) and TransE (Bordes et al., 2013b), can be generalized under a unified learning framework, where entities are low-dimensional vectors learned from a neural network and relations are bilinear and/or linear mapping functions. Under this framework, we compare a variety of embedding models on the link prediction task. We show that a simple bilinear formulation achieves new state-of-the-art results for the task (achieving a top-10 accuracy of 73.2% vs. 54.7% by TransE on Freebase). Furthermore, we introduce a novel approach that utilizes the learned relation embeddings to mine logical rules such as BornInCity(a, b) ∧ CityInCountry(b, c) ⇒ Nationality(a, c). We find that embeddings learned from the bilinear objective are particularly good at capturing relational semantics, and that the composition of relations is characterized by matrix multiplication. More interestingly, we demonstrate that our embedding-based rule extraction approach successfully outperforms a state-of-the-art confidence-based rule mining approach in mining Horn rules that involve compositional reasoning.

INTRODUCTION

Recent years have witnessed a rapid growth of knowledge bases (KBs) such as Freebase, DBpedia (Auer et al., 2007), and YAGO (Suchanek et al., 2007). These KBs store facts about real-world entities (e.g., people, places, and things) in the form of RDF triples, i.e., (subject, predicate, object). Today's KBs are large in size. For instance, Freebase contains millions of entities and billions of facts (triples) involving a large variety of predicates (relation types). Such large-scale multi-relational data provide an excellent potential for improving a wide range of tasks, from information retrieval and question answering to biological data mining.

Recently, much effort has been invested in relational learning methods that can scale to large knowledge bases. Tensor factorization (e.g., Nickel et al., 2011) and neural-embedding-based models (e.g., Bordes et al., 2013a; Socher et al., 2013) are two popular kinds of approaches that learn to encode relational information using low-dimensional representations of entities and relations. These representation learning methods have shown good scalability and reasoning ability in terms of validating unseen facts given the existing KB.

In this work, we focus on the study of neural-embedding models, where the representations are learned using neural networks with energy-based objectives. Recent embedding models TransE (Bordes et al., 2013b) and NTN (Socher et al., 2013) have shown state-of-the-art prediction performance compared to tensor factorization methods such as RESCAL (Nickel et al., 2012). They are similar in model form, with slight differences in the choices of entity and relation representations. Without careful comparison, it is not clear how different design choices affect the ...
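The "simple bilinear formulation" scores a triple (subject, relation, object) as e_s^T W_r e_o; when W_r is restricted to a diagonal matrix, this collapses to a three-way elementwise product, and composing two relations amounts to multiplying their matrices. Below is a minimal sketch under the assumption that entity and relation vectors have already been learned; the function names are illustrative, not from the paper.

```python
import numpy as np

def bilinear_score(e_s: np.ndarray, w_r: np.ndarray, e_o: np.ndarray) -> float:
    # Bilinear score with a diagonal relation matrix, e_s^T diag(w_r) e_o,
    # which reduces to summing an elementwise product of three vectors.
    return float(np.sum(e_s * w_r * e_o))

def compose(w_r1: np.ndarray, w_r2: np.ndarray) -> np.ndarray:
    # Relation composition corresponds to matrix multiplication of the
    # relation matrices; for diagonal matrices this is simply the
    # elementwise product of the two relation vectors.
    return w_r1 * w_r2

def rule_plausibility(w_r1: np.ndarray, w_r2: np.ndarray, w_r3: np.ndarray) -> float:
    # Score the Horn rule r1(a, b) AND r2(b, c) => r3(a, c) by how close
    # the composed relation is to the head relation (cosine similarity).
    c = compose(w_r1, w_r2)
    return float(c @ w_r3 / (np.linalg.norm(c) * np.linalg.norm(w_r3) + 1e-8))
```

Ranking candidate head relations by this composition similarity is one way to read off Horn rules from learned embeddings, in the spirit of the rule-mining approach the abstract describes.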
This research was conducted when the authors were at Microsoft Research.