Instance matching is one of the processes that facilitate the integration of independently designed knowledge bases. It aims to link co-referent instances with an owl:sameAs connection so that knowledge bases can complement each other. In this work, we present VDLS, an approach for the automatic alignment of instances in RDF knowledge base graphs. For each instance, VDLS generates a virtual document from its local description (i.e., data-type properties) and from the instances related to it through object-type properties (i.e., neighbors). We transform the instance matching problem into a document matching problem and solve it with a vector space embedding technique. We use pretrained word embeddings to assess word similarities at both the lexical and semantic levels. We evaluate our approach on multiple knowledge bases from the instance track of OAEI. The experiments show that VDLS achieves strong results compared with several state-of-the-art approaches.
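The pipeline described in this abstract (virtual document, embedding, document matching) can be illustrated with a minimal sketch. This is not the VDLS implementation: the tiny `TOY_EMBEDDINGS` table stands in for pretrained word embeddings, averaging word vectors and comparing them by cosine similarity is an assumed choice of embedding technique, and the function names, the knowledge base layout, and the `threshold` value are all hypothetical.

```python
import math

# Hypothetical toy vectors standing in for pretrained word embeddings.
TOY_EMBEDDINGS = {
    "paris": [0.9, 0.1, 0.0],
    "france": [0.8, 0.2, 0.1],
    "capital": [0.7, 0.3, 0.0],
    "berlin": [0.1, 0.9, 0.0],
    "germany": [0.2, 0.8, 0.1],
}


def virtual_document(instance, kb):
    """Collect tokens from an instance's literals and its neighbors' literals."""
    tokens = []
    # Local description: values of data-type properties.
    for text in kb["literals"].get(instance, []):
        tokens.extend(text.lower().split())
    # Neighbors: instances reached through object-type properties.
    for neighbor in kb["links"].get(instance, []):
        for text in kb["literals"].get(neighbor, []):
            tokens.extend(text.lower().split())
    return tokens


def embed(tokens, embeddings):
    """Average the vectors of known tokens; return None if no token is known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_instances(sources, kb_s, targets, kb_t, embeddings, threshold=0.9):
    """For each source, keep the most similar target if it reaches the threshold."""
    target_vecs = {t: embed(virtual_document(t, kb_t), embeddings) for t in targets}
    pairs = []
    for s in sources:
        s_vec = embed(virtual_document(s, kb_s), embeddings)
        if s_vec is None:
            continue
        scored = [(cosine(s_vec, v), t) for t, v in target_vecs.items() if v is not None]
        if scored:
            score, best = max(scored)
            if score >= threshold:
                pairs.append((s, best, score))
    return pairs


# Two made-up knowledge bases describing the same two cities.
kb_a = {
    "literals": {"a:Paris": ["Paris"], "a:France": ["France"],
                 "a:Berlin": ["Berlin"], "a:Germany": ["Germany"]},
    "links": {"a:Paris": ["a:France"], "a:Berlin": ["a:Germany"]},
}
kb_b = {
    "literals": {"b:1": ["capital Paris"], "b:FR": ["France"],
                 "b:2": ["Berlin"], "b:DE": ["Germany"]},
    "links": {"b:1": ["b:FR"], "b:2": ["b:DE"]},
}

pairs = match_instances(["a:Paris", "a:Berlin"], kb_a,
                        ["b:1", "b:2"], kb_b, TOY_EMBEDDINGS)
for source, target, score in pairs:
    print(f"{source} owl:sameAs {target}  (similarity {score:.3f})")
```

In this sketch the neighbor literals ("France", "Germany") are what separate the two candidates once "capital" is added to one description, which is the intuition behind including neighbors in the virtual document.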
Summary
Instance matching (IM) is the process of matching instances across knowledge bases (KBs) that refer to the same real-world object (e.g., the same person in two different KBs). Several approaches in the literature perform this process using different algorithmic techniques and search strategies. In this article, we provide the rationale for IM and survey the existing algorithms for performing this task. We begin by identifying the importance of this process and defining it formally. We also provide a new classification of these approaches based on the "source of evidence," that is, the context information integrated explicitly or implicitly in the IM process. We survey and discuss state-of-the-art IM methods with respect to this context information. We further describe and compare different state-of-the-art IM approaches against several criteria. Such a comprehensive comparative study constitutes an asset and a guide for future research in IM.