In recent years, the Web has evolved from a global information space of interlinked documents into a space where both documents and data are linked. Instance matching has become a fundamental issue for integrating and sharing data, especially with the rapid development of linked data. In this paper, we propose an instance matching approach built on two main processes: the former is based on property classification (IM_PC) and the latter on the ViewSameAs link (IM_VSA). To greatly accelerate matching, IM_PC first determines the matching candidates by comparing discriminative property values, and then refines the result by comparing description property values. IM_PC establishes two kinds of links: the identity link SameAs and a newly proposed link, ViewSameAs, which keeps track of instances that share similar discriminative property values. A further problem in instance matching arises when instances have different descriptions even though their meanings are similar; the IM_VSA process addresses this problem. Its aim is to obtain more SameAs identity links by clustering the instances matched with ViewSameAs. The clustered instances are modeled as bags.
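The two-stage idea behind IM_PC can be illustrated with a minimal sketch. It is not the authors' implementation: the string similarity measure (`difflib.SequenceMatcher`), the thresholds `cand_t` and `same_t`, the property names, and the rule that a candidate pair failing the refinement step receives a ViewSameAs link are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def avg_similarity(x: dict, y: dict, props: list) -> float:
    """Average similarity over props; a property missing on either side scores 0."""
    scores = [similarity(x[p], y[p]) if p in x and p in y else 0.0 for p in props]
    return sum(scores) / len(scores) if scores else 0.0


def im_pc(source, target, discriminative, descriptive, cand_t=0.9, same_t=0.8):
    """Hypothetical two-stage matcher.

    Stage 1 keeps only pairs whose discriminative values are similar.
    Stage 2 compares description values: similar pairs get SameAs,
    the remaining candidates get ViewSameAs (an assumption).
    """
    same_as, view_same_as = [], []
    for s_id, s in source.items():
        for t_id, t in target.items():
            # Stage 1: cheap candidate filter on discriminative properties.
            if avg_similarity(s, t, discriminative) < cand_t:
                continue
            # Stage 2: refinement on description properties.
            if avg_similarity(s, t, descriptive) >= same_t:
                same_as.append((s_id, t_id))
            else:
                view_same_as.append((s_id, t_id))
    return same_as, view_same_as


# Toy instances; identifiers and property names are made up.
source = {
    "a1": {"name": "Tim Berners-Lee", "bio": "Inventor of the World Wide Web"},
    "a2": {"name": "Alan Turing", "bio": "Mathematician and computer scientist"},
}
target = {
    "b1": {"name": "Tim Berners-Lee", "bio": "Inventor of the World Wide Web"},
    "b2": {"name": "Tim Berners-Lee", "bio": "British engineer, director of W3C"},
    "b3": {"name": "Grace Hopper", "bio": "Pioneer of compilers"},
}
same_as, view_same_as = im_pc(source, target, ["name"], ["bio"])
```

In this toy run, the pair with identical name and description ends up in `same_as`, while the pair that shares a name but has a different description is kept in `view_same_as` for later processing by a step such as IM_VSA.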
Recently, instance matching has become a key technology for achieving interoperability across datasets, especially in linked data. Due to the rapid growth of published datasets, it is attracting increasing research interest. In this context, several approaches have been proposed. However, they do not perform well because they do not address the problem of matching instances that have different descriptions. In addition, the identity link owl:sameAs is generally predominant in linking correspondences, yet many existing identity links are misused. In this paper, the authors discuss these issues and propose an original instance matching approach that aims to match instances holding diverse descriptions. Furthermore, a novel link named ViewSameAs is proposed. The key improvement over existing approaches is alignment reuse. Two novel methods are therefore introduced: ViewSameAs-based clustering and alignment reuse based on metadata. Experiments on datasets, including those of the OAEI, show that the proposed approach achieves satisfactory and highly accurate results.
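The abstract does not say how ViewSameAs-based clustering is carried out. One plausible reading, sketched below purely as an assumption, groups instances into bags by taking the connected components of the ViewSameAs links with a union-find structure. The function name `cluster_into_bags` and the instance identifiers are hypothetical.

```python
from collections import defaultdict


def cluster_into_bags(view_same_as_links):
    """Group instances connected by ViewSameAs links into bags.

    Assumed interpretation: a bag is a connected component of the
    undirected graph formed by the links.
    """
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then walk to the root
        # while halving the path.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in view_same_as_links:
        parent[find(a)] = find(b)  # union the two components

    bags = defaultdict(set)
    for node in list(parent):
        bags[find(node)].add(node)
    return list(bags.values())


# Toy links; identifiers are made up.
links = [("a1", "b2"), ("b2", "c3"), ("d4", "e5")]
bags = cluster_into_bags(links)
```

Each resulting bag could then be examined as a unit when looking for additional SameAs links, which is the stated goal of the clustering step.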