Abstract. To count as Linked Data, datasets on the web must be linked to other datasets. We focus on predicting possible links between datasets for the most important RDF link type, owl:sameAs, using link prediction and classification techniques. Since the goal is to discriminate linked dataset pairs from unlinked ones, we formulate link prediction as a classification problem. We adopt Random Forest as the base classifier, using the scores output by unsupervised predictors as features, and apply bagging to combine multiple forests, which reduces variance and improves accuracy. Experiments show that this improves prediction performance by about 10% in AUROC compared with the best unsupervised predictor.
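As a rough illustration of the feature-construction step, the sketch below scores dataset pairs with a few unsupervised predictors and evaluates one of them by AUROC. The abstract does not name the predictors it uses, so common neighbors, Jaccard, and preferential attachment are assumptions taken from standard link-prediction practice, and the graph, the dataset names, and the functions `neighbor_scores` and `auroc` are hypothetical. The resulting score dictionaries are the kind of feature vector that could be passed to a Random Forest; the classifier and the bagging step are not shown.

```python
def neighbor_scores(links, a, b):
    """Return unsupervised link-prediction scores for the dataset pair (a, b).

    `links` maps each dataset to the set of datasets it is linked to in the
    training graph (treated as undirected here, which is an assumption).
    """
    na = links.get(a, set())
    nb = links.get(b, set())
    common = len(na & nb)
    union = len(na | nb)
    return {
        "common_neighbors": common,
        "jaccard": common / union if union else 0.0,
        "preferential_attachment": len(na) * len(nb),
    }


def auroc(scores, labels):
    """AUROC as the probability that a random positive pair outscores a
    random negative pair, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy training graph with hypothetical dataset names.
train_links = {
    "A": {"C", "D"},
    "B": {"C", "D"},
    "C": {"A", "B"},
    "D": {"A", "B"},
    "E": {"F"},
    "F": {"E"},
}

# Held-out pairs: label 1 means the pair is linked, 0 means it is not.
test_pairs = [("A", "B", 1), ("C", "D", 1), ("A", "E", 0), ("B", "F", 0)]

features = [neighbor_scores(train_links, a, b) for a, b, _ in test_pairs]
labels = [y for _, _, y in test_pairs]
jaccard_scores = [f["jaccard"] for f in features]
jaccard_auroc = auroc(jaccard_scores, labels)
```

Computing AUROC per predictor gives the "best unsupervised predictor" baseline against which a supervised combination can be compared.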
Dataset interlinking is an important problem in Linked Data. In this paper we consider the problem from the perspective of information retrieval and propose a learning-to-rank-based framework that combines various similarity measures to retrieve the relevant datasets for a given dataset. Specifically, inspired by the idea of collaborative filtering, we propose an effective similarity measure called collaborative similarity. Experimental results show that the collaborative similarity measure is effective for dataset interlinking and that the learning-to-rank-based framework significantly increases performance.
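The abstract does not define collaborative similarity, so the following is only one possible reading, in the spirit of collaborative filtering: two datasets are treated as similar when they already link to many of the same targets, measured here by cosine similarity over their sets of link targets. The functions `cosine_similarity` and `rank_candidates` and the toy data are hypothetical. In a learning-to-rank setting, a score like this would be one feature among several; the ranking model itself is not shown.

```python
import math


def cosine_similarity(targets_a, targets_b):
    """Cosine similarity between two sets of link targets (binary vectors)."""
    if not targets_a or not targets_b:
        return 0.0
    shared = len(targets_a & targets_b)
    return shared / math.sqrt(len(targets_a) * len(targets_b))


def rank_candidates(query, outlinks):
    """Rank all other datasets by similarity to `query`, highest first.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    query_targets = outlinks[query]
    scored = [
        (cosine_similarity(query_targets, targets), name)
        for name, targets in outlinks.items()
        if name != query
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


# Toy data: each hypothetical dataset maps to the targets it links to.
outlinks = {
    "ds_a": {"x", "y", "z"},
    "ds_b": {"x", "y"},
    "ds_c": {"z", "w"},
    "ds_d": {"v"},
}

ranking = rank_candidates("ds_a", outlinks)
```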