The Linked Data paradigm builds upon the backbone of distributed knowledge bases connected by typed links. The mere volume of current knowledge bases as well as their sheer number pose two major challenges when aiming to support the computation of links across and within them. The first is that tools for link discovery have to be time-efficient when they compute links. Secondly, these tools have to produce links of high quality to serve the applications built upon Linked Data well. Solutions to the second problem build upon efficient computational approaches developed to solve the first and combine these with dedicated machine learning techniques. The current version of the Limes framework is the product of seven years of research on these two challenges. A series of machine learning techniques and efficient computation approaches were developed and integrated into this framework to address the link discovery problem. The framework combines these diverse algorithms within a generic and extensible architecture. In this article, we give an overview of version 1.7.4 of the open-source release of the framework. In particular, we focus on an overview of the architecture of the framework, an intuition of its inner workings and a brief overview of the approaches it contains. Some descriptions of the applications within which the framework was used complete the paper. Our framework is open-source and available under a GNU license at https://github.com/dice-group/LIMES together with a user manual and a developer manual.
The aim of the Mighty Storage Challenge (MOCHA) at ESWC 2017 was to test the performance of solutions for SPARQL processing in aspects that are relevant for modern applications. These include ingesting data, answering queries on large datasets and serving as backend for applications driven by Linked Data. The challenge tested the systems against data derived from real applications and with realistic loads. An emphasis was put on dealing with data in form of streams or updates
Time-efficient link discovery is of central importance to implement the vision of the Semantic Web. Some of the most rapid Link Discovery approaches rely internally on planning to execute link specifications. In newer works, linear models have been used to estimate the runtime the fastest planners. However, no other category of models has been studied for this purpose so far. In this paper, we study non-linear runtime estimation functions for runtime estimation. In particular, we study exponential and mixed models for the estimation of the runtimes of planners. To this end, we evaluate three different models for runtime on six datasets using 400 link specifications. We show that exponential and mixed models achieve better fits when trained but are only to be preferred in some cases. Our evaluation also shows that the use of better runtime approximation models has a positive impact on the overall execution of link specifications.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.