Link traversal strategies for querying Linked Data on the Web retrieve up-to-date results by recursively looking up URIs at query time. The approach breaks down for query patterns whose subject is unbound (e.g., ?S rdf:type :Class). Because RDF documents are subject-centric, such queries give the traversal no useful starting point, so zero-knowledge link traversal returns empty results for them. In this paper, the authors analyze a large corpus of real-world SPARQL query logs and identify the Most Frequent Predicates (MFPs) occurring in these queries. Knowing the MFPs makes it possible to select and index a limited number of triples from the original data set. The authors also propose a Hybrid Query Execution (HQE) approach, which first executes a query over this index to select initial data sources and then applies link traversal to fetch the complete results. An evaluation of HQE on recent real-data benchmarks shows that it retrieves at least five times more results than existing approaches.
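The two-phase idea can be illustrated with a minimal sketch. It is not the authors' implementation: the toy `WEB` dictionary stands in for URI dereferencing, `MFP_INDEX` is a hand-picked partial index, and every URI and function name below is a hypothetical chosen for illustration.

```python
from collections import deque

RDF_TYPE = "rdf:type"

# Hypothetical toy "Web": dereferencing a URI returns a subject-centric document.
WEB = {
    "ex:alice": [("ex:alice", RDF_TYPE, "ex:Person"), ("ex:alice", "foaf:knows", "ex:bob")],
    "ex:bob": [("ex:bob", RDF_TYPE, "ex:Person"), ("ex:bob", "foaf:knows", "ex:carol")],
    "ex:carol": [("ex:carol", RDF_TYPE, "ex:Person")],
    "ex:Person": [("ex:Person", "rdfs:label", "Person")],
}

# Assumed partial index: a limited set of triples whose predicate is an MFP.
MFP_INDEX = [("ex:alice", RDF_TYPE, "ex:Person")]


def matches(triple, pattern):
    """A pattern term of None is an unbound variable and matches anything."""
    return all(p is None or p == t for t, p in zip(triple, pattern))


def traverse(seeds, pattern, web=WEB):
    """Breadth-first link traversal: look up each URI once, follow its links."""
    seen, queue, results = set(), deque(seeds), set()
    while queue:
        uri = queue.popleft()
        if uri in seen:
            continue
        seen.add(uri)
        for triple in web.get(uri, []):  # unknown URIs yield no document
            if matches(triple, pattern):
                results.add(triple)
            queue.extend((triple[0], triple[2]))  # follow subject and object
    return results


def zero_knowledge_traversal(pattern):
    """Seed the traversal only with the URIs mentioned in the query pattern."""
    return traverse([term for term in pattern if term is not None], pattern)


def hybrid_query_execution(pattern, index=MFP_INDEX):
    """Phase 1: pick seed sources from the index. Phase 2: link traversal."""
    hits = {t for t in index if matches(t, pattern)}
    return hits | traverse([t[0] for t in hits], pattern)


if __name__ == "__main__":
    query = (None, RDF_TYPE, "ex:Person")  # subject-unbound pattern
    print(len(zero_knowledge_traversal(query)), len(hybrid_query_execution(query)))
```

In this toy graph the zero-knowledge run finds nothing, because the document for ex:Person does not list its instances, while the index supplies ex:alice as a seed from which the traversal reaches the remaining matches.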