Scalable distributed indexing and query processing over Linked Data

Karnstedt, Marcel; Sattler, Kai-Uwe; Hauswirth, Manfred

doi:10.1016/j.websem.2011.11.010

Cited by 15 publications

(1 citation statement)

References 60 publications

“…Our approach takes advantage of SPARQL 1.1 update queries [4] and federated queries [4], which extends the remote access framework first introduced in [20] to multiple RDF stores. As opposed to federated query processing approaches for RDF data ([21], [22], [23]), which focus on the problem of answering queries formulated in a general-purpose query language from multiple RDF data sources, our focus in this paper is on answering restricted classes of statistical queries needed for learning classifiers from RDF data. Restricting the classes of queries to those that are useful in learning predictive models from RDF data allows us to take advantage of optimizations such as the efficient accumulation of projections (Sec.…”

Section: B Related Work (mentioning)

confidence: 99%

Learning Classifiers from Chains of Multiple Interlinked RDF Data Stores

Lin; Honavar

2013

2013 IEEE International Congress on Big Data

Abstract: The emergence of many interlinked, physically distributed, and autonomously maintained RDF stores offers unprecedented opportunities for predictive modeling and knowledge discovery from such data. However, existing machine learning approaches are limited in their applicability because it is neither desirable nor feasible to gather all of the data in a centralized location for analysis, due to access, memory, bandwidth, and computational restrictions, and sometimes privacy and confidentiality constraints. Against this background, we consider the problem of learning predictive models from multiple interlinked RDF stores. Specifically, we: (i) introduce statistical query based formulations of several representative algorithms for learning classifiers from RDF data; (ii) introduce a distributed learning framework to learn classifiers from multiple interlinked RDF stores that form a chain; (iii) identify three special cases of RDF data fragmentation and describe effective strategies for learning predictive models in each case; (iv) consider a novel application of a matrix reconstruction technique from the field of Computerized Tomography [1] to approximate the statistics needed by the learning algorithm from projections using count queries, thus dramatically reducing the amount of information transmitted from the remote data sources to the learner; and (v) report results of experiments with a real-world social network data set (Last.fm), which demonstrate the feasibility of the proposed approach.
