SPARQL Query Optimization on Top of DHTs

Kaoudi, Zoi; Kyzirakos, Kostis; Koubarakis, Manolis

doi:10.1007/978-3-642-17746-0_27

Cited by 40 publications

(35 citation statements)

References 27 publications


“…Similar to our work, [52] recently proposed database-oriented query optimization strategies for RDF query processing on top of DHTs in the context of the Atlas system [53]. While one of our main goals is the optimization of the number of messages required for processing a query, in that work the authors explicitly focus on reducing the required bandwidth.…”

Section: Related Work (mentioning)

confidence: 97%

“…In contrast, UniStore focuses on less reliable systems and proposes to make extensive use of parallel processing strategies. [52] discusses very interesting and important extensions for our system, while on the other hand the Atlas system can benefit from the different processing strategies we discuss. An evaluation in a large local cluster of powerful machines shows that the idea of efficient RDF query processing in DHT systems can scale to millions of triples.…”

Section: Related Work (mentioning)

confidence: 99%


Scalable distributed indexing and query processing over Linked Data

Karnstedt

Sattler

Hauswirth

2012

Journal of Web Semantics


“…A popular data partitioning algorithm for RDF data is hash partitioning [13], [14], [18]. This approach distributes RDF triples across different partitions by computing a hash key over the subject or the object of each triple.…”

Section: Related Work (mentioning)

confidence: 99%

“…A popular approach to partition RDF data is hash partitioning, which is adopted by a majority of the existing distributed RDF engines [13], [14], [18], [24]. This approach distributes RDF triples across different partitions by computing a hash key over either the subject or the object of each triple.…”

Section: Introduction RDF (Resource Description Framework) (mentioning)

confidence: 99%
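The hash partitioning scheme described in the two statements above can be illustrated with a minimal sketch. It assumes triples are plain `(subject, predicate, object)` tuples of strings; the function names `partition_of` and `hash_partition`, the choice of SHA-1, and the sample data are hypothetical and are not taken from any of the cited systems.

```python
import hashlib


def partition_of(term: str, num_partitions: int) -> int:
    """Map an RDF term to a partition index using a stable hash (SHA-1 is an assumption)."""
    digest = hashlib.sha1(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def hash_partition(triples, num_partitions, key="subject"):
    """Distribute triples across partitions by hashing the subject or the object."""
    index = {"subject": 0, "object": 2}[key]
    partitions = [[] for _ in range(num_partitions)]
    for triple in triples:
        partitions[partition_of(triple[index], num_partitions)].append(triple)
    return partitions


# Hypothetical sample data
triples = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:age", "30"),
    ("ex:bob", "ex:knows", "ex:carol"),
]
parts = hash_partition(triples, 4)
```

Because the partition depends only on the hashed term, all triples sharing a subject land in the same partition, so a star-shaped query on that subject can be answered without a distributed join.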

Scalable SPARQL querying using path partitioning

Zhou

Yuan

et al. 2015

2015 IEEE 31st International Conference on Data Engineering


Abstract: The emerging need for conducting complex analysis over big RDF datasets calls for scale-out solutions that can harness a computing cluster to process big RDF datasets. Queries over RDF data often involve complex self-joins, which would be very expensive to run if the data are not carefully partitioned across the cluster and hence distributed joins over massive amount of data are necessary. Existing RDF data partitioning methods can nicely localize simple queries but still need to resort to expensive distributed joins for more complex queries. In this paper, we propose a new data partitioning approach that takes use of the rich structural information in RDF datasets and minimizes the amount of data that have to be joined across different computing nodes. We conduct an extensive experimental study using two popular RDF benchmark data and one real RDF dataset that contain up to billions of RDF triples. The results indicate that our approach can produce a balanced and low redundant data partitioning scheme that can avoid or largely reduce the cost of distributed joins even for very complicated queries. In terms of query execution time, our approach can outperform the state-of-the-art methods by orders of magnitude.


“…The bound-is-easier selection function is commonly used in recursive query evaluation, where the atom with the largest number of constants is evaluated first, in the hope of returning the smallest intermediate relation [14,16]. One can find extensions of this selection function in the literature, such as in [9] for a Semantic Web setting with binary predicates. There, the position of the bound argument is considered, where atoms with a bound first argument (subject) are preferred over those with a bound second argument (object) for two arguments with the same number of bindings.…”

Section: Sub-query Scheduling (mentioning)

confidence: 99%
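The bound-is-easier selection function described in the statement above can be sketched as follows. This is an illustration under stated assumptions, not the implementation from [9] or from D2R2: atoms are modeled as `(predicate, subject, object)` tuples, variables are marked with a leading `?`, and the names `is_bound`, `priority`, and `select_next` are hypothetical.

```python
def is_bound(term: str) -> bool:
    """A term counts as bound (a constant) unless it starts with '?' (assumed variable marker)."""
    return not term.startswith("?")


def priority(atom):
    """Sort key: more bound arguments first; on a tie, prefer a bound subject."""
    _predicate, subject, obj = atom
    bound_count = int(is_bound(subject)) + int(is_bound(obj))
    subject_rank = 0 if is_bound(subject) else 1
    return (-bound_count, subject_rank)


def select_next(atoms):
    """Pick the atom to evaluate next under the bound-is-easier heuristic."""
    return min(atoms, key=priority)


# Hypothetical query atoms
atoms = [
    ("knows", "?x", "ex:bob"),    # one bound argument (object)
    ("knows", "ex:alice", "?y"),  # one bound argument (subject)
    ("type", "?x", "?z"),         # no bound arguments
]
chosen = select_next(atoms)
```

Here `chosen` is the atom with the bound subject: it ties with the first atom on the number of constants, and the tie is broken in favor of the bound first argument.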

D2R2: Disk-Oriented Deductive Reasoning in a RISC-Style RDF Engine

Yahya

Theobald

2011

Rule-Based Modeling and Computing on the Semantic Web


Abstract. Deductive reasoning lies in the expressive intersection of Datalog and Description Logics. In this paper, we present the D2R2 engine, which implements deductive reasoning capabilities based on the Query-Sub-Query (QSQR) algorithm on top of the disk-oriented RDF-3X engine. D2R2 aims to bridge the gap between rule-oriented (intensional) reasoning with deduction rules and data-oriented (extensional) processing of large joins, over a set of highly tuned, disk-based index structures for large RDF collections. We present a generalization of QSQR, which allows for dynamic sub-query scheduling and chaining of extensional predicates into atomic join patterns, two key extensions for coupling QSQR with a disk-oriented storage backend. Experiments over a set of recursive queries and a very large knowledge base, consisting of 20 million RDF facts, as well as comparisons to disk-oriented reasoning engines, confirm the practical viability and significant runtime improvements of D2R2 compared to these engines.

