S2RDF

Schätzle, Alexander; Przyjaciel-Zablocki, Martin; Skilevic, Simon; Lausen, Georg

doi:10.14778/2977797.2977806

Cited by 112 publications

(25 citation statements)

References 24 publications

“…Each SPARQL query is decomposed into multiple subqueries, which are then evaluated independently. Since the data is …

[46] | Subject Hash | Distributed Semi-Join
CliqueSquare [25] | Hybrid (Hash + VP) | MapReduce-based Join
DREAM [38] | No partitioning; full replication | RDF-3X [53]
EAGRE [56] | METIS | MapReduce-based Join
gStoreD [45] | Partitioning Agnostic | gStore [37]
H-RDF-3X [29] | METIS | RDF-3X [53]
H2RDF+ [41] | H-Base partitioner (range) | Centralized + MapReduce
HadoopRDF [30] | VP + predicate files on HDFS | MapReduce Join
Partout [36] | Workload-based fragmentation | RDF-3X [53]
PigSparql [14] | Hash + Triple-based files | SPARQL to PigLatin
S2RDF [15] | Extended Vertical Partitioning | SPARQL to SQL
S2X [51] | GraphX partitioning strategy | Vertex-Centric BGP matching
Sedge [57] | Subject Hash | Vertex-Centric BGP matching
Sempala [50] | VP | SPARQL to SQL
SHAPE [32] | Semantic Hash Partitioning | RDF-3X [53]
SHARD [47] | Hash | MapReduce-based Join
TriAD [48] | Hash-based Sharding | Distributed Merge/Hash Joins
TriAD-SG [48] | METIS + Horizontal Sharding | Distributed Merge/Hash Joins
Trinity.RDF [33] | Key-value store on graph | Graph Exploration
WARP [28] | METIS on query workload | RDF-3X [53]

In this survey, we categorize distributed RDF management systems along 2 dimensions based on their execution model: (i) MapReduce and Graph-based systems: such systems rely on general purpose frameworks, i.e., Hadoop or Spark, that offer seamless data distribution and parallelization at the cost of flexibility. (ii) Specialized RDF systems: are built specifically for SPARQL query evaluation by utilizing custom physical layouts, native RDF indexing, efficient communication protocols and explicit replication.…”

Section: Distributed RDF Systems (mentioning)

confidence: 99%

“…S2RDF [15] is a SPARQL engine built on top of Spark [39]. It proposes a relational partitioning technique for RDF data called Extended Vertical partitioning (ExtVP).…”

Section: Sophisticated Partitioning (mentioning)

confidence: 99%

“…Single-machine RDF systems, like RDF-3X [53] and gStore [37], do not scale well to complex queries on web-scale RDF data [29,33]. To overcome this problem, many distributed SPARQL query engines [29,33,47,41,32,48,30,28,36,25,46,18,15,38,16] have been introduced. They utilize shared-nothing computing clusters and are either built on top of distributed data processing frameworks, such as MapReduce, or implement proprietary distributed computation approaches.…”

Section: Introduction (mentioning)

confidence: 99%

“…Then, we perform extensive experimental evaluation of the following 12 representative systems: S2RDF [15], AdPart [46], DREAM [38], Urika-GD [11], CliqueSquare [25], S2X [51], TriAD [48], SHAPE [32], H-RDF-3X [29], H2RDF+ [41], SHARD [47] and gStoreD [45]. We use all the standard synthetic benchmarks (e.g., LUBM [6]) and a variety of very large real datasets (e.g., Bio2RDF [3]) with up to 4.3 billion triples, to stretch the systems to their limits.…”

Section: Introduction (mentioning)

confidence: 99%

“…If this condition is not satisfied, MapReduce based systems (e.g., H2RDF+ [41]), are an acceptable alternative. In contrast, the startup costs of some systems (e.g., S2RDF [15]) or the excessive replication (e.g., DREAM [38]), severely limit their applicability to large datasets. In an attempt to standardize the evaluation of future systems and assist practitioners to select the appropriate solution for their data and applications, we publish online all datasets, our evaluation methodology and links to the systems.…”

Section: Introduction (mentioning)

confidence: 99%

A survey and experimental comparison of distributed SPARQL engines for very large RDF data

et al. 2017

Distributed SPARQL engines promise to support very large RDF datasets by utilizing shared-nothing computer clusters. Some are based on distributed frameworks such as MapReduce; others implement proprietary distributed processing; and some rely on expensive preprocessing for data partitioning. These systems exhibit a variety of trade-offs that are not well-understood, due to the lack of any comprehensive quantitative and qualitative evaluation. In this paper, we present a survey of 22 state-of-the-art systems that cover the entire spectrum of distributed RDF data processing and categorize them by several characteristics. Then, we select 12 representative systems and perform extensive experimental evaluation with respect to preprocessing cost, query performance, scalability and workload adaptability, using a variety of synthetic and real large datasets with up to 4.3 billion triples. Our results provide valuable insights for practitioners to understand the trade-offs for their usage scenarios. Finally, we publish online our evaluation framework, including all datasets and workloads, for researchers to compare their novel systems against the existing ones.

Distributed SPARQL Query Processing: a Case Study with Apache Spark

Amann, Curé, Naacke

2018

NoSQL Data Models

Storing and Querying Semantic Data in the Cloud

Janke, Staab

2018

Lecture Notes in Computer Science

In the last years, huge RDF graphs with trillions of triples were created. To be able to process this huge amount of data, scalable RDF stores are used, in which graph data is distributed over compute and storage nodes for scaling efforts of query processing and memory needs. The main challenges to be investigated for the development of such RDF stores in the cloud are: (i) strategies for data placement over compute and storage nodes, (ii) strategies for distributed query processing, and (iii) strategies for handling failure of compute and storage nodes. In this manuscript, we give an overview of how these challenges are addressed by scalable RDF stores in the cloud.
