2014
DOI: 10.1007/978-3-319-11964-9_11

Sempala: Interactive SPARQL Query Processing on Hadoop

Abstract: Driven by initiatives like Schema.org, the amount of semantically annotated data is expected to grow steadily towards massive scale, requiring cluster-based solutions to query it. At the same time, Hadoop has become dominant in the area of Big Data processing with large infrastructures being already deployed and used in manifold application fields. For Hadoop-based applications, a common data pool (HDFS) provides many synergy benefits, making it very attractive to use these infrastructures for semantic data pr…

Cited by 56 publications (57 citation statements)
References 17 publications

“…This also applies to the intermediate and final results, which in turn facilitates the compositionality of expressions and provides a simple interoperability with, e.g., Hadoop-based SPARQL engines that can use Parquet as input [19]. We also performed some experiments with others storage formats including RCFile, Avro and SequenceFile.…”
Section: RDF Data Layout (mentioning)
confidence: 97%
“…Given a set of query templates, the query generator instantiates these templates with actual RDF terms from the dataset. We instantiated 20 of these templates each with 100 queries, so in total we got the 2000 unique queries, more details of the templates can be found on the Watdiv website. Figure 4 shows the overall performance per query template type, while Figure 3 goes into more detail by showing the performance on the 20 query templates.…”
Section: Query Template Type Performance (mentioning)
confidence: 99%
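
The template instantiation described in the excerpt above can be sketched roughly as follows. This is a minimal illustration, not the generator the cited authors used: the `%v0%` placeholder syntax, the `instantiate` function, and the example IRIs are all hypothetical.

```python
import itertools
import random


def instantiate(template, candidates, n, seed=0):
    """Fill the placeholders of a query template with RDF terms.

    template   -- query text containing placeholders such as "%v0%" (hypothetical syntax)
    candidates -- dict mapping each placeholder to the RDF terms it may take
    n          -- number of distinct queries wanted
    Returns at most n distinct queries (fewer if the candidates allow fewer).
    """
    # Only consider placeholders that actually occur in the template.
    used = {p: sorted(set(terms)) for p, terms in candidates.items() if p in template}
    # Enumerate every combination of terms, then sample without replacement
    # so that no instantiated query is produced twice.
    combos = list(itertools.product(*used.values()))
    rng = random.Random(seed)
    chosen = rng.sample(combos, min(n, len(combos)))
    queries = []
    for combo in chosen:
        query = template
        for placeholder, term in zip(used, combo):
            query = query.replace(placeholder, term)
        queries.append(query)
    return queries
```

Enumerating all combinations is only practical for small candidate sets; a generator working on a large dataset would more likely draw terms lazily and reject duplicates.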
“…• Translating SPARQL and RDF to existing Big Data approaches such as MapReduce [11], Impala [12], Apache Spark [4];…”
Section: Introduction (mentioning)
confidence: 99%
“…S2RDF does not run on Spark directly; it translates SPARQL queries into SQL jobs which are then executed on top of Spark SQL [19]. S2RDF follows a similar approach to Sempala [50] and PigSPARQL [14]. Sempala is a distributed RDF engine that translates SPARQL into SQL which runs on top of Apache Impala [35].…”
Section: Sophisticated Partitioning (mentioning)
confidence: 99%
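
To make the SPARQL-to-SQL idea in the excerpt above concrete, here is a minimal sketch that translates a basic graph pattern into a self-join over a single three-column table. It rests on assumptions that are not taken from the cited papers: a table named `triples` with columns `s`, `p`, `o`, the helper names `bgp_to_sql` and `run_on_sqlite`, and SQLite standing in for Impala or Spark SQL. It does not reproduce the storage layout or the translator of Sempala or S2RDF.

```python
import sqlite3


def bgp_to_sql(patterns, table="triples"):
    """Translate triple patterns like ("?x", "ex:knows", "?y") into one SQL query.

    Terms starting with "?" are variables; everything else is a constant.
    """
    if not patterns:
        raise ValueError("at least one triple pattern is required")
    bound = {}   # variable -> first column it was bound to
    where = []   # constant filters and join conditions
    for i, pattern in enumerate(patterns):
        for column, term in zip(("s", "p", "o"), pattern):
            ref = f"t{i}.{column}"
            if term.startswith("?"):
                if term in bound:
                    # A repeated variable becomes an equality join condition.
                    where.append(f"{ref} = {bound[term]}")
                else:
                    bound[term] = ref
            else:
                escaped = term.replace("'", "''")
                where.append(f"{ref} = '{escaped}'")
    columns = ", ".join(f"{ref} AS {var[1:]}" for var, ref in bound.items()) or "*"
    tables = ", ".join(f"{table} t{i}" for i in range(len(patterns)))
    sql = f"SELECT {columns} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql


def run_on_sqlite(triples, patterns):
    """Load triples into an in-memory SQLite table and evaluate the translated query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    rows = conn.execute(bgp_to_sql(patterns)).fetchall()
    conn.close()
    return rows
```

Each triple pattern contributes one alias of the table, so a pattern with k triples turns into a k-way join; reducing the cost of such joins is one reason engines choose specialised layouts instead of a plain triples table.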
“…Each SPARQL query is decomposed into multiple subqueries, which are then evaluated independently. Since the data is

[46] | Subject Hash | Distributed Semi-Join
CliqueSquare [25] | Hybrid (Hash + VP) | MapReduce-based Join
DREAM [38] | No partitioning; full replication | RDF-3X [53]
EAGRE [56] | METIS | MapReduce-based Join
gStoreD [45] | Partitioning Agnostic | gStore [37]
H-RDF-3X [29] | METIS | RDF-3X [53]
H2RDF+ [41] | H-Base partitioner (range) | Centralized + MapReduce
HadoopRDF [30] | VP + predicate files on HDFS | MapReduce Join
Partout [36] | Workload-based fragmentation | RDF-3X [53]
PigSparql [14] | Hash + Triple-based files | SPARQL to PigLatin
S2RDF [15] | Extended Vertical Partitioning | SPARQL to SQL
S2X [51] | GraphX partitioning strategy | Vertex-Centric BGP matching
Sedge [57] | Subject Hash | Vertex-Centric BGP matching
Sempala [50] | VP | SPARQL to SQL
SHAPE [32] | Semantic Hash Partitioning | RDF-3X [53]
SHARD [47] | Hash | MapReduce-based Join
TriAD [48] | Hash-based Sharding | Distributed Merge/Hash Joins
TriAD-SG [48] | METIS + Horizontal Sharding | Distributed Merge/Hash Joins
Trinity.RDF [33] | Key-value store on graph | Graph Exploration
WARP [28] | METIS on query workload | RDF-3X [53]

In this survey, we categorize distributed RDF management systems along 2 dimensions based on their execution model: (i) MapReduce and Graph-based systems: such systems rely on general purpose frameworks, i.e., Hadoop or Spark, that offer seamless data distribution and parallelization at the cost of flexibility. (ii) Specialized RDF systems: are built specifically for SPARQL query evaluation by utilizing custom physical layouts, native RDF indexing, efficient communication protocols and explicit replication.…”
Section: Distributed RDF Systems (mentioning)
confidence: 99%
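
Several systems in the excerpt above are listed with VP (vertical partitioning). As a rough illustration of the general idea only, assuming the usual meaning of one two-column (subject, object) table per predicate, the following sketch splits a list of triples accordingly. The function name and example data are hypothetical, and nothing here reflects how any listed system stores its data on disk.

```python
from collections import defaultdict


def vertical_partition(triples):
    """Group (subject, predicate, object) triples into one table per predicate.

    Returns a dict mapping each predicate to a list of (subject, object) rows.
    """
    tables = defaultdict(list)
    for subject, predicate, obj in triples:
        tables[predicate].append((subject, obj))
    return dict(tables)


if __name__ == "__main__":
    example = [
        ("ex:a", "ex:knows", "ex:b"),
        ("ex:b", "ex:name", "Bob"),
        ("ex:a", "ex:name", "Alice"),
    ]
    for predicate, rows in vertical_partition(example).items():
        print(predicate, rows)
```

With this layout, a triple pattern whose predicate is a constant only has to read the table for that predicate instead of scanning every triple.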