From Theory to Practice

Chu, Shumo; Balazinska, Magdalena; Suciu, Dan

doi:10.1145/2723372.2750545

Cited by 77 publications

(9 citation statements)

References 34 publications


“…The hypercube hash join was initially presented in [20] and was used in the distributed RDF store presented in [28]. The basic idea is that for each join variable one dimension is created.…”

Section: Decentralized Join (mentioning)

confidence: 99%


Storing and Querying Semantic Data in the Cloud

Janke

Staab

2018

Lecture Notes in Computer Science


In the last years, huge RDF graphs with trillions of triples were created. To be able to process this huge amount of data, scalable RDF stores are used, in which graph data is distributed over compute and storage nodes for scaling efforts of query processing and memory needs. The main challenges to be investigated for the development of such RDF stores in the cloud are: (i) strategies for data placement over compute and storage nodes, (ii) strategies for distributed query processing, and (iii) strategies for handling failure of compute and storage nodes. In this manuscript, we give an overview of how these challenges are addressed by scalable RDF stores in the cloud.
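The citation statement above summarizes the hypercube hash join as creating one dimension per join variable. A minimal sketch of that idea follows, assuming a triangle query R(a,b), S(b,c), T(c,a) and Python's built-in `hash` as the partitioning function. The function names and the choice of query are illustrative assumptions, not taken from any of the cited papers.

```python
from itertools import product

def destinations(bound, dims):
    """Return every server coordinate that must receive a tuple.

    `bound` maps join variables to the tuple's values. A variable the
    relation does not mention is left free, so the tuple is replicated
    along that dimension.
    """
    axes = []
    for var, size in dims.items():
        if var in bound:
            axes.append([hash(bound[var]) % size])
        else:
            axes.append(range(size))
    return product(*axes)

def hypercube_triangles(R, S, T, dims):
    """Evaluate R(a,b), S(b,c), T(c,a) with one shuffle and local joins."""
    servers = {coord: {"R": set(), "S": set(), "T": set()}
               for coord in product(*(range(n) for n in dims.values()))}
    # Single communication step: route each input tuple to its servers.
    for a, b in R:
        for coord in destinations({"a": a, "b": b}, dims):
            servers[coord]["R"].add((a, b))
    for b, c in S:
        for coord in destinations({"b": b, "c": c}, dims):
            servers[coord]["S"].add((b, c))
    for c, a in T:
        for coord in destinations({"c": c, "a": a}, dims):
            servers[coord]["T"].add((c, a))
    # Each server joins only the fragments it received.
    out = set()
    for frag in servers.values():
        for a, b in frag["R"]:
            for b2, c in frag["S"]:
                if b2 == b and (c, a) in frag["T"]:
                    out.add((a, b, c))
    return out

R = {(1, 2), (2, 3), (1, 3)}
S = {(2, 3), (3, 1)}
T = {(3, 1), (1, 2)}
result = hypercube_triangles(R, S, T, {"a": 2, "b": 2, "c": 2})
```

Every result triple (a, b, c) is found on the server whose coordinates are the hashes of a, b, and c, because all three contributing tuples are routed there. No intermediate result is ever shuffled.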



“…Semantic Publishing Benchmark (SPB). The SPB [74] is a benchmark motivated by the industry. The use case is a publisher organization that provides metadata about its published work.…”

Section: Benchmarks (mentioning)

confidence: 99%


“…This paper introduces multi-way joins in Squall (a multi-way join uses a single communication step, that is, it runs within a single component). These joins can outperform the corresponding pipelines of 2-way joins as they avoid shuffling intermediate data, which can be very large [8,74,26]. Multi-way joins are especially beneficial when the output of intermediate stages is big compared to the size of the base relations and/or final output.…”

Section: Novel Join Operators (mentioning)

confidence: 99%

“…We refer an interested reader to [18]. Unfortunately, as explained in [26], both works [8,18] do not handle the case when dimension sizes (obtained from solving the equations) are not integers. For instance, if we have 7 machines in total and 3 dimensions of the same size, each dimension is of size 7^(1/3) = 1.91.…”

Section: Multi-way Joins: General Case (mentioning)

confidence: 99%

Squall

et al. 2016


Squall is a scalable online query engine that runs complex analytics in a cluster using skew-resilient, adaptive operators. Squall builds on state-of-the-art partitioning schemes and local algorithms, including some of our own. This paper presents the overview of Squall, including some novel join operators. The paper also presents lessons learned over the five years of working on this system, and outlines the plan for the proposed system demonstration.
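The second Squall statement above notes that 7 machines split over 3 equal dimensions would give a non-integer size of 7^(1/3), about 1.91. One simple way to handle this, sketched below, is to enumerate the integer grids that fit within the machine budget and keep the one with the lowest expected per-server load. This is an illustrative assumption, not necessarily the method used in Squall or in any cited work. The function `integer_shares`, the unit relation sizes, and the triangle query are all hypothetical.

```python
from itertools import product
from math import prod

def integer_shares(p, atoms, variables):
    """Return (load, shares) for the integer grid with the lowest load.

    `atoms` is a list of (size, vars) pairs, one per relation. The load of
    a grid is the sum over relations of size / (product of the shares of
    its variables), i.e. expected tuples per server under uniform hashing.
    """
    best = None
    for sizes in product(range(1, p + 1), repeat=len(variables)):
        if prod(sizes) > p:
            continue  # grid would need more machines than available
        share = dict(zip(variables, sizes))
        load = sum(n / prod(share[v] for v in vs) for n, vs in atoms)
        if best is None or load < best[0]:
            best = (load, share)
    return best

ideal = 7 ** (1 / 3)  # about 1.91, not usable as a dimension size

# Triangle query with three relations of equal (unit) size.
triangle = [(1.0, ("a", "b")), (1.0, ("b", "c")), (1.0, ("c", "a"))]
load, share = integer_shares(7, triangle, ("a", "b", "c"))
```

Under these assumptions the search settles on a 1 x 2 x 3 grid (in some order), which uses 6 of the 7 machines. The exhaustive enumeration is only practical for small machine counts and few join variables.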


“…This trend has inspired a rich line of research on how to formally reason about the parallel complexity of join computation, one of the core tasks in massively parallel systems. Several papers [7,8,20,19] have studied the tradeoff between synchronization (number of rounds) and communication cost, and have proposed and analyzed known and new parallel algorithms [4,9]. Among these, the Hypercube algorithm [13,4] can compute any multiway join query in one round by properly distributing the input data.…”

Section: Introduction (mentioning)

confidence: 99%

Distribution Policies for Datalog

2019


Modern data management systems extensively use parallelism to speed up query processing over massive volumes of data. This trend has inspired a rich line of research on how to formally reason about the parallel complexity of join computation. In this paper, we go beyond joins and study the parallel evaluation of recursive queries. We introduce a novel framework to reason about multi-round evaluation of Datalog programs, which combines implicit predicate restriction with distribution policies to allow expressing a combination of data-parallel and query-parallel evaluation strategies. Using our framework, we reason about key properties of distributed Datalog evaluation, including parallel-correctness of the evaluation strategy, disjointness of the computation effort, and bounds on the number of communication rounds.

