An overview of parallel strategies for transitive closure on algebraic machines

Cacace, Filippo; Ceri, Stefano; Houtsma, Maurice A. W.

doi:10.1007/3-540-54132-2_49

Cited by 14 publications

(6 citation statements)

References 22 publications


“…However, there has not been much work yet on investigating the tradeoffs using newer SQL-on-Hadoop solutions. We believe that, for our scenario, a semi-naive evaluation [8] is the better choice as it distributes the workload over more rounds and produces fewer derivations on graphs with cycles [3]. In contrast, a smart TC algorithm based on a nonlinear (recursive-doubling) execution [12] uses a logarithmic, rather than linear, number of rounds but with much higher costs (with regard to the data volume) per round [3].…”

Section: Query Compiler (mentioning)

confidence: 98%
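The trade-off described in this statement can be made concrete with a small in-memory sketch. It is an illustration under our own assumptions, not the TriAL-QL implementation: the "smart" side is a deliberately simplified doubling variant (square the whole closure, T := T ∪ T∘T, until nothing changes) standing in for the recursive-doubling algorithm cited above. `rounds` counts loop iterations, and `derivations` counts join outputs before duplicate elimination, used here as a rough proxy for per-round data volume.

```python
from collections import defaultdict
from typing import Set, Tuple

Edge = Tuple[int, int]


def compose(a: Set[Edge], b: Set[Edge]) -> Tuple[Set[Edge], int]:
    """Relational composition a∘b: (x, z) whenever (x, y) in a and (y, z) in b.

    Returns the deduplicated result and the raw number of join outputs.
    """
    succ = defaultdict(list)
    for y, z in b:
        succ[y].append(z)
    produced = [(x, z) for x, y in a for z in succ.get(y, ())]
    return set(produced), len(produced)


def semi_naive_tc(r: Set[Edge]) -> Tuple[Set[Edge], int, int]:
    """Linear recursion: only the previous round's new pairs (delta) are joined with R."""
    closure, delta = set(r), set(r)
    rounds = derivations = 0
    while delta:
        joined, n = compose(delta, r)
        delta = joined - closure          # keep only genuinely new pairs
        closure |= delta
        rounds += 1
        derivations += n
    return closure, rounds, derivations


def doubling_tc(r: Set[Edge]) -> Tuple[Set[Edge], int, int]:
    """Simplified recursive doubling: join the whole closure with itself each round."""
    closure = set(r)
    rounds = derivations = 0
    while True:
        joined, n = compose(closure, closure)
        rounds += 1
        derivations += n
        grown = closure | joined
        if grown == closure:              # fixpoint: nothing new was derived
            return closure, rounds, derivations
        closure = grown


if __name__ == "__main__":
    chain = {(i, i + 1) for i in range(8)}   # path 0 -> 1 -> ... -> 8
    t1, r1, d1 = semi_naive_tc(chain)
    t2, r2, d2 = doubling_tc(chain)
    print(t1 == t2, len(t1), r1, r2, d1, d2)
```

On this 9-node path both loops compute the same 36 pairs; the semi-naive loop needs 8 rounds and the doubling loop 4, while the doubling loop produces more join outputs in total because every round joins the full closure with itself.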


TriAL-QL

Przyjaciel-Zablocki, Schätzle, Lausen

2015

Proceedings of the 18th International Workshop on Web and Databases


Navigational queries are among the most natural query patterns for RDF data, yet most existing RDF query languages, including SPARQL 1.1 and its derivatives, fail to cover all the varieties inherent to its triple-based model. As a consequence, the development of more expressive RDF languages is of general interest. With TriAL* [14], there exists an expressive algebra that subsumes many previous approaches while adding novel features that are not expressible in most other RDF query languages based on the standard graph model. However, its algebraic notation is inappropriate for practical usage, and it is not supported by any existing RDF triple store. In this paper, we propose TriAL-QL, an easy-to-write and easy-to-grasp language for TriAL*, preserving its compositional algebraic structure. We present an implementation based on Impala, a massively parallel SQL query engine on Hadoop, using an optimized semi-naive evaluation for the recursive fragments of TriAL*. This way, we support both data-intensive ETL-like workloads and explorative ad-hoc style queries. To demonstrate the scalability and expressiveness of our approach, we conducted experiments on generated social networks with up to 1.8 billion triples and compared different execution strategies to a Hive-based solution.
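To illustrate what a recursive fragment evaluated over triples can look like, here is a hypothetical example in the spirit of the abstract; it is neither TriAL-QL syntax nor the Impala implementation. Starting from (subject, predicate, object) triples, it derives (s, p, o') whenever o' is reachable from s through a chain of triples that all carry the same predicate p. The loop is a semi-naive evaluation: each round joins only the triples derived in the previous round with the base triples.

```python
from collections import defaultdict
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def same_predicate_closure(triples: Set[Triple]) -> Set[Triple]:
    # Index the base triples by (subject, predicate) so each join step is a lookup.
    index = defaultdict(set)
    for s, p, o in triples:
        index[(s, p)].add(o)

    result, delta = set(triples), set(triples)
    while delta:
        # Extend each newly derived triple by one base triple with the same predicate.
        extended = {(s, p, o2) for s, p, o in delta for o2 in index.get((o, p), ())}
        delta = extended - result   # only new triples feed the next round
        result |= delta
    return result


if __name__ == "__main__":
    data = {
        ("alice", "knows", "bob"),
        ("bob", "knows", "carol"),
        ("carol", "likes", "dave"),
    }
    for t in sorted(same_predicate_closure(data) - data):
        print(t)   # ('alice', 'knows', 'carol')
```

Because the predicate is carried along as part of each derived triple, `dave` is not reached from `alice`: the last hop uses `likes`, not `knows`.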


“…Fortunately, we can reduce such an expression to the problem of calculating the transitive closure (TC), which is a well-studied research field [8,12]. There is an ongoing debate whether the so-called semi-naive or smart TC algorithm is superior in distributed environments like MapReduce [2,20].…”

Section: Query Compiler (mentioning)

confidence: 99%
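In a MapReduce-style environment, each semi-naive round becomes at least one job that joins the current delta with the base relation. The single-process simulation below uses the common repartition (reduce-side) join pattern; all function names are hypothetical and no Hadoop API is involved. For brevity, the set difference against the accumulated closure is done in the driver, whereas a real deployment would typically need an extra job or a distributed anti-join for that step.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Edge = Tuple[int, int]
Tagged = Tuple[str, int]   # ("D", start node of a delta pair) or ("E", end node of an edge)


def map_phase(delta: Set[Edge], edges: Set[Edge]) -> Iterator[Tuple[int, Tagged]]:
    for x, y in delta:
        yield y, ("D", x)      # delta pair x -> y, keyed by its end node
    for y, z in edges:
        yield y, ("E", z)      # base edge y -> z, keyed by its start node


def shuffle(pairs: Iterable[Tuple[int, Tagged]]) -> Dict[int, List[Tagged]]:
    groups: Dict[int, List[Tagged]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values: List[Tagged]) -> Iterator[Edge]:
    # Within one join key y, pair every delta start x with every edge end z.
    starts = [n for tag, n in values if tag == "D"]
    ends = [n for tag, n in values if tag == "E"]
    for x in starts:
        for z in ends:
            yield x, z


def semi_naive_mapreduce(edges: Set[Edge]) -> Tuple[Set[Edge], int]:
    closure, delta = set(edges), set(edges)
    jobs = 0
    while delta:
        groups = shuffle(map_phase(delta, edges))
        joined = {p for values in groups.values() for p in reduce_phase(values)}
        delta = joined - closure   # done in the driver here, for brevity
        closure |= delta
        jobs += 1
    return closure, jobs
```

The number of simulated jobs equals the number of semi-naive rounds, which is what makes the linear round count costly when every round carries a fixed job start-up overhead.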

TriAL-QL

Przyjaciel-Zablocki, Schätzle, Lausen

2015

Proceedings of the 18th International Workshop on Web and Databases


“…At this point, we estimate that the maximum number of iterations has been reached and that the iteration terminates. This estimation relies on several assumptions that are inspired by the so-called semi-naïve evaluation of transitive closures found in the literature [1,5,7,9]. In particular, we assume that only the new results generated by an iteration are used for the next iteration and that the number of tuples reduces until a maximum number of iterations N is reached.…”

Section: Fixpoint Operator (mentioning)

confidence: 99%
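The assumptions in this statement can be checked empirically on small inputs by recording |Δ|, the number of new tuples fed into each semi-naive iteration. In the sketch below (our illustration, not the estimator of the cited paper), a simple path yields a strictly shrinking delta, whereas a directed cycle keeps it constant until the closure is complete, so the "number of tuples reduces" assumption does not hold on every input.

```python
from collections import defaultdict
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def delta_profile(edges: Set[Edge]) -> List[int]:
    """Sizes of the delta relation fed into each semi-naive iteration."""
    succ = defaultdict(set)
    for y, z in edges:
        succ[y].add(z)
    closure, delta = set(edges), set(edges)
    sizes = []
    while delta:
        sizes.append(len(delta))
        step = {(x, z) for x, y in delta for z in succ.get(y, ())}
        delta = step - closure   # only new tuples go to the next iteration
        closure |= delta
    return sizes


if __name__ == "__main__":
    path = {(i, i + 1) for i in range(5)}          # 0 -> 1 -> ... -> 5
    cycle = {(i, (i + 1) % 4) for i in range(4)}   # 0 -> 1 -> 2 -> 3 -> 0
    print(delta_profile(path))    # [5, 4, 3, 2, 1]
    print(delta_profile(cycle))   # [4, 4, 4, 4]
```

The length of the returned list is the number of iterations N actually performed.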

“…Recursive queries express a category of complex queries that involve the iterative application of a function or operation until some condition is satisfied (known as the fixpoint). A variety of studies have been conducted on this class of queries, including [5,9,11] and more recently [7,10,14]. One of the most difficult tasks in estimating the cost of a recursive query is determining the number of iterative steps needed for the iteration to converge.…”

Section: Introduction (mentioning)

confidence: 99%
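For plain transitive closure the number of iterations has a simple graph-theoretic characterization, sketched below under our own assumptions about the loop shapes. In a semi-naive loop that stops on an empty delta, iteration k derives exactly the pairs whose shortest path has length k + 1, so the number of iterations equals the longest shortest-path length d over reachable pairs; a doubling loop that squares the closure until it stops changing needs ⌈log₂ d⌉ + 1 rounds. Computing d exactly with one breadth-first search per node costs roughly as much as the closure itself, which is why a cost estimator has to approximate it.

```python
from collections import defaultdict
from typing import Set, Tuple

Edge = Tuple[int, int]


def semi_naive_iterations(edges: Set[Edge]) -> int:
    """Longest shortest-path length d, i.e. the number of semi-naive iterations."""
    succ = defaultdict(set)
    for x, y in edges:
        succ[x].add(y)
    longest = 0
    for source in list(succ):
        # BFS without marking the source, so a cycle back to it is counted.
        seen: Set[int] = set()
        frontier = set(succ[source])
        depth = 0
        while frontier:
            depth += 1
            seen |= frontier
            frontier = {z for y in frontier for z in succ.get(y, ())} - seen
        longest = max(longest, depth)
    return longest


def doubling_rounds(edges: Set[Edge]) -> int:
    """Rounds of a 'square until unchanged' loop: ceil(log2(d)) + 1."""
    d = semi_naive_iterations(edges)
    if d == 0:
        return 1   # one round to observe that nothing changes
    return (d - 1).bit_length() + 1   # (d - 1).bit_length() == ceil(log2(d))
```

For a 9-node path (d = 8) this predicts 8 semi-naive iterations against 4 doubling rounds; for a 4-node directed cycle it predicts 4 against 3.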

A Cost Estimation Technique for Recursive Relational Algebra

Lawal, Genevès, Layaïda

2020

Proceedings of the 29th ACM International Conference on Information & Knowledge Management


With the increasing popularity of data structures such as graphs, recursion is becoming a key ingredient of query languages in analytic systems. Recursive query evaluation involves the iterative application of a function or operation until some condition is satisfied. It is particularly useful for retrieving nodes reachable along deep paths in a graph. The optimization of recursive queries has remained a challenge for decades. Recently, extensions of Codd's classical relational algebra to support recursive terms and their optimisation have gained renewed interest [10]. Query optimization crucially relies on the enumeration of query evaluation plans and on cost estimation techniques. Cost estimation for recursive terms is far from trivial and has received less attention. In this paper, we propose a new cost estimation technique for recursive terms of the extended relational algebra. This technique makes it possible to select the query plan with the lowest estimated cost, in terms of computing resource usage (e.g. memory footprint, CPU and I/O) and evaluation time. We evaluate the effectiveness of our cost estimation technique on a set of recursive graph queries on both generated and real datasets of significant size, including Yago: a graph with more than 62 million edges and 42 million nodes. Experiments show that our cost estimation technique improves the performance of recursive query evaluation on popular relational database engines such as PostgreSQL.

CCS Concepts: • Information systems → Database management system engines; • Theory of computation → Database theory; Database query processing and optimization (theory).


“…Although in the context of PRISMA distributed transitive closure algorithms are very interesting, we will not go into this now. A good overview of parallel strategies for the transitive closure operation may be found in [8,9].…”

Section: Transitive Closure (mentioning)

confidence: 99%
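One simple way to parallelize the transitive closure, shown here only as an illustration and not necessarily one of the strategies the cited overviews cover, is to replicate the edge relation and partition the start nodes: each worker computes single-source reachability for its share, and the union of the partial results is the full closure. The sketch uses threads from Python's `concurrent.futures` to stay self-contained; a real system would run the partitions on separate processors or nodes, and the function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def reachable_from(sources: List[int], succ: Dict[int, List[int]]) -> Set[Edge]:
    """Closure pairs (s, y) for every s in this worker's partition (iterative DFS)."""
    out: Set[Edge] = set()
    for s in sources:
        seen: Set[int] = set()
        stack = list(succ.get(s, ()))
        while stack:
            y = stack.pop()
            if y in seen:
                continue
            seen.add(y)
            stack.extend(succ.get(y, ()))
        out |= {(s, y) for y in seen}
    return out


def parallel_tc(edges: Set[Edge], workers: int = 4) -> Set[Edge]:
    succ: Dict[int, List[int]] = defaultdict(list)
    for x, y in edges:
        succ[x].append(y)
    sources = sorted(succ)
    # Round-robin partition of start nodes; the edge index is shared read-only.
    parts = [sources[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(reachable_from, parts, [succ] * workers))
    return set().union(*partials)
```

The partitions need no communication during evaluation, at the price of replicating the whole edge relation to every worker and of possibly uneven work when a few start nodes reach most of the graph.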

Algebraic optimization of recursive queries

Houtsma, Apers

1992

Data & Knowledge Engineering


Over the past few years, much attention has been paid to deductive databases. They offer a logic-based interface and allow the formulation of complex recursive queries. However, they do not offer appropriate update facilities and do not support existing applications. To overcome these problems, an SQL-like interface is required besides a logic-based interface. In the PRISMA project we have developed a tightly-coupled distributed database on a multiprocessor machine, with two user interfaces: SQL and PRISMAlog. Query optimization is localized in one component: the relational query optimizer. Therefore, we have defined an eXtended Relational Algebra that allows recursive query formulation and can also be used for expressing executable schedules, and we have developed algebraic optimization strategies for recursive queries. In this paper we describe an optimization strategy that rewrites regular (in the context of formal grammars) mutually recursive queries into standard Relational Algebra and transitive closure operations. We also describe how to push selections into the resulting transitive closure operations. The reason we focus on algebraic optimization is that, in our opinion, the new generation of advanced database systems will be built starting from existing state-of-the-art relational technology, instead of building a completely new class of systems.
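The benefit of pushing a selection into a transitive closure can be sketched generically: instead of computing the full closure and then filtering it with σ(src = a), the evaluation is seeded with the edges leaving a, so only tuples starting at a are ever derived. This is an illustration of the idea with hypothetical function names, not the rewrite rules of the PRISMA optimizer.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Edge = Tuple[int, int]


def _successors(edges: Set[Edge]) -> Dict[int, Set[int]]:
    succ: Dict[int, Set[int]] = defaultdict(set)
    for x, y in edges:
        succ[x].add(y)
    return succ


def full_closure(edges: Set[Edge]) -> Set[Edge]:
    """Semi-naive transitive closure of the whole relation."""
    succ = _successors(edges)
    closure, delta = set(edges), set(edges)
    while delta:
        delta = {(x, z) for x, y in delta for z in succ.get(y, ())} - closure
        closure |= delta
    return closure


def closure_from(edges: Set[Edge], a: int) -> Set[Edge]:
    """sigma(src = a) of the closure, computed without deriving other sources."""
    succ = _successors(edges)
    closure = {(a, y) for y in succ.get(a, ())}   # seed: edges leaving a
    delta = set(closure)
    while delta:
        delta = {(a, z) for _, y in delta for z in succ.get(y, ())} - closure
        closure |= delta
    return closure


if __name__ == "__main__":
    edges = {(0, 1), (1, 2), (2, 0), (3, 4), (4, 5)}
    filtered = {p for p in full_closure(edges) if p[0] == 0}
    print(closure_from(edges, 0) == filtered)   # True
```

Both expressions return the same tuples; the seeded version simply never touches the component {3, 4, 5}, which is where the saving comes from on large relations.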
