Performing group-by before join [query processing]

Yan, Weipeng; Larson, Per-Åke

doi:10.1109/icde.1994.283001

Cited by 33 publications

(45 citation statements)

References 5 publications


“…In the general case, special difficulties arise dealing with the relational group by (basis of roll-up, OLAP key operator). Interestingly, we want to remark the gain when dealing with our restricted group by (i.e., roll-up) instead of the generic one, whose difficulty is discussed in depth in [5] (where it is explicitly said that no law is stated) and specially in [19] (where the whole work is devoted to analyze all possibilities between join and group-by).…”

Section: Normalizing the Mac (mentioning)

confidence: 99%

Describing Analytical Sessions Using a Multidimensional Algebra

Romero, Marcel, Abelló et al. 2011

Data Warehousing and Knowledge Discovery


Abstract. Recent efforts to support analytical tasks over relational sources have pointed out the necessity to come up with flexible, powerful means for analyzing the issued queries and exploit them in decision-oriented processes (such as query recommendation or physical tuning). Issued queries should be decomposed, stored and manipulated in a dedicated subsystem. With this aim, we present a novel approach for representing SQL analytical queries in terms of a multidimensional algebra, which better characterizes the analytical efforts of the user. In this paper we discuss how an SQL query can be formulated as a multidimensional algebraic characterization. Then, we discuss how to normalize them in order to bridge (i.e., collapse) several SQL queries into a single characterization (representing the analytical session), according to their logical connections.


“…Interestingly, it became immediately apparent that prior work on partially or totally pushing group by operations past one or more join operations (also called eager aggregation transformation) [17,3,18,7,4] could be applied to these plans to partially group and aggregate tuples that are selected from the fact table. This transformation is not possible in traditional star schemas where no information about the hierarchies is encoded in the fact table.…”

Section: Introduction (mentioning)

confidence: 99%

Exploiting hierarchical clustering in evaluating multidimensional aggregation queries

Theodoratos 2003

Proceedings of the 6th ACM International Workshop on Data Warehousing and OLAP - DOLAP '03


Multidimensional aggregation queries constitute the single most important class of queries for data warehousing applications and decision support systems. The bottleneck in the evaluation of these queries is the join of the usually huge fact table with the restricted dimension tables (star-join). Recently, a multidimensional hierarchical clustering schema for star schemas is suggested. Subsequently, query evaluation plans for multidimensional queries appeared that essentially implement a star join as a multidimensional range restriction. We present a number of transformations for such plans. The transformations place grouping/aggregation operations before joins and safely prune aggregated tuples. They can be applied at no or minimal extra I/O cost. We show how these transformations can be used to construct a new evaluation plan for grouping/aggregation queries over multidimensional hierarchically clustered schemas. The new plan improves previous results by grouping and aggregating tuples and by excluding aggregated tuples from further consideration at an early stage of the computation of a query.


“…In this algorithm, we do not materialize the join operation as in the traditional algorithms where the join operation is evaluated first and then the group-by and aggregate functions (Yan and Larson, 1994). So the Input/Output cost is minimal because we do not need to save the huge volume of data that results from the join operation.…”

Section: Introduction (mentioning)

confidence: 99%

“…But the response time of these queries is significantly reduced if the group-by operation is performed before the join (Chaudhuri and Shim, 1994), because group-by reduces the size of the relations thus minimizing the join and data redistribution costs. Several algorithms that perform the group-by operation before the join operation were presented in the literature (Shatdal and Naughton, 1995;Taniar et al, 2000;Taniar and Rahayu, 2001;Yan and Larson, 1994). In the "Early Distribution Schema" algorithm presented in (Taniar and Rahayu, 2001), all the tuples of the tables are redistributed before applying the join or the group-by operations, thus the communication cost in this algorithm is very high.…”

Section: Introduction (mentioning)

confidence: 99%

“…In traditional algorithms that treat "GroupBy-Join" queries, join operations are performed in the first step and then the group-by operation (Chaudhuri and Shim, 1994;Yan and Larson, 1994). But the response time of these queries is significantly reduced if the group-by operation is performed before the join (Chaudhuri and Shim, 1994), because group-by reduces the size of the relations thus minimizing the join and data redistribution costs.…”

Section: Introduction (mentioning)

confidence: 99%
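The claim in the statements above, that grouping before the join shrinks the join input without changing the answer, can be illustrated with a minimal sketch. The relations, column layout, and function names below are hypothetical and are not taken from any of the cited papers. The sketch assumes that the aggregate is SUM and that the join key is unique in the `customers` relation; without conditions of this kind the rewrite is not valid in general.

```python
from collections import defaultdict

def join_then_group(sales, customers):
    """Lazy plan: join each sales row with its customer, then sum per region."""
    region_of = dict(customers)          # assumes customer id is a key
    totals = defaultdict(int)
    joined_rows = 0
    for cid, amount in sales:
        if cid in region_of:             # inner join on customer id
            joined_rows += 1
            totals[region_of[cid]] += amount
    return dict(totals), joined_rows

def group_then_join(sales, customers):
    """Eager plan: pre-aggregate sales per join key, then join the partial sums."""
    partial = defaultdict(int)
    for cid, amount in sales:            # group-by on the join key first
        partial[cid] += amount
    region_of = dict(customers)
    totals = defaultdict(int)
    joined_rows = 0
    for cid, subtotal in partial.items():
        if cid in region_of:             # join now sees one row per customer id
            joined_rows += 1
            totals[region_of[cid]] += subtotal
    return dict(totals), joined_rows

# Hypothetical data: (customer id, amount) and (customer id, region).
sales = [(1, 10), (1, 20), (2, 5), (2, 7), (3, 1), (4, 100)]
customers = [(1, "north"), (2, "south"), (3, "north")]

lazy_totals, lazy_rows = join_then_group(sales, customers)
eager_totals, eager_rows = group_then_join(sales, customers)
print(lazy_totals == eager_totals, lazy_rows, eager_rows)  # True 5 3
```

Both plans return the same totals per region, but the eager plan feeds three rows into the join instead of five, which is the size reduction the citing authors refer to.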


Parallel Processing of “Group-By Join” Queries on Shared Nothing Machines

Hassan, Bamha

Communications in Computer and Information Science


Abstract: SQL queries involving join and group-by operations are frequently used in many decision support applications. In these applications, the size of the input relations is usually very large, so the parallelization of these queries is highly recommended in order to obtain a desirable response time. The main drawbacks of the presented parallel algorithms that treat this kind of queries are that they are very sensitive to data skew and involve expensive communication and Input/Output costs in the evaluation of the join operation. In this paper, we present an algorithm that minimizes the communication cost by performing the group-by operation before redistribution where only tuples that will be present in the join result are redistributed. In addition, it evaluates the query without the need of materializing the result of the join operation and thus reducing the Input/Output cost of join intermediate results. The performance of this algorithm is analyzed using the scalable and portable BSP (Bulk Synchronous Parallel) cost model which predicts a near-linear speed-up even for highly skewed data.
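The idea of grouping before redistribution can be sketched as follows. This is not the algorithm of the paper: it is a simplified illustration that omits the join filtering and skew handling the abstract mentions, and all names, the modulo-based partitioning, and the data are assumptions made for the example. Each simulated node aggregates its local fragment first, so only one partial sum per key and node is sent over the network.

```python
from collections import defaultdict

def local_group_by(fragment):
    """Aggregate one node's local fragment of (key, value) tuples."""
    partial = defaultdict(int)
    for key, value in fragment:
        partial[key] += value
    return dict(partial)

def redistribute(partials, num_nodes):
    """Send each partial sum to the node that owns its key and merge there.

    Ownership is decided by key % num_nodes, a stand-in for a hash partitioner.
    Returns the merged result per node and the number of messages sent.
    """
    inboxes = [defaultdict(int) for _ in range(num_nodes)]
    messages = 0
    for partial in partials:
        for key, subtotal in partial.items():
            inboxes[key % num_nodes][key] += subtotal
            messages += 1
    return [dict(inbox) for inbox in inboxes], messages

# Hypothetical fragments held by two nodes before redistribution.
fragments = [
    [(1, 10), (1, 20), (2, 5)],
    [(1, 1), (2, 7), (2, 3), (3, 4)],
]

partials = [local_group_by(f) for f in fragments]
merged, messages = redistribute(partials, num_nodes=2)
raw_tuples = sum(len(f) for f in fragments)
print(merged, messages, raw_tuples)
```

In this toy run five partial sums are redistributed instead of seven raw tuples; the saving grows with the number of duplicate keys per node.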
