A benchmark of NonStop SQL release 2 demonstrating near-linear speedup and scaleup on large databases

Englert, Susanne; Gray, Jim; Kocher, Terrye; Shah, Parul T.

doi:10.1145/98457.98766

Cited by 30 publications

(2 citation statements)

References 4 publications


“…Parallel, shared-nothing databases were proposed in [35], implemented in [8,11,15], and surveyed in [12]. The requirements for array data management were researched in [3,19,36]. Several efforts are underway to implement scientific databases [5,6,9], but none of these optimize join processing over skewed data.…”

Section: Related Work (mentioning)

confidence: 99%

Skew-Aware Join Optimization for Array Databases

Duggan

Papaemmanouil

Battle

2015

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data


Science applications are accumulating an ever-increasing amount of multidimensional data. Although some of it can be processed in a relational database, much of it is better suited to array-based engines. As such, it is important to optimize the query processing of these systems. This paper focuses on efficient query processing of join operations within an array database. These engines invariably "chunk" their data into multidimensional tiles that they use to efficiently process spatial queries. As such, traditional relational algorithms need to be substantially modified to take advantage of array tiles. Moreover, most n-dimensional science data is unevenly distributed in array space because its underlying observations rarely follow a uniform pattern. It is crucial that the optimization of array joins be skew-aware. In addition, owing to the scale of science applications, their query processing usually spans multiple nodes. This further complicates the planning of array joins.

In this paper, we introduce a join optimization framework that is skew-aware for distributed joins. This optimization consists of two phases. In the first, a logical planner selects the query's algorithm (e.g., merge join), the granularity of its tiles, and the reorganization operations needed to align the data. The second phase implements this logical plan by assigning tiles to cluster nodes using an analytical cost model. Our experimental results, on both synthetic and real-world data, demonstrate that this optimization framework speeds up array joins by up to 2.5X in comparison to the baseline.


“…From a hardware viewpoint, the alternatives fall into two categories: those that use special-purpose hardware customized for database operations (see [2], for example) and those that construct a parallel database system in software running on general-purpose processors [3][4][5][6]. We chose general-purpose processors for four reasons.…”

Section: Alternatives (mentioning)

confidence: 99%

Exploiting database parallelism in a message-passing multiprocessor

et al. 1991


Parallel processing may well be the only means of satisfying the long-term performance requirements for database systems: an increase in throughput for transactions and a drastic decrease in response time for complex queries. In this paper, we review various alternatives, and then focus entirely on exploiting parallel-processing configurations in which general-purpose processors communicate only via message passing. In our configuration, the database is partitioned among the processors. This approach…


Load‐sharing algorithms for parallel database processing on shared‐everything multiprocessors

Hirano, Satoh, Inoue et al. 1993

Systems & Computers in Japan

Summary: This paper describes new load-sharing algorithms for parallel database processing. Ordinary algorithms face a trade-off between overhead and load imbalance. The proposed algorithms resolve this trade-off by varying the number of tasks allocated at one time, which is fixed in ordinary algorithms. Performance evaluations show that the proposed algorithms achieve fair load sharing with low overhead, independent of database size, the number of processors, and data distribution.
