Parallel sorting on a shared-nothing architecture using probabilistic splitting

DeWitt, David J.; Naughton, Jeffrey F.; Schneider, Donovan A.

doi:10.1109/pdis.1991.183115

Cited by 86 publications

(81 citation statements)

References 15 publications

“…Merge-based sorting algorithms combine data from two or more processes [7]. Splitter-based approaches try to subdivide the input into chunks of roughly equal size [13,16,22,25,36]. The latter category utilize minimal data movement because the data only moves during the split operation.…”

Section: Data Skew (mentioning)

confidence: 99%

Distributed join algorithms on thousands of cores

et al. 2017


Traditional database operators such as joins are relevant not only in the context of database engines but also as a building block in many computational and machine learning algorithms. With the advent of big data, there is an increasing demand for efficient join algorithms that can scale with the input data size and the available hardware resources. In this paper, we explore the implementation of distributed join algorithms in systems with several thousand cores connected by a low-latency network as used in high performance computing systems or data centers. We compare radix hash join to sort-merge join algorithms and discuss their implementation at this scale. In the paper, we explain how to use MPI to implement joins, show the impact and advantages of RDMA, discuss the importance of network scheduling, and study the relative performance of sorting vs. hashing. The experimental results show that the algorithms we present scale well with the number of cores, reaching a throughput of 48.7 billion input tuples per second on 4,096 cores.
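
The splitter-based approach named in the citation statement above (subdividing the input into chunks of roughly equal size, so that data moves only once) can be illustrated with a minimal single-process sketch. It assumes splitters are picked as evenly spaced elements of a random sample; the function names, the sample size, and the use of in-memory lists in place of separate nodes are illustrative assumptions, not the method of the cited papers.

```python
import bisect
import random


def choose_splitters(keys, num_partitions, sample_size, rng):
    """Pick num_partitions - 1 splitters from a random sample of the keys."""
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    if not sample:
        return []
    # Evenly spaced sample elements approximate the quantiles of the input.
    return [sample[(i * len(sample)) // num_partitions]
            for i in range(1, num_partitions)]


def partition(keys, splitters):
    """Route each key to the bucket whose key range contains it."""
    buckets = [[] for _ in range(len(splitters) + 1)]
    for k in keys:
        buckets[bisect.bisect_right(splitters, k)].append(k)
    return buckets


def splitter_sort(keys, num_partitions=4, sample_size=64, seed=0):
    """Split once, then sort each bucket locally (hypothetical helper)."""
    rng = random.Random(seed)
    splitters = choose_splitters(keys, num_partitions, sample_size, rng)
    # In a shared-nothing system each bucket would be sent to its own node;
    # here the buckets are plain lists sorted in the same process.
    return [sorted(b) for b in partition(keys, splitters)]
```

Concatenating the returned buckets in order yields the fully sorted input, because every key in bucket i is no larger than any key in bucket i + 1. How evenly the buckets are sized depends on how well the sample represents the data.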

“…DeWitt, Naughton, and Schneider's efforts on an Intel Hypercube was the fastest reported time: 58.3 seconds using 32 processors, 32 disks and 224 MB of memory [9]. Baugsto, Greispland and Kamberbeek mentioned a 40-second sort on a 100-processor 100-disk system [4].…”

Section: The Sort Benchmark and Prior Work On Sort (mentioning)

confidence: 99%

“…Despite this risk, QuickSort is widely used because, in practice, it has superior performance. Baugsto, Bitton, Beck, Graefe, and DeWitt used QuickSort [4,6,7,9,11]. On the other hand, Tsukerman and Weinberger used replacement-selection [21,22].…”

Section: Minimizing Cache-miss Waits (mentioning)

confidence: 99%

Alphasort: A cache-sensitive parallel external sort

Nyberg, Barclay, Cvetanovic, et al. 1995

VLDB Journal


“…Early work concentrated on parallelizing individual, traditional content-sensitive operators like hybrid-hash join [25] and sort (e.g., [11,20,1]). The abstractions which inspired Flux, Exchange [12] and RiverDQ [23], were proposed to compose such operators into a dataflow.…”

Section: Related Work (mentioning)

confidence: 99%

“…The abstractions which inspired Flux, Exchange [12] and RiverDQ [23], were proposed to compose such operators into a dataflow. In [10] and [9], the authors present practical techniques for handling data skew for a hash join and external sort, respectively. These techniques rely on sampling a static data set, which is infeasible in the streaming scenario.…”

Section: Related Work (mentioning)

confidence: 99%