[1991] Proceedings of the First International Conference on Parallel and Distributed Information Systems
DOI: 10.1109/pdis.1991.183115

Parallel sorting on a shared-nothing architecture using probabilistic splitting

Abstract: We consider the problem of external sorting in a shared-nothing multiprocessor. A critical step in the algorithms we consider is to determine the range of sort keys to be handled by each processor. We consider two techniques for determining these ranges of sort keys: exact splitting, using a parallel version of the algorithm proposed by Iyer, Ricard, and Varman, and probabilistic splitting, which uses sampling to estimate quantiles. We present analytic results showing that probabilistic splitting perfor…
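The idea of probabilistic splitting can be illustrated with a small single-process sketch: draw a random sample of keys, take evenly spaced sample quantiles as splitters, and route each key to the "processor" whose range contains it. This is only an illustration under assumptions, not the paper's algorithm: the names `choose_splitters`, `partition`, and `parallel_sort_sketch`, the `sample_size` parameter, and the in-memory lists standing in for processors and disks are all hypothetical.

```python
import bisect
import random


def choose_splitters(keys, num_procs, sample_size, seed=0):
    """Estimate num_procs - 1 splitter keys from a random sample.

    Assumes `keys` is a non-empty list (hypothetical helper).
    """
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    # The i-th splitter approximates the i/num_procs quantile of the input.
    return [sample[(i * len(sample)) // num_procs] for i in range(1, num_procs)]


def partition(keys, splitters):
    """Assign each key to the bucket whose key range contains it."""
    buckets = [[] for _ in range(len(splitters) + 1)]
    for key in keys:
        buckets[bisect.bisect_right(splitters, key)].append(key)
    return buckets


def parallel_sort_sketch(keys, num_procs=4, sample_size=256):
    """Split by sampled quantiles, then sort each range independently."""
    splitters = choose_splitters(keys, num_procs, sample_size)
    buckets = partition(keys, splitters)
    # Each bucket stands in for one processor's local sort; concatenating
    # the buckets in order yields the globally sorted sequence.
    return [sorted(bucket) for bucket in buckets]
```

Because the splitters are estimates, the buckets are only approximately equal in size; a larger sample generally gives a more even split at the cost of more sampling work.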

Cited by 86 publications (81 citation statements)
References 15 publications
“…Merge-based sorting algorithms combine data from two or more processes [7]. Splitter-based approaches try to subdivide the input into chunks of roughly equal size [13,16,22,25,36]. The latter category utilize minimal data movement because the data only moves during the split operation.…”
Section: Data Skew (mentioning)
confidence: 99%
“…DeWitt, Naughton, and Schneider's efforts on an Intel Hypercube was the fastest reported time: 58.3 seconds using 32 processors, 32 disks and 224 MB of memory [9]. Baugsto, Greispland and Kamberbeek mentioned a 40-second sort on a 100-processor 100-disk system [4].…”
Section: The Sort Benchmark and Prior Work On Sort (mentioning)
confidence: 99%
“…Despite this risk, QuickSort is widely used because, in practice, it has superior performance. Baugsto, Bitton, Beck, Graefe, and DeWitt used QuickSort [4,6,7,9,11]. On the other hand, Tsukerman and Weinberger used replacement-selection [21,22].…”
Section: Minimizing Cache-miss Waits (mentioning)
confidence: 99%
“…Early work concentrated on parallelizing individual, traditional content-sensitive operators like hybrid-hash join [25] and sort (e.g., [11,20,1]). The abstractions which inspired Flux, Exchange [12] and RiverDQ [23], were proposed to compose such operators into a dataflow.…”
Section: Related Work (mentioning)
confidence: 99%
“…The abstractions which inspired Flux, Exchange [12] and RiverDQ [23], were proposed to compose such operators into a dataflow. In [10] and [9], the authors present practical techniques for handling data skew for a hash join and external sort, respectively. These techniques rely on sampling a static data set, which is infeasible in the streaming scenario.…”
Section: Related Work (mentioning)
confidence: 99%