Asynchronous parallel disk sorting

Dementiev, Roman; Sanders, Peter

doi:10.1145/777412.777435

Cited by 36 publications

(29 citation statements)

References 25 publications


“…The second kind of sorting algorithms on the PDM are based on R-way merging for some suitable value of R that minimizes the number of passes through the data for the given size of internal memory [1,6,11,16,18].…”

Section: Prior Algorithms and Our Results (mentioning)

confidence: 99%
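
To make the "number of passes" in the statement above concrete, here is a minimal sketch that counts merge passes for a given merge fan-in R. It assumes a simplified model in which run formation produces one sorted run per memory load and each pass merges up to R runs into one. The function and parameter names are illustrative and do not come from the cited papers.

```python
def merge_passes(n_items: int, memory_items: int, fan_in: int) -> int:
    """Count the merge passes needed after run formation (simplified model)."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if memory_items < 1:
        raise ValueError("memory_items must be positive")
    # Run formation: one sorted run per memory load (ceiling division).
    runs = -(-n_items // memory_items)
    passes = 0
    while runs > 1:
        # Each pass merges groups of up to fan_in runs into single runs.
        runs = -(-runs // fan_in)
        passes += 1
    return passes
```

Under this model, a larger R means fewer passes over the data, but R is bounded by how many run buffers fit in internal memory.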


A Simple Optimal Randomized Algorithm for Sorting on the PDM

Rajasekaran

Sen

2005

Algorithms and Computation


Abstract. The Parallel Disks Model (PDM) has been proposed to alleviate the I/O bottleneck that arises in the processing of massive data sets. Sorting has been extensively studied on the PDM model due to the fundamental nature of the problem. Several randomized algorithms are known for sorting. Most of the prior algorithms suffer from undue complications in memory layouts, implementation, or lack of tight analysis. In this paper we present a simple randomized algorithm that sorts in optimal time with high probability and has all the desirable features for practical implementation.


“…The RC scheduling resulted in optimal distributed sort (RCD) and optimal mergesort (RCM) via duality [13]. These have been shown to be very practical [11]. However, these algorithms have been shown to be optimal only in expectation and no high probability bounds have been derived.…”

Section: Prior Algorithms and Our Results (mentioning)

confidence: 99%

A Simple Optimal Randomized Algorithm for Sorting on the PDM

Rajasekaran

Sen

2005

Algorithms and Computation


“…Our experimental platform has two 2.0 GHz Intel Xeon processors, one GByte of RAM, and we use four 80 GByte IBM 120GXP disks. Refer to [11] for a performance evaluation of this machine whose cost was 2500 Euro in July 2002. The following instances have been considered: Random2: Two concatenated copies of a Random string of length n/2.…”

Section: Methods (mentioning)

confidence: 99%

Better external memory suffix array construction

Dementiev

Kärkkäinen

Mehnert

et al. 2008

ACM J. Exp. Algorithmics

Self Cite


Suffix arrays are a simple and powerful data structure for text processing that can be used for full text indexes, data compression, and many other applications, in particular in bioinformatics. However, so far it has looked prohibitive to build suffix arrays for huge inputs that do not fit into main memory. This paper presents design, analysis, implementation, and experimental evaluation of several new and improved algorithms for suffix array construction. The algorithms are asymptotically optimal in the worst case or on the average. Our implementation can construct suffix arrays for inputs of up to 4 GBytes in hours on a low cost machine. As a tool of possible independent interest we present a systematic way to design, analyze, and implement pipelined algorithms.


“…Both sorters are highly efficient parallel disk implementations. The algorithm they implement guarantees close to optimal I/O volume and almost perfect overlapping between I/O and computation [16]. The performance of the sorters scales well.…”

Section: Algorithms (mentioning)

confidence: 96%

“…The input of the sorter may be an object complying to Stxxl stream interface. As the STL-user layer sorter, the pipelined sorter is an implementation of parallel disk merge sort [16] that overlaps I/O and computation. The implementation of stream::sort relies on two classes that encapsulate the two phases of the algorithm: sorted run formation (class runs_creator) and run merging (runs_merger).…”

Section: Streaming Layer (mentioning)

confidence: 99%
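
The two phases named in the statement above, sorted run formation followed by run merging, can be sketched roughly as follows. This is an illustrative Python outline, not Stxxl's C++ API: the function names are hypothetical, the runs are kept as in-memory lists standing in for runs written to disk, and no overlapping of I/O and computation is attempted.

```python
import heapq
from itertools import islice
from typing import Iterable, Iterator, List


def create_runs(items: Iterable[int], run_size: int) -> List[List[int]]:
    """Phase 1: cut the input into memory-sized chunks and sort each chunk."""
    if run_size < 1:
        raise ValueError("run_size must be positive")
    it = iter(items)
    runs: List[List[int]] = []
    while True:
        chunk = list(islice(it, run_size))  # read at most run_size items
        if not chunk:
            break
        chunk.sort()  # internal sort of one run
        runs.append(chunk)  # a real sorter would write the run to disk here
    return runs


def merge_runs(runs: List[List[int]]) -> Iterator[int]:
    """Phase 2: lazily merge all sorted runs into one sorted stream."""
    return heapq.merge(*runs)


def two_phase_sort(items: Iterable[int], run_size: int) -> Iterator[int]:
    """Chain both phases; the result is consumed as a stream."""
    return merge_runs(create_runs(items, run_size))
```

For example, `list(two_phase_sort([5, 3, 8, 1, 9, 2, 7], 3))` forms three runs and merges them in a single pass. Returning an iterator from the merge phase loosely mirrors the streaming idea, in that a downstream consumer can read sorted output without the result being materialized first.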

Stxxl: Standard Template Library for XXL Data Sets

Dementiev

Kettner

Sanders

2005

Lecture Notes in Computer Science

Self Cite


We present a software library Stxxl, that enables practice-oriented experimentation with huge data sets. Stxxl is an implementation of the C++ standard template library STL for external memory computations. It supports parallel disks, overlapping between I/O and computation and is the first external memory algorithm library that supports the pipelining technique that can save more than half of the I/Os. Stxxl has already been used for the following applications: implementations of external memory algorithms for computing minimum spanning trees, connected components, breadth-first search decompositions, constructing suffix arrays, and computing social network analysis metrics for huge graphs.

