Abstract. We present a technique for designing external memory data structures that support batched operations I/O-efficiently. We show how the technique can be used to develop external versions of a search tree, a priority queue, and a segment tree, and give examples of how these structures can be used to develop I/O-efficient algorithms. Given the developed external data structures, the resulting algorithms are either extremely simple or straightforward generalizations of known internal memory algorithms.
Key Words. I/O efficiency, Internal memory algorithms, Batched external data structures, Buffer tree.
Introduction. In recent years, increasing attention has been given to Input/Output-efficient (or I/O-efficient) algorithms. This is because communication between fast internal memory and slower external memory, such as disks, is the bottleneck in many computations involving massive datasets. The significance of this bottleneck is increasing as internal computation gets faster and parallel computing gains popularity.
Much work has been done on designing external versions of data structures originally designed for internal memory. Most of these structures are intended for on-line settings, where each query should be answered immediately and within a good worst-case number of I/Os. As a consequence, they often do not take advantage of the available main memory, leading to suboptimal performance when they are used in solutions for batched (or off-line) problems. Therefore, several techniques have been developed for solving massive batched problems without using external data structures.
In this paper we present a technique for designing external data structures that take advantage of the large main memory. We do so by requiring only good amortized performance and by allowing query operations to be batched. We also show how the developed data structures can be used in simple and I/O-efficient algorithms for a number of fundamental computational geometry and graph problems. Our technique has subsequently been used in the development of a large number of I/O-efficient algorithms in these and several other problem areas.