Aside from enhancing data availability during disk failures, data replication is also used to improve the I/O performance of read-intensive applications. Two issues must be addressed: (a) data placement (which disks should store the copies of each data block?) and (b) scheduling (given a query Q and a placement scheme P of the data, from which disk should each block in Q be retrieved so that retrieval time is minimized?). In this paper, we consider range queries and assume that the dataset is a multidimensional grid and that r copies of each unit block of the grid must be stored among M disks. To measure the performance of a scheduling algorithm accurately, we use a metric that accounts for the scheduling overhead as well as the time it takes to retrieve the data blocks from the disks. We describe several combinations of data placement schemes and scheduling algorithms and analyze their performance for range queries with respect to this metric. We then present simulation results for the most interesting case, r = 2, showing that these strategies perform better than the previously known method, especially for large queries.
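To make the scheduling question (b) concrete, here is a minimal Python sketch built on assumptions of our own, not the paper's: a 2-D grid, r = 2, a hypothetical "shifted disk-modulo" placement (`place`, with copy 0 on disk (i + j) mod M and copy 1 offset by `shift`), and a retrieval cost equal to the largest number of blocks fetched from any single disk. It ignores the scheduling overhead that the paper's metric also charges. The scheduler finds an optimal assignment by trying per-disk capacities upward from ⌈|Q|/M⌉ and checking each with bipartite matching (augmenting paths). All function names are illustrative.

```python
from collections import Counter


def place(i, j, num_disks, shift):
    """Hypothetical r = 2 placement: copy 0 by disk modulo, copy 1 shifted."""
    first = (i + j) % num_disks
    second = (i + j + shift) % num_disks
    return [first, second]


def range_query(i0, i1, j0, j1):
    """All grid blocks (i, j) with i0 <= i <= i1 and j0 <= j <= j1."""
    return [(i, j) for i in range(i0, i1 + 1) for j in range(j0, j1 + 1)]


def assign_with_capacity(cands, num_disks, cap):
    """Give each block one of its candidate disks, at most `cap` blocks per disk.

    Runs augmenting paths (Kuhn's algorithm) on a bipartite graph in which each
    disk is split into `cap` slots. Returns a disk per block, or None if infeasible.
    """
    slot_owner = [[-1] * cap for _ in range(num_disks)]

    def augment(b, visited):
        for d in cands[b]:
            for k in range(cap):
                if (d, k) in visited:
                    continue
                visited.add((d, k))
                # Take a free slot, or evict its owner if the owner can move elsewhere.
                if slot_owner[d][k] == -1 or augment(slot_owner[d][k], visited):
                    slot_owner[d][k] = b
                    return True
        return False

    for b in range(len(cands)):
        if not augment(b, set()):
            return None
    assignment = [None] * len(cands)
    for d in range(num_disks):
        for owner in slot_owner[d]:
            if owner != -1:
                assignment[owner] = d
    return assignment


def optimal_schedule(cands, num_disks):
    """Smallest achievable max per-disk load, plus one assignment achieving it."""
    n = len(cands)
    for cap in range(-(-n // num_disks), n + 1):  # start at ceil(n / M)
        assignment = assign_with_capacity(cands, num_disks, cap)
        if assignment is not None:
            return cap, assignment
    raise ValueError("every block needs at least one candidate disk")


# Usage: a 3 x 3 range query on M = 5 disks.
M, SHIFT = 5, 2
query = range_query(0, 2, 0, 2)
cands = [place(i, j, M, SHIFT) for i, j in query]
load, assignment = optimal_schedule(cands, M)
print(load)                                          # 2
print(max(Counter(c[0] for c in cands).values()))    # 3 (first copy only)
```

For this query, reading only the first copy puts three blocks (the anti-diagonal i + j = 2) on disk 2, while choosing between the two copies lowers the maximum load to ⌈9/5⌉ = 2. Repeated matching is not free, though, which is why a metric that also counts scheduling time, as the paper's does, can favor cheaper heuristics over an exact scheduler.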