Cartesian product files have recently been shown to exhibit attractive properties for partial match queries. This paper considers the file allocation problem for Cartesian product files, which can be stated as follows: Given a k-attribute Cartesian product file and an m-disk system, allocate buckets among the m disks in such a way that, for all possible partial match queries, the concurrency of disk accesses is maximis ed. The Risk Modulo (DM) allocation method is described first, and it is shown to be strict optimal under many conditions commonly occurring in practice, including all possible partial match queries when the number of disks is 2 or 3. It is also shown that although it has good performance, the DM allocation method is not strict optimal for all possible partial match queries when the number of disks is greater than 3. The General Disk Modulo (GDM) allocation method is then described, and a sufficient but not necessary condition for strict optimal&y of the GDM method for all partial match queries and any number of disks is then derived. Simulation studies comparing the DM and random allocation methods in terms of the average number of disk accesses, in response to various classes of partial match queries, show the former to be significantly more effective even when the number of disks is greater than 3, that is, even in cases where the DM method is not strict optimal. The results that have been derived formally and shown by simulation can be used for more effective design of optimal file systems for partial match queries. When considering multiple-disk systems with independent access paths, it is important to ensure that similar records are clustered into the same or similar buckets, while similar buckets should be dispersed uniformly among the disks.
A new type of dynamic file access called dynamic hushing has recently emerged. It promises the flexibility of handling dynamic tiles while preserving the fast access times expected from hashing. Such a fast, dynamic file access scheme is needed to support modern database systems. This paper surveys dynamic hashing schemes and examines their critical design issues.
Abstract.Since a file is usually stored on a disk, the response time to a query is dominated by the disk access time. In order to reduce the disk access time, a file can be stored on several independently accessible disks. In this paper, we discuss the problem of allocating buckets in a file among disks such that the maximum disk accessing concurrency can be achieved. We are particularly concerned with the disk allocation problem for binary Cartesian product files. A new allocation method is first proposed for the cases when the number (m) of available disks is a power of 2. Then it is extended to fit the cases where m is not a power of 2. The proposed algorithm has a "near" strict optimal performance for a partial match query in which the number of unspecified attributes is greater than a~ small number (5 or 6).
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.