Vertical fragmentation and access path selection are interdependent techniques in physical database design, both used to improve database system performance. In relational databases, vertical fragmentation deals with the assignment of attributes to physical files, while access path selection deals with efficiently locating data records in physical storage. Vertical fragmentation is a combinatorial optimization problem that is NP-hard in most cases. We propose a genetic algorithm approach to the vertical fragmentation problem that also addresses access path selection. The effectiveness and efficiency of the genetic algorithm are illustrated through several database design problems, ranging from 10 attributes/5 transactions to 30 attributes/18 transactions. In most cases, our design solutions match the global optimum solutions obtained by exhaustive enumeration. Compared to unpartitioned databases, our design solutions yield substantial savings (up to 80%) in the number of disk accesses.
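To make the idea of searching over attribute-to-file assignments concrete, the following is a minimal sketch of a genetic algorithm for vertical fragmentation. It is not the method of the paper summarized above: the workload (`ATTR_WIDTHS`, `TRANSACTIONS`), the fixed per-fragment overhead (`FRAGMENT_OVERHEAD`), the toy cost function, and all GA parameters are assumptions chosen for illustration, and access path selection is not modeled at all.

```python
import random

# Hypothetical toy workload (not taken from the paper):
# attribute widths in bytes, and transactions as (attributes used, frequency).
ATTR_WIDTHS = [4, 8, 16, 32, 4, 64]
TRANSACTIONS = [({0, 1}, 10), ({2, 3}, 5), ({4, 5}, 2), ({0, 4}, 7)]
FRAGMENT_OVERHEAD = 20  # assumed fixed cost of touching one fragment


def cost(assignment):
    """Toy cost: a transaction reads the full width of every fragment it
    touches, plus a fixed overhead per touched fragment."""
    frag_width = {}
    for attr, frag in enumerate(assignment):
        frag_width[frag] = frag_width.get(frag, 0) + ATTR_WIDTHS[attr]
    total = 0
    for attrs, freq in TRANSACTIONS:
        touched = {assignment[a] for a in attrs}
        total += freq * sum(frag_width[f] + FRAGMENT_OVERHEAD for f in touched)
    return total


def evolve(pop_size=30, generations=60, mutation_rate=0.1, seed=0):
    """Return the best attribute-to-fragment assignment found.

    A chromosome is a list where position i holds the fragment id of
    attribute i. The unpartitioned layout is seeded into the initial
    population, and elitism keeps the best two individuals, so the result
    is never worse than the unpartitioned layout under this cost model.
    """
    rng = random.Random(seed)
    n = len(ATTR_WIDTHS)
    population = [[0] * n] + [
        [rng.randrange(n) for _ in range(n)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=cost)
        next_gen = population[:2]  # elitism
        while len(next_gen) < pop_size:
            # Tournament selection of two parents
            p1 = min(rng.sample(population, 3), key=cost)
            p2 = min(rng.sample(population, 3), key=cost)
            # One-point crossover followed by per-gene mutation
            cut = rng.randrange(1, n)
            child = p1[:cut] + p2[cut:]
            child = [
                rng.randrange(n) if rng.random() < mutation_rate else gene
                for gene in child
            ]
            next_gen.append(child)
        population = next_gen
    return min(population, key=cost)


if __name__ == "__main__":
    best = evolve()
    print("unpartitioned cost:", cost([0] * len(ATTR_WIDTHS)))
    print("best assignment:", best, "cost:", cost(best))
```

The fixed per-fragment overhead is what keeps the problem from being trivial in this sketch: without it, placing every attribute in its own fragment would always minimize the bytes read, whereas with it, attributes that are accessed together tend to be grouped.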
For partially replicated distributed database systems to function efficiently, the data (relations) and operations (subqueries) of the database need to be located judiciously at various sites across the relevant communications network. Allocating relations and operations to the most appropriate sites is a difficult problem, so this research proposes genetic algorithms based on migration to solve it. In partially replicated distributed database systems, minimizing total time usually amounts to minimizing resource consumption and therefore maximizing system throughput. Minimizing response time, on the other hand, may be achieved by running a large number of executions in parallel at different sites, which requires higher resource consumption and so reduces system throughput. Workload balancing reduces the average time that queries spend waiting for CPU and I/O service at a network site, but its effect on the performance of partially replicated distributed database systems cannot be isolated from other distributed database design factors. In this research, the total cost refers to the combination of total time and response time. This paper presents a framework for total cost minimization and workload balancing in partially replicated distributed database systems that considers important database design objectives together. The framework incorporates both local processing costs, including CPU and I/O, and communication costs. To illustrate its suitability, experiments are conducted, and the results demonstrate that the proposed framework provides effective partially replicated distributed database designs.
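The trade-off between total time and response time described above can be illustrated with a small sketch. The abstract does not say how the two are combined, so the weighted sum below, the `weight` parameter, and the per-site `(local, communication)` cost tuples are all assumptions made for illustration, not the paper's framework.

```python
def total_time(site_costs):
    """Sum of local processing and communication cost over all sites:
    a proxy for overall resource consumption."""
    return sum(local + comm for local, comm in site_costs.values())


def response_time(site_costs):
    """Elapsed time when all sites work in parallel: the slowest site."""
    return max(local + comm for local, comm in site_costs.values())


def total_cost(site_costs, weight=0.5):
    """Assumed combination: a weighted sum of total time and response time."""
    return weight * total_time(site_costs) + (1 - weight) * response_time(site_costs)


# Hypothetical plans: site name -> (local processing cost, communication cost)
single_site = {"s1": (10.0, 0.0)}
three_sites = {"s1": (4.0, 2.0), "s2": (4.0, 2.0), "s3": (4.0, 2.0)}

if __name__ == "__main__":
    for name, plan in [("single site", single_site), ("three sites", three_sites)]:
        print(name, total_time(plan), response_time(plan), total_cost(plan))
```

In this example the parallel plan has the lower response time but the higher total time, which is the tension the abstract describes; the choice of `weight` determines which plan the combined cost prefers.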