Most of business intelligence applications use data warehousing solutions. The star schema or its variants modelling these applications are usually composed of hundreds of dimension tables and multiple huge fact tables. Referential horizontal partitioning is one of physical design techniques adapted to optimize queries posed over these schemes. In referential partitioning, a fact table can inherit the fragmentation characteristics from dimension table(s). Most of the existing works done on referential partitioning start from a bag containing selection predicates defined on dimension tables, partition each one based on its predicates and finally propagate their fragmentation schemes to the fact table. This procedure gives all dimension tables the same probability to partition the fact table which is not always true. In order to ensure a high performance of the most costly queries, the identification of relevant dimension table(s) to referential partition a fact table is a crucial issue that should be addressed. In this paper, we first study the complexity of the problem of selecting dimension table(s) used to partition a fact table. Secondly, we present strategies to perform their selection. Finally, to validate of our proposal, we conduct intensive experimental studies using a mathematical cost model and the obtained results are verified on Oracle11G DBMS.
Horizontal Partitioning has been largely adopted by the database community, where it took a significant part in the physical design process. Actually, it is supported by most commercial database systems (DBMS), where a native Data Definition Language for decomposing tables/materialized views using various modes is proposed. In traditional databases, horizontal partitioning has been largely studied, where several fragmentation algorithms were proposed to partition tables in isolation. In the relational data warehouse environment, horizontal partitioning consists in decomposing the whole warehouse schema into sub schemas, where each schema contains fragments of dimension and fact tables. Dimension tables are fragmented using the primary partitioning mode, whereas the fact table is divided using referential mode. In this article, the authors first focus on the evolution of horizontal partitioning in commercial DBMS motivated by decision support applications. Secondly, they give a formalization of the referential fragmentation schema selection problem in the data warehouse and they study its hardness to select an optimal solution. Due to its high complexity, they develop two algorithms: hill climbing and simulated annealing with several variants to select a near optimal partitioning schema. Finally, extensive experimental studies are conducted using the data set of APB1 benchmark to compare the quality the proposed algorithms using a mathematical cost model. Based on these experiments, some recommendations are given to advise database administrator for well using horizontal partitioning.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.