The goal of on-line analytical processing (OLAP) is to quickly answer queries over large amounts of data residing in a data warehouse. Materialized view selection is an optimization problem encountered in OLAP systems. Published work on materialized view selection presents solutions that scale in the number of possible views. However, the number of possible views is exponential in the number of database dimensions, so a truly scalable solution must run in time polynomial in the number of dimensions. We present such a solution, our Polynomial Greedy Algorithm. Complexity analysis proves scalability, and a performance study verifies the result. Empirical evidence demonstrates benefits close to those of existing algorithms. We conclude that the Polynomial Greedy Algorithm functions effectively where existing algorithms fail dramatically.
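To make the exponential growth concrete, the following minimal sketch enumerates every candidate view for a small cube. It assumes the simplest case, in which each view is a group-by over a subset of flat dimensions (no dimension hierarchies), so d dimensions give 2^d views. The function name `enumerate_views` and the dimension names are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def enumerate_views(dimensions):
    """Return every group-by subset of the given dimensions.

    Assumption: flat dimensions without hierarchies, so each subset of
    dimensions corresponds to exactly one candidate view.
    """
    views = []
    for k in range(len(dimensions) + 1):
        # All subsets of size k, from the empty (grand total) view upward.
        views.extend(combinations(dimensions, k))
    return views


if __name__ == "__main__":
    dims = ["product", "store", "date"]  # hypothetical dimension names
    views = enumerate_views(dims)
    print(len(views), "candidate views for", len(dims), "dimensions")

    # The count doubles with every added dimension.
    for d in (10, 20, 30):
        print(d, "dimensions ->", 2 ** d, "candidate views")
```

Any procedure that inspects each candidate view at least once therefore does work exponential in the number of dimensions, which is the scalability concern the abstract raises.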
This paper describes the application of database technology to medical information, with the goal of providing medical and clinical researchers with the tools necessary to plan bioinformatics projects. Commercial database management systems were used, standard database design practices were applied, a user interface was created, and data were entered; the development of analysis tools, including data mining technologies, is underway. Databases were constructed based on animal and cell culture models of diabetes and on clinical data. Bioinformatics is a useful tool in both basic research and clinical settings. The advantages of relational databases and an approach to managing bioinformatics projects are discussed.
On-Line Analytical Processing (OLAP) aims to extract useful information quickly from large amounts of data residing in a data warehouse. Pre-aggregation is a useful strategy for improving query response time. However, it is usually impossible to pre-aggregate along all combinations of the dimensions, because the multi-dimensional nature of the data leads to a combinatorial explosion in the number and potential storage size of the aggregates. We must therefore pre-aggregate selectively. The required cost/benefit analysis involves estimating the storage requirements of the aggregates in question. We present an original algorithm, based on the Pareto distribution model, for estimating the number of rows in an aggregate. We test the Pareto Model Algorithm empirically against four published algorithms, and conclude that the Pareto Model Algorithm is consistently the best of these algorithms for estimating view size.
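As an illustration of what estimating the number of rows in an aggregate involves, the sketch below implements a classical uniform-assumption estimate, often attributed to Cardenas. It is not the Pareto Model Algorithm, whose details the abstract does not give, and it is not claimed to be one of the four algorithms compared in the paper. It assumes the fact rows fall independently and uniformly into the cells formed by the view's dimension cardinalities. The function name and the example figures are hypothetical.

```python
import math


def uniform_view_size_estimate(fact_rows, cardinalities):
    """Expected number of non-empty groups in an aggregate view.

    Assumption: each of `fact_rows` rows lands independently and uniformly
    in one of `cells` possible groups, where `cells` is the product of the
    cardinalities of the view's dimensions. The expected number of distinct
    groups is then cells * (1 - (1 - 1/cells) ** fact_rows).
    """
    cells = math.prod(cardinalities)
    if cells <= 1:
        # Zero or one possible group: the view has at most that many rows.
        return float(min(fact_rows, cells))
    # log1p/expm1 keep the result accurate when `cells` is very large.
    return -cells * math.expm1(fact_rows * math.log1p(-1.0 / cells))


if __name__ == "__main__":
    # Hypothetical cube: 1,000,000 fact rows, view over two dimensions
    # with 1,000 and 5,000 distinct values.
    estimate = uniform_view_size_estimate(1_000_000, [1_000, 5_000])
    print(round(estimate), "rows estimated for the aggregate")
```

Real warehouse data are rarely uniform, which is one reason a uniform estimate of this kind can be inaccurate and why alternative distribution models are of interest.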