Abstract. Because computing and storage resources are limited, online analysis of massive data is usually time-consuming. Data cube materialization is an effective way to improve the performance of online analysis. Considering the potential parallelism of genetic algorithms and their good global search ability, we propose a data cube materialization method based on a genetic algorithm. The method selects the views to materialize by combining a partial materialization strategy with MapReduce, and the materialized views can be adjusted adaptively according to the query log. Experimental results show that the method is suited to big data computing environments and selects reasonable materialized views that improve query efficiency.

Keywords: data cube, partial materialization, genetic algorithms, MapReduce, adaptive adjustment
Introduction

With the rapid development of big data, many fields urgently need online analytical processing of massive data. The data cube is a multidimensional data model that supports online analysis. To improve query efficiency, it is usually pre-computed and stored on disk, but materializing the entire data cube requires a large amount of storage space. Materialized view selection is therefore an active research area in the data warehouse field. Because materialized view selection is an NP-hard problem [1] and genetic algorithms are well suited to NP-hard problems, we transform materialized view selection into the search for an optimal solution with a genetic algorithm, and we introduce MapReduce to improve performance. Furthermore, we present an adaptive method that updates the materialized views according to the query log, so that the selection is optimized dynamically.
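To make the transformation concrete, the following is a minimal single-machine sketch of how view selection can be encoded for a genetic algorithm. It is an illustration under stated assumptions, not the method evaluated in this paper: the three-dimensional cube, the cuboid sizes, the storage budget, the penalty weight, and all function names are hypothetical, and the MapReduce parallelization and query-log adaptation are omitted. Each chromosome is a bit string with one bit per candidate cuboid, and the cost of a query is taken to be the size of the smallest materialized cuboid that can answer it.

```python
import random

# Hypothetical 3-dimensional cube; the row counts are invented for illustration.
SIZES = {
    frozenset("abc"): 1000,  # base cuboid, assumed to be always materialized
    frozenset("ab"): 600,
    frozenset("ac"): 500,
    frozenset("bc"): 400,
    frozenset("a"): 50,
    frozenset("b"): 40,
    frozenset("c"): 30,
    frozenset(): 1,
}
BASE = frozenset("abc")
# One chromosome bit per non-base cuboid, in a fixed order.
CANDIDATES = sorted((v for v in SIZES if v != BASE),
                    key=lambda v: (-len(v), sorted(v)))
PENALTY = 10_000  # assumed weight; large enough that any overrun dominates query cost


def storage(chromosome):
    """Total size of the cuboids selected by the chromosome."""
    return sum(SIZES[v] for v, bit in zip(CANDIDATES, chromosome) if bit)


def query_cost(chromosome):
    """Cost of one query per cuboid, each answered from its smallest materialized ancestor."""
    materialized = [BASE] + [v for v, bit in zip(CANDIDATES, chromosome) if bit]
    return sum(min(SIZES[m] for m in materialized if q <= m) for q in SIZES)


def fitness(chromosome, budget):
    """Lower is better; selections that exceed the storage budget are penalized."""
    overrun = max(0, storage(chromosome) - budget)
    return query_cost(chromosome) + PENALTY * overrun


def select_views(budget, pop_size=20, generations=40, mutation_rate=0.1, seed=0):
    """Search for a low-cost selection with tournament selection, crossover, and mutation."""
    rng = random.Random(seed)
    n = len(CANDIDATES)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    population[0] = [0] * n  # guarantee at least one feasible chromosome
    best = min(population, key=lambda c: fitness(c, budget))
    for _ in range(generations):
        next_population = [best[:]]  # elitism: carry the best chromosome forward
        while len(next_population) < pop_size:
            p1 = min(rng.sample(population, 3), key=lambda c: fitness(c, budget))
            p2 = min(rng.sample(population, 3), key=lambda c: fitness(c, budget))
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            next_population.append(child)
        population = next_population
        best = min(population, key=lambda c: fitness(c, budget))
    return best


if __name__ == "__main__":
    chosen = select_views(budget=700)
    views = ["".join(sorted(v)) or "()" for v, bit in zip(CANDIDATES, chosen) if bit]
    print("materialize:", views, "storage:", storage(chosen), "cost:", query_cost(chosen))
```

In this sketch the fitness of each chromosome is evaluated independently of the others, which is the property that makes the evaluation step a natural candidate for parallel execution.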