Big Data Warehouses differ substantially from traditional data warehouses in that their schemas should be based on novel logical models that allow more flexibility than the relational model does. Furthermore, their design methodology requires new principles, such as automation and agile techniques, in order to achieve both fast realization and fast reaction to changes in business requirements. In this paper, we discuss the new features of big data warehouses and present an innovative design methodology based on the key-value model.
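As a purely illustrative aside (not the methodology of the paper above), the following minimal Python sketch shows one common way a multidimensional fact can be laid out in a key-value model: dimension coordinates are encoded in the key and measures in the value, so the layout can absorb schema changes without a fixed relational structure. The fact name, dimensions, and measures are hypothetical.

```python
# Minimal sketch (hypothetical fact and dimensions, not the paper's model):
# key = <fact>:<date>:<store>:<product>, value = dict of measures.
sales_fact = {
    "sales:2023-06-01:store_12:prod_877": {"quantity": 3, "revenue": 59.97},
    "sales:2023-06-01:store_12:prod_431": {"quantity": 1, "revenue": 12.50},
}

def rollup_revenue_by_date(facts):
    """Aggregate the 'revenue' measure along the date coordinate of the key."""
    totals = {}
    for key, measures in facts.items():
        date = key.split(":")[1]
        totals[date] = totals.get(date, 0.0) + measures["revenue"]
    return totals

print(rollup_revenue_by_date(sales_fact))  # {'2023-06-01': 72.47}
```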
This article describes how the evaluation of modern data warehouses must consider the new solutions adopted to face the radical changes caused by the need to reduce storage volume while increasing velocity in multidimensional design and data processing, even in the presence of unstructured data that are useful for providing qualitative information. The aim is to set up a framework for evaluating the physical and methodological characteristics of a data warehouse, built by considering the factors that affect the data warehouse's lifecycle when the Big Data issues (Volume, Velocity, Variety, Value, and Veracity) are taken into account. The contribution is the definition of a set of criteria for classifying Big Data Warehouses on the basis of their methodological characteristics. Based on these criteria, the authors define a set of metrics for measuring the quality of Big Data Warehouses with reference to the design specifications. Through a case study, they show how the proposed metrics can check the eligibility of methodologies falling in different classes in the Big Data context.
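To make the idea of measuring a design against its specification concrete, here is a hedged sketch of one simple coverage-style check. The requirement set, schema set, and the "dimension coverage" measure are assumptions for illustration only; they are not the criteria or metrics defined in the article.

```python
# Illustrative only: a toy quality check comparing an implemented schema
# against its specification. Names and the metric itself are hypothetical.
required = {"time", "store", "product", "customer"}   # dimensions in the spec
implemented = {"time", "store", "product"}            # dimensions in the schema

def dimension_coverage(spec, schema):
    """Fraction of required dimensions actually present in the implemented schema."""
    return len(spec & schema) / len(spec) if spec else 1.0

print(f"coverage = {dimension_coverage(required, implemented):.2f}")  # coverage = 0.75
```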
Traditional data warehouse design methodologies are based on two opposite approaches. One is data-oriented and aims to build the data warehouse mainly through a reengineering process of the well-structured data sources alone, while minimizing the involvement of end users. The other is requirement-oriented and aims to build the data warehouse only on the basis of business goals expressed by end users, with no regard to the information obtainable from data sources. Since these approaches cannot address the problems that arise when dealing with big data, the need has emerged to adopt hybrid methodologies, which allow the definition of multidimensional schemas by considering user requirements and reconciling them against non-structured data sources. As a counterpart, hybrid methodologies may require a more complex design process. For this reason, current research is devoted to introducing automatisms in order to reduce the design effort and to support the designer in the creation of the big data warehouse. In this chapter, the authors present a methodology based on a hybrid approach that adopts a graph-based multidimensional model. In order to automate the whole design process, the methodology has been implemented using logic programming.
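The chapter's methodology is implemented with logic programming; as a language-neutral illustration only, the Python sketch below shows the general idea of a graph-based multidimensional model: nodes for the fact and its levels, edges labelled with a role. The fact, levels, and edge labels are assumptions, not the authors' model.

```python
# A minimal sketch, under assumed structures, of a graph-based multidimensional
# schema: 'dimension' edges link the fact to a base level, 'rollup' edges link a
# level to a coarser one. Illustrative only; not the chapter's actual model.
edges = [
    ("Sales", "Day",   "dimension"),
    ("Sales", "Store", "dimension"),
    ("Day",   "Month", "rollup"),
    ("Month", "Year",  "rollup"),
    ("Store", "City",  "rollup"),
]

def hierarchy(level, edges):
    """Follow 'rollup' edges from a base level to list its aggregation path."""
    nxt = {src: dst for src, dst, kind in edges if kind == "rollup"}
    path = [level]
    while path[-1] in nxt:
        path.append(nxt[path[-1]])
    return path

# Print each dimension of the 'Sales' fact with its aggregation hierarchy.
for src, level, kind in edges:
    if src == "Sales" and kind == "dimension":
        print(level, "->", " -> ".join(hierarchy(level, edges)))
```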