Abstract. This paper focuses on Real-Time Data Warehousing systems, a relevant class of Data Warehouses whose main requirement consists in executing classical data warehousing operations (e.g., loading, aggregation, indexing, OLAP query answering, and so forth) under real-time constraints. This makes classical DW architectures unsuitable for this goal, and lays the foundations for a novel research area with tight relationships to emerging Cloud architectures. Inspired by this motivation, in this paper we propose a novel framework for supporting Real-Time Data Warehousing that makes use of a rewrite/merge approach. We also provide an extensive experimental campaign that confirms the benefits deriving from our framework.
Introduction

Data Warehouses (e.g., [11]) increasingly demand high performance so that they can deal with real-time paradigms (e.g., [16]), which may prove extremely useful in next-generation Big Data research. Indeed, there exists a plethora of emerging applications where Real-Time Data Warehousing plays a leading role, such as sensor networks, real-time business intelligence, real-time Cloud applications, and so forth. The traditional data warehouse architecture model assumes that new data loading occurs only at certain times, when the warehouse is taken offline, and that the data is integrated during a more or less lengthy time interval. This offline procedure is required for three main reasons. First, there should be no interference between the loading process and the query sessions running on the data warehouse, so that queries suffer no significant slowdown. Second, looking at data formats, a data warehouse is typically a set of interconnected data marts, i.e., schemas (stars) with constraints (e.g., foreign keys, not-null constraints, primary keys), many indexes (e.g., B-trees, bitmap indexes), materialized views, and other summary or derived data created to speed up query answering; from the point of view of data integration, constraints and indexes considerably slow the loading process down. Third, all of these auxiliary structures must be refreshed with the new data. The appropriate solution to these problems in traditional warehouses is to take the whole warehouse offline, disable/drop the constraints and indexes that cause loading slowdown, bulk-load the data and refresh the datasets, and then rebuild the auxiliary structures and constraints (e.g., [10,12,13,14]).
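To make the classical offline refresh cycle described above concrete, the following is a minimal sketch in Python, assuming a PEP 249 (DB-API) compliant connection and purely hypothetical object names (sales_fact, idx_sales_date, fk_sales_product, sales_by_month); the exact DDL and refresh syntax varies across warehouse engines, so this should be read as an illustration of the drop-load-rebuild pattern rather than as the procedure of any specific system.

import datetime

def offline_refresh(conn, staged_rows):
    """Classical offline batch refresh: drop auxiliary structures,
    bulk-load staged data, then rebuild and refresh (illustrative only)."""
    cur = conn.cursor()

    # 1. Drop/disable auxiliary structures that slow bulk loading down.
    cur.execute("ALTER TABLE sales_fact DROP CONSTRAINT fk_sales_product")
    cur.execute("DROP INDEX idx_sales_date")

    # 2. Bulk-load the newly staged data while the warehouse is offline.
    cur.executemany(
        "INSERT INTO sales_fact (product_id, store_id, sale_date, amount) "
        "VALUES (%s, %s, %s, %s)",
        staged_rows,
    )

    # 3. Rebuild indexes and constraints, then refresh derived data
    #    (materialized views, summaries) before bringing the warehouse back online.
    cur.execute("CREATE INDEX idx_sales_date ON sales_fact (sale_date)")
    cur.execute(
        "ALTER TABLE sales_fact ADD CONSTRAINT fk_sales_product "
        "FOREIGN KEY (product_id) REFERENCES product_dim (product_id)"
    )
    cur.execute("REFRESH MATERIALIZED VIEW sales_by_month")
    conn.commit()

# Example usage (hypothetical staged tuples coming from an ETL staging area):
# offline_refresh(conn, [(42, 7, datetime.date(2013, 5, 1), 19.90)])

The key point of the sketch is that every step serializes with query sessions: while this cycle runs, the warehouse is unavailable or stale, which is precisely the limitation that real-time loading approaches aim to remove.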