In order to facilitate efficient query processing, the information contained in data warehouses is typically stored as a set of materialized views. Deciding which views to materialize represent a challenge in order to minimize view maintenance and query processing costs. Some existing approaches are applicable only for small problems, which are far from reality. In this paper we introduce a new approach for materialized view selection using Parallel Simulated Annealing (PSA) that selects views from an input Multiple View Processing Plan (MVPP). With PSA, we are able to perform view selection on MVPPs having hundreds of queries and thousands of views. Also, in our experimental study we show that our method provides a significant improvement in the quality of the obtained set of materialized views over existing heuristic and sequential simulated annealing algorithms.
Abstract. In this paper, we describe the MaxStream federated stream processing architecture to support real-time business intelligence applications. MaxStream builds on and extends the SAP MaxDB relational database system in order to provide a federator over multiple underlying stream processing engines and databases. We show preliminary results on usefulness and performance of the MaxStream architecture on the SAP Sales and Distribution Benchmark.
In the last decade, Stream Processing Engines (SPEs) have emerged as a new processing paradigm that can process huge amounts of data while retaining low latency and high-throughputs. Yet, it is often necessary to join streaming data with traditional databases to provide more contextual information for the end-users and applications. The major problem that we confront is to join the fast arriving stream tuples with the static relation tuples that are on a slow database. This is what we call the Stream-Relation Join (SRJ) problem. Currently, SPEs use a naive tuple-by-tuple approach for SRJ processing where the SPE accesses the database for every incoming tuple. Some SPEs use cache to avoid accessing the database for every incoming tuple, while others do not because of the stochastic nature of streaming data. In this paper, we propose a new SRJ operator to facilitate SRJ processing regardless of the cache performance using two techniques: batching and out-of-order processing. The proposed operator provides an effective generic solution to the SRJ problem and the cost of incorporating our operator into different SPEs is minimal. Our experiments use a variety of synthetic and real data sets demonstrating that our operator outperforms the state-of-the-art tuple-by-tuple approach in terms of maximizing the throughput under ordering and memory constraints.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.