Extract-Transform-Load (ETL) activities are software modules responsible for populating a data warehouse with operational data that have undergone a series of transformations on their way to the warehouse. The whole process is complex and central to the design and maintenance of the data warehouse. A plethora of commercial ETL tools is already available on the market. However, each of them follows a different approach to modeling ETL activities, i.e., the building blocks of an ETL workflow. As a result, there is so far no standard or unified approach for describing such activities. In this paper, we work towards identifying generic properties that characterize ETL activities. To this end, we follow a black-box approach: we provide a taxonomy that characterizes ETL activities in terms of the relationship of their input to their output, together with a normal form that is based on interpreted semantics for the black-box activities. Finally, we show how the proposed taxonomy can be used in the construction of larger modules, i.e., ETL archetype patterns, which can in turn be used for the composition and optimization of ETL workflows.
In this paper, we investigate the problem of answering top-k queries via materialized views. We provide theoretical guarantees for the adequacy of a view to answer a top-k query, along with algorithmic techniques to compute the query via a view when this is possible. We also explore the problem of answering a query via a combination of more than one view and show that it is impossible to improve our theoretical guarantees for answering a query via a combination of views. Finally, we experimentally assess the effectiveness and efficiency of our approach.
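To make the idea of answering a top-k query from a materialized view concrete, the following is a minimal sketch of the simplest possible case only: the view was materialized with exactly the same linear scoring function as the query and holds at least as many tuples as the query asks for. The names `score`, `top_k`, `TopKView` and `answer` are hypothetical, and the sketch does not reproduce the adequacy guarantees of the paper, which concern views and queries with different scoring functions.

```python
def score(row, weights):
    """Linear scoring function: weighted sum of the scoring attributes."""
    return sum(w * x for w, x in zip(weights, row))


def top_k(rows, weights, k):
    """Top-k rows by descending score; ties are broken by the row itself
    so that the ordering is deterministic."""
    return sorted(rows, key=lambda r: (-score(r, weights), r))[:k]


class TopKView:
    """Hypothetical materialized view: the top-k rows of a base relation
    under a fixed linear scoring function."""

    def __init__(self, base_rows, weights, k):
        self.weights = tuple(weights)
        self.k = k
        self.rows = top_k(base_rows, self.weights, k)


def answer(query_weights, query_k, view, base_rows):
    """Answer a top-k query, using the view only in the trivial safe case.

    Assumption of this sketch: the view is used only when it has the same
    scoring function as the query and query_k <= view.k. In every other
    case the sketch falls back to the base relation.
    """
    if tuple(query_weights) == view.weights and query_k <= view.k:
        return view.rows[:query_k], "view"
    return top_k(base_rows, query_weights, query_k), "base"


base = [(1, 5), (4, 4), (3, 1), (5, 2), (2, 2)]
view = TopKView(base, weights=(1, 1), k=3)

same_fn = answer((1, 1), 2, view, base)   # served from the view
other_fn = answer((1, 0), 2, view, base)  # falls back to the base relation
```

The interesting cases are exactly the ones this sketch refuses to serve from the view: deciding when a view with a different scoring function is still adequate is the subject of the theoretical guarantees mentioned above.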
In this paper, we present results on the problem of maintaining materialized top-k views, in two directions. The first problem we tackle concerns the maintenance of top-k views in the presence of high deletion rates. We provide a principled method that compensates for the inefficiency of the state of the art, independently of the statistical properties of the data and the characteristics of the update streams. The second problem concerns the efficient maintenance of multiple top-k views in the presence of updates to their base relation. To this end, we provide theoretical guarantees for the nucleation (practically, inclusion) of a view with respect to another view and for how this property carries over to the management of updates. We also provide algorithmic results towards the maintenance of a large number of views, via their appropriate structuring in hierarchies of views.
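The cost of deletions can be illustrated with a generic buffered top-k view. This is a minimal sketch, not the method proposed in the paper: the class `BufferedTopKView`, the buffer size `k_max` and the `refills` counter are all hypothetical. The view keeps up to `k_max >= k` tuples so that some deletions can be absorbed, and it recomputes from the base relation only when fewer than `k` tuples remain.

```python
class BufferedTopKView:
    """Hypothetical top-k view over (score, id) tuples with a buffer.

    Invariant: self.rows always holds the len(self.rows) highest tuples
    of the base relation, in descending order.
    """

    def __init__(self, base, k, k_max):
        assert k <= k_max
        self.base = set(base)
        self.k, self.k_max = k, k_max
        self.refills = 0  # counts expensive recomputations from the base
        self._refill()

    def _refill(self):
        # Expensive step: scan and sort the whole base relation.
        self.rows = sorted(self.base, reverse=True)[: self.k_max]
        self.refills += 1

    def insert(self, t):
        if t in self.base:
            return
        # The view "covers" the base if it currently holds every base tuple.
        covered = len(self.rows) == len(self.base)
        self.base.add(t)
        # A new tuple may enter the view only if it beats the lowest view
        # tuple, or if the view covered the whole base before the insert.
        # Otherwise an unseen base tuple could outrank it.
        if covered or (self.rows and t > self.rows[-1]):
            self.rows.append(t)
            self.rows.sort(reverse=True)
            del self.rows[self.k_max:]

    def delete(self, t):
        self.base.discard(t)
        if t in self.rows:
            self.rows.remove(t)
            # Refill only when the buffer is exhausted and the base
            # relation still has tuples outside the view.
            if len(self.rows) < self.k and len(self.rows) < len(self.base):
                self._refill()

    def top_k(self):
        return self.rows[: self.k]


v = BufferedTopKView(
    {(10, "a"), (9, "b"), (8, "c"), (7, "d"), (6, "e")}, k=2, k_max=3
)
v.delete((10, "a"))           # absorbed by the buffer, no refill
refills_after_one = v.refills
v.delete((9, "b"))            # buffer exhausted, forces a refill
refills_after_two = v.refills
v.insert((9, "g"))            # beats the lowest view tuple, enters the view
```

With a high deletion rate the buffer is exhausted often and the expensive refill dominates, which is the kind of inefficiency the first part of the abstract refers to.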
CHAPTER 1. Introduction
1.1. Terminology and Contribution in a Nutshell
1.2. Thesis Contribution & Outline
CHAPTER 2. View Usability for Answering Top-k Queries Over Materialized Views
2.1. Background and Related Work
2.1.1. Algorithms for top-k Queries over Relations
2.1.2. Algorithms for top-k Queries over a Relation and Materialized Views
2.1.3. Related Problems in Different Context
2.1.4. Research Opportunities and Comparison to Related Work
2.2. Adequacy of a Materialized View to Answer a Query for the 2D Case
2.2.1. Problem Formulation
2.2.2. The Case when the View is "Higher" than the Query
2.2.3. Strictness of the Suitability Theorem
2.2.4. Computation of Offsets and Safe Areas
2.2.5. The Case when the View is "Lower" than the Query
2.2.6. Special Cases
2.2.7. Algorithmic Results
2.3. Queries and Views with More than Two Scoring Attributes
2.3.1. Fundamental Results for the n-Dimensional Case
2.3.2. Discussion
2.3.3. Algorithmic Results
2.4. Working with More Than One Views
2.4.1. Safe Area Containment with More than One Views
2.4.2. Working with More than One Views in Parallel
2.5. Experiments
2.5.1. Experimental Method for 2D
2.