Loading data into data warehouse systems involves a number of ETL procedures. Some procedures can be executed in parallel, while others are subject to precedence constraints. The problem of scheduling these procedures for execution is therefore similar to the problem of scheduling jobs in multiprocessor systems. Solutions proposed for that problem seek an optimal job schedule that minimizes the total execution time. When optimizing the schedule for ETL procedures, however, minimizing the total execution time is not the primary goal. ETL procedures provide the data required for reports aimed at business users, and such reports must be prepared by user-defined deadlines. If the deadlines are met, the solution is satisfactory, regardless of the total execution time. Nor can one assume that all ETL processes are equally important; some have higher priority than others. For this reason, a genetic algorithm (GA) is used to schedule ETL processes with priorities and explicit bounds on the completion time of individual processes. This paper covers the implementation of the algorithm, experiments with different parameters, and an evaluation of the quality of the obtained solutions.
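The objective described above can be illustrated with a small sketch. It is not the paper's implementation: the `Procedure` fields, the list-scheduling decoder, and the priority-weighted lateness penalty are all assumptions chosen for illustration. A candidate solution is taken to be an ordering of procedure names, which a GA could then evolve to minimize the penalty.

```python
from dataclasses import dataclass

@dataclass
class Procedure:
    # All fields are hypothetical; the abstract does not define a data model.
    name: str
    duration: float
    deadline: float = float("inf")  # explicit bound on completion time
    priority: float = 1.0           # weight applied to a missed deadline
    preds: tuple = ()               # procedures that must finish first

def decode(order, procs, workers):
    """Turn an ordering into finish times by simple list scheduling.

    Repeatedly takes the first procedure in `order` whose predecessors are
    already scheduled and places it on the worker that becomes free earliest.
    Assumes the precedence graph is acyclic.
    """
    finish = {}
    free_at = [0.0] * workers
    pending = list(order)
    while pending:
        name = next(n for n in pending
                    if all(p in finish for p in procs[n].preds))
        pending.remove(name)
        w = min(range(workers), key=free_at.__getitem__)
        ready = max((finish[p] for p in procs[name].preds), default=0.0)
        start = max(free_at[w], ready)
        finish[name] = start + procs[name].duration
        free_at[w] = finish[name]
    return finish

def penalty(order, procs, workers):
    """Priority-weighted lateness; zero means every deadline is met."""
    finish = decode(order, procs, workers)
    return sum(p.priority * max(0.0, finish[n] - p.deadline)
               for n, p in procs.items())

# Made-up example data: two loads depend on a shared extract/clean chain.
procs = {p.name: p for p in [
    Procedure("extract", 3),
    Procedure("clean", 2, preds=("extract",)),
    Procedure("load_sales", 4, deadline=9, priority=5, preds=("clean",)),
    Procedure("load_hr", 4, deadline=20, priority=1, preds=("clean",)),
]}

sales_first = ["extract", "clean", "load_sales", "load_hr"]
hr_first = ["extract", "clean", "load_hr", "load_sales"]

on_time = penalty(sales_first, procs, workers=1)  # all deadlines met
late = penalty(hr_first, procs, workers=1)        # load_sales misses its deadline
```

On a single worker both orderings have the same total execution time, yet only the first meets every deadline, so a fitness based on deadlines and priorities can tell them apart where total execution time cannot.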