Data-aware scheduling in large-scale heterogeneous computing systems remains a challenging research issue, especially in the era of Big Data. The design of the data-related components of popular distributed environments, such as Data Clouds (DCs), Data Grids (DGs) and Data Centers, supports the processing, analysis and monitoring of the big data generated at computing centers by end users, devices and services. Data scheduling must therefore be integrated into a single joint process together with the scheduling of computing tasks and applications. As a result, many current optimization issues need to be revised and new requirements have to be considered in the scheduling process: data transmission times, data processing times, availability of the data servers, and security and authentication in data access. This paper presents a new version of the Expected Time to Compute Matrix model (ETC Matrix) for data-aware independent batch scheduling over the physical network in DG and DC environments. Simple genetic-based schedulers have been developed to demonstrate experimentally the significance of the presented problem.
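As a rough illustration of what a data-aware ETC matrix might look like, the following minimal sketch adds a per-task data transmission time to the usual expected compute time when evaluating a schedule. The abstract does not give the model's exact form, so the additive combination and the names `etc`, `transfer`, `schedule`, `completion_times` and `makespan` are all assumptions made for this example.

```python
# Hypothetical sketch: makespan of a schedule under an ETC matrix extended
# with data transmission times. The additive cost model is an assumption.

def completion_times(etc, transfer, schedule):
    """Return the total busy time of each machine.

    etc[j][i]      -- expected time to compute task j on machine i
    transfer[j][i] -- assumed time to deliver task j's data to machine i
    schedule[j]    -- index of the machine assigned to task j
    """
    n_machines = len(etc[0])
    loads = [0.0] * n_machines
    for task, machine in enumerate(schedule):
        loads[machine] += etc[task][machine] + transfer[task][machine]
    return loads


def makespan(etc, transfer, schedule):
    """Makespan is the finishing time of the most loaded machine."""
    return max(completion_times(etc, transfer, schedule))


# Three tasks, two machines (illustrative numbers only).
etc = [[4.0, 8.0], [3.0, 6.0], [5.0, 2.0]]
transfer = [[1.0, 0.5], [2.0, 0.0], [0.5, 1.0]]
schedule = [0, 0, 1]

print(completion_times(etc, transfer, schedule))  # [10.0, 3.0]
print(makespan(etc, transfer, schedule))          # 10.0
```

In this toy instance, machine 0 carries two tasks together with their transfer costs, so it determines the makespan even though machine 1 is the slower machine for those tasks.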
Data-aware scheduling in today's large-scale computing systems has become a major and complex research issue. The problem becomes even more challenging when data is stored on and accessed from many highly distributed servers and energy efficiency is treated as a main scheduling objective. In this paper we approach independent batch scheduling in a grid environment as a bi-objective minimization problem with makespan and energy consumption as the scheduling criteria. We use the Dynamic Voltage and Frequency Scaling (DVFS) model to reduce the cumulative energy consumed by the system resources for task execution. For data transmission, we develop a general logical network topology and a policy based on the sleep link-based Adaptive Link Rate (ALR) on/off technique. The two energy-aware grid schedulers we developed are based on genetic algorithm (GA) frameworks with elitist and struggle replacement mechanisms, and were empirically evaluated for four grid size scenarios in static and dynamic modes. The simulation results show that the proposed schedulers perform well enough to maintain the desired quality levels.
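The bi-objective evaluation described above can be sketched as follows. This is only an illustration under stated assumptions: it uses the textbook dynamic-power relation (power proportional to voltage squared times frequency), assumes execution time scales inversely with relative frequency, and ignores idle and network (ALR) energy. The function `evaluate`, the `levels` structure and the constant `gamma` are hypothetical names, not taken from the paper.

```python
# Hypothetical sketch: evaluate a schedule on two criteria, makespan and
# energy, under a simplified DVFS model. Idle and link energy are ignored.

def evaluate(etc, schedule, levels, gamma=1.0):
    """Return (makespan, energy) for a schedule.

    etc[j][i]   -- expected time to compute task j on machine i at full speed
    schedule[j] -- index of the machine assigned to task j
    levels[j]   -- assumed (voltage, relative_frequency) used to run task j
    gamma       -- assumed machine-dependent power constant
    """
    n_machines = len(etc[0])
    loads = [0.0] * n_machines
    energy = 0.0
    for task, machine in enumerate(schedule):
        voltage, rel_freq = levels[task]
        # Lower frequency stretches the execution time ...
        exec_time = etc[task][machine] / rel_freq
        loads[machine] += exec_time
        # ... but lowers dynamic power, taken here as gamma * V^2 * f.
        energy += gamma * voltage ** 2 * rel_freq * exec_time
    return max(loads), energy


# Two tasks, two machines (illustrative numbers only).
etc = [[4.0, 8.0], [2.0, 6.0]]
schedule = [0, 1]
levels = [(1.0, 0.5), (1.5, 1.0)]

print(evaluate(etc, schedule, levels))  # (8.0, 17.5)
```

A genetic scheduler could use the returned pair as its fitness, for example by minimizing the two values hierarchically or through a weighted sum; which aggregation the authors used is not stated in the abstract.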
Scheduling in traditional distributed systems has mainly been studied with respect to system performance parameters, without data transmission requirements. With the emergence of Data Grids (DGs) and Data Centers, data-aware scheduling has become a major research issue. In this work we present two implementations of classical genetic-based data-aware schedulers for independent tasks submitted to a grid environment. The results of a simple empirical analysis confirm the high effectiveness of genetic algorithms in solving very complex, data-intensive combinatorial optimization problems.