We admire the emerging technologies that fascinate us, as they have become part of our daily lives. The Internet of Things (IoT) plays a major role in reducing human effort. It advances by taking advantage of the latest wireless devices and communication technologies. IoT combines technologies such as ubiquitous and pervasive computing, wireless communication devices and sensors, Internet protocols, and others. IoT logically interconnects physical objects (sensors, wired/wireless communication devices) and virtual objects (web applications, virtual machines) and lets them interoperate over the existing Internet infrastructure. With the help of the Internet, IoT collects and records heterogeneous data (such as documents, images, videos, audio, and others) from heterogeneous applications (such as CCTV, medical imaging, barcode readers, and others). People, physical objects, and virtual objects are logically connected to the network so that they can be observed and analyzed for decision-making. IoT has therefore become an important, still-evolving technology that is indispensable in every sector.
Summary: Big data is strongly pushing business entities and research sectors to become more data-driven. Hadoop MapReduce is one of the cost-effective ways to process large-scale datasets, and it is offered as a service over the Internet. Cloud service providers promise an infinite amount of resources available on demand. Even so, some of the virtual resources hired for MapReduce inevitably remain unutilized, and makespan improvements are limited. Both problems stem from the various heterogeneities that exist when MapReduce is offered as a service. Because MapReduce v2 allows users to define container sizes for map and reduce tasks, jobs in a batch become heterogeneous and behave differently. In addition, virtual machines of different capacities in the MapReduce virtual cluster accommodate varying numbers of map/reduce tasks. These factors strongly affect resource utilization in the virtual cluster and the makespan of a batch of MapReduce jobs. Default MapReduce job schedulers do not consider these heterogeneities in a cloud environment. Moreover, virtual machines in a MapReduce virtual cluster process an equal number of blocks regardless of their capacity, so lower-capacity machines lag behind and lengthen the makespan. Therefore, we devised a heuristic-based MapReduce job scheduler that exploits virtual machine and MapReduce workload level heterogeneities to improve resource utilization and makespan. We proposed two methods to achieve this: (i) roulette wheel scheme based data block placement in heterogeneous virtual machines, and (ii) a constrained 2-dimensional bin packing to place heterogeneous map/reduce tasks. We compared the heuristic-based MapReduce job scheduler against the classical fair scheduler in MapReduce v2. Experimental results showed that our proposed scheduler improved makespan and resource utilization by 45.6% and 47.9%, respectively, over the classical fair scheduler.
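Method (i) can be read as fitness-proportionate (roulette wheel) selection. Each VM gets a sector of the wheel sized by its capacity, so higher-capacity VMs receive proportionally more data blocks instead of an equal share. This is a minimal sketch under one assumption: each VM has a single numeric capacity score, such as its vCPU count. The function name `roulette_wheel_placement`, the VM names, and the capacity values are hypothetical. The paper's actual fitness measure may differ.

```python
import random


def roulette_wheel_placement(block_ids, vm_capacity, seed=None):
    """Assign each block to a VM chosen with probability proportional to its capacity.

    vm_capacity: {vm_name: capacity_score} (hypothetical single score per VM);
    VMs with capacity <= 0 receive no blocks.
    Returns {vm_name: [block_id, ...]}.
    """
    rng = random.Random(seed)
    vms = [vm for vm, cap in vm_capacity.items() if cap > 0]
    if not vms:
        raise ValueError("at least one VM must have positive capacity")
    total = sum(vm_capacity[vm] for vm in vms)
    placement = {vm: [] for vm in vm_capacity}
    for block in block_ids:
        # Spin the wheel: draw a point in [0, total] and find the sector it lands in.
        point = rng.uniform(0, total)
        chosen = vms[-1]  # fallback in case floating-point rounding yields point == total
        cumulative = 0.0
        for vm in vms:
            cumulative += vm_capacity[vm]
            if point < cumulative:
                chosen = vm
                break
        placement[chosen].append(block)
    return placement


if __name__ == "__main__":
    capacity = {"vm-small": 1, "vm-medium": 2, "vm-large": 4}  # hypothetical scores
    blocks = [f"blk_{i}" for i in range(700)]
    result = roulette_wheel_placement(blocks, capacity, seed=7)
    print({vm: len(assigned) for vm, assigned in result.items()})
```

With capacities in the ratio 1:2:4, roughly 1/7, 2/7, and 4/7 of the blocks land on the three VMs. The cumulative-sector walk is written out to show the wheel explicitly. `random.choices(vms, weights=...)` is the standard-library one-liner for the same selection.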
Improving the performance of the MapReduce scheduler is a primary objective, especially in a heterogeneous virtual cloud environment. A map task is assigned an input split (IS), which consists of one or more data blocks. When a map task is assigned more than one data block, non-local execution is performed for the blocks that do not reside on the executing node. In classical MapReduce scheduling schemes, these data blocks are copied over the network to the node where the map task is running. This increases job latency and consumes more network bandwidth within and between racks in the cloud data center. To address this, we propose a methodology, "improving data locality using ant colony optimization (IDLACO)", which minimizes the number of non-local executions and the virtual network bandwidth consumed when an IS is assigned more than one data block. First, for each map task of a job, IDLACO determines a list containing an optimal number of data blocks on which to perform non-local execution, reducing job latency and virtual network consumption. Then, the target virtual machine that executes the map task is chosen on the basis of its heterogeneous performance. Finally, if a set of data blocks is transferred to the same node for repeated job executions, those data blocks are temporarily cached in the target virtual machine. The performance of IDLACO is analysed and compared with the fair scheduler and the Holistic scheduler. The comparison uses metrics such as the number of non-local executions, average map task latency, job latency, and the amount of bandwidth consumed by a MapReduce job. Results show that IDLACO significantly outperforms the classical fair scheduler and the Holistic scheduler.
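A small ant colony optimization sketch can illustrate the idea of searching for placements that keep block transfers low. It is a toy built on stated assumptions, not IDLACO itself. It models only the data-locality objective. Each VM is assumed to have a fixed number of map slots, and an assignment is scored by the number of blocks that must be copied to a VM holding no replica of them. VM performance and block caching are left out. The names `aco_assign` and `local_cost`, the input formats, and the parameter values (`alpha`, `beta`, `rho`, colony size) are hypothetical.

```python
import random


def local_cost(task_blocks, replicas, vm):
    """Count the task's blocks with no replica on `vm` (each must be copied over the network)."""
    return sum(1 for block in task_blocks if vm not in replicas[block])


def aco_assign(tasks, replicas, slots, n_ants=20, n_iter=30,
               alpha=1.0, beta=2.0, rho=0.1, seed=0):
    """Toy ACO: assign each map task to a VM with a free slot, minimizing non-local blocks.

    tasks:    {task_id: [block_id, ...]}   (hypothetical input format)
    replicas: {block_id: {vm, ...}}        VMs holding a replica of the block
    slots:    {vm: number_of_map_slots}
    Returns (best_assignment, best_cost).
    """
    rng = random.Random(seed)
    vms = list(slots)
    # Pheromone trail per (task, vm) pair, initialized uniformly.
    tau = {(t, v): 1.0 for t in tasks for v in vms}
    # Heuristic desirability: fewer non-local blocks makes a VM more attractive.
    eta = {(t, v): 1.0 / (1.0 + local_cost(tasks[t], replicas, v))
           for t in tasks for v in vms}
    best, best_cost = None, float("inf")
    for _ in range(n_iter):
        solutions = []
        for _ in range(n_ants):
            free = dict(slots)
            assignment, cost = {}, 0
            for t in tasks:
                candidates = [v for v in vms if free[v] > 0]
                if not candidates:
                    raise ValueError("not enough map slots for all tasks")
                weights = [tau[(t, v)] ** alpha * eta[(t, v)] ** beta
                           for v in candidates]
                vm = rng.choices(candidates, weights=weights)[0]
                free[vm] -= 1
                assignment[t] = vm
                cost += local_cost(tasks[t], replicas, vm)
            solutions.append((assignment, cost))
            if cost < best_cost:
                best, best_cost = assignment, cost
        # Evaporate old pheromone, then let each ant reinforce its choices
        # in proportion to solution quality (lower cost -> larger deposit).
        for key in tau:
            tau[key] *= 1.0 - rho
        for assignment, cost in solutions:
            for t, vm in assignment.items():
                tau[(t, vm)] += 1.0 / (1.0 + cost)
    return best, best_cost


if __name__ == "__main__":
    tasks = {"m0": ["b1", "b2"], "m1": ["b3"], "m2": ["b4", "b5"]}
    replicas = {"b1": {"vmA"}, "b2": {"vmA"}, "b3": {"vmA", "vmB"},
                "b4": {"vmB"}, "b5": {"vmA"}}
    print(aco_assign(tasks, replicas, {"vmA": 2, "vmB": 2}))
```

In the usage example, `m2` has one block on each VM, so at least one block transfer is unavoidable. The colony's job is to avoid the two-block transfer that placing `m0` on `vmB` would cause.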
Consuming Hadoop MapReduce via virtual infrastructure as a service is becoming common practice as cloud service providers (CSPs) offer relevant applications and scalable resources. One of the predominant requirements of cloud users is to improve resource utilization in the virtual cluster during the service period. However, this may not be possible when MapReduce workloads and virtual machines (VMs) are highly heterogeneous. Therefore, in this paper, we addressed these heterogeneities and proposed an efficient MapReduce scheduler that improves resource utilization by placing the right combination of map and reduce tasks in each VM in the virtual cluster. To achieve this, we transformed the MapReduce task scheduling problem into a 2-dimensional (2D) bin packing model and obtained an optimal schedule using the ant colony optimization (ACO) algorithm. As an added advantage, our proposed ACO-based bin packing (ACO-BP) scheduler minimized the makespan for a batch of jobs. To showcase the performance improvement, we compared our proposed scheduler with three existing schedulers that work well in a heterogeneous environment. The results show that ACO-BP significantly outperformed the existing schedulers when dealing with workload and VM level heterogeneities.
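To make the 2D bin packing model concrete, each VM can be treated as a bin with two capacity dimensions (vcores and memory). Each map or reduce container is then an item with a demand in both dimensions. The sketch below uses first-fit decreasing as a simple stand-in for the search. It is not the ACO-BP algorithm. The class and function names, the resource figures, and the sorting key (the sum of demands normalized by the largest VM) are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    """A bin with two capacity dimensions (hypothetical model of a VM)."""
    name: str
    vcores: int
    memory_mb: int
    tasks: list = field(default_factory=list)
    used_vcores: int = 0
    used_memory_mb: int = 0

    def fits(self, vcores, memory_mb):
        # An item fits only if BOTH dimensions stay within capacity.
        return (self.used_vcores + vcores <= self.vcores
                and self.used_memory_mb + memory_mb <= self.memory_mb)

    def place(self, task_id, vcores, memory_mb):
        self.tasks.append(task_id)
        self.used_vcores += vcores
        self.used_memory_mb += memory_mb


def first_fit_decreasing(tasks, vms):
    """tasks: [(task_id, vcores, memory_mb), ...]. Returns ids that fit nowhere."""
    max_c = max(vm.vcores for vm in vms)
    max_m = max(vm.memory_mb for vm in vms)
    # Place the largest containers first (stable sort keeps input order on ties).
    order = sorted(tasks, key=lambda t: t[1] / max_c + t[2] / max_m, reverse=True)
    unplaced = []
    for task_id, c, m in order:
        for vm in vms:
            if vm.fits(c, m):
                vm.place(task_id, c, m)
                break
        else:
            unplaced.append(task_id)
    return unplaced


def utilization(vms):
    """Cluster-wide (vcore, memory) utilization as fractions of total capacity."""
    c = sum(vm.used_vcores for vm in vms) / sum(vm.vcores for vm in vms)
    m = sum(vm.used_memory_mb for vm in vms) / sum(vm.memory_mb for vm in vms)
    return c, m


if __name__ == "__main__":
    cluster = [VM("vm1", 4, 8192), VM("vm2", 2, 4096)]
    work = [("map1", 1, 2048), ("map2", 1, 2048), ("reduce1", 2, 4096),
            ("map3", 1, 1024), ("reduce2", 2, 4096)]
    left = first_fit_decreasing(work, cluster)
    print([(vm.name, vm.tasks) for vm in cluster], left, utilization(cluster))
```

An ACO-based scheduler would instead explore many such placements, guided by pheromone, rather than committing to the single greedy order used here.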