Abstract—In this article, we study the impact of big data's volume and variety dimensions on Energy Efficient Big Data Networks (EEBDN) by developing a Mixed Integer Linear Programming (MILP) model that encapsulates the distinctive features of these two dimensions. First, a progressive energy efficient edge, intermediate, and central processing technique is proposed to process big data's raw traffic by building processing nodes (PNs) in the network along the way from the sources to the datacenters. Second, we validate the MILP operation by developing a heuristic that mimics, in real time, the behaviour of the MILP for the volume dimension. Third, we test the energy efficiency limits of our green approach under several conditions where PNs are less energy efficient, in terms of processing and communication, than data centers. Fourth, we test the performance limits of our energy efficient approach by studying a "software matching" problem in which different software packages are required to process big data. The results are then compared to the Classical Big Data Networks (CBDN) approach, where big data is processed only inside centralized data centers. Our results reveal that up to 52% and 47% power savings can be achieved by the EEBDN approach compared to the CBDN approach under the volume and variety scenarios, respectively. Moreover, our results identify the limits of the progressive processing approach, in particular the conditions under which the centralized CBDN approach is more appropriate given certain PN energy efficiency and software availability levels.

Index Terms—Big data volume, big data variety, energy efficient networks, IP over WDM core networks, MILP, processing location optimization, software matching.
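The "software matching" constraint mentioned above can be illustrated with a minimal sketch: a chunk can only be processed at a node that hosts the software package it requires, otherwise it must travel further along the path, ultimately to the datacenter (the CBDN behaviour). The node names, packages, and hop distances below are illustrative assumptions, not values from the model.

```python
# Hedged sketch of the software-matching constraint, assuming a single
# source-to-datacenter path with one edge PN and one intermediate PN.
NODES = {  # node -> (hop distance from source, installed packages)
    "edge":         (1, {"pkgA"}),
    "intermediate": (2, {"pkgA", "pkgB"}),
    "datacenter":   (4, {"pkgA", "pkgB", "pkgC"}),
}

def place(required_pkg):
    """Pick the closest node whose installed software matches the chunk."""
    candidates = [(dist, name)
                  for name, (dist, pkgs) in NODES.items()
                  if required_pkg in pkgs]
    return min(candidates)[1]

print(place("pkgA"))   # → edge
print(place("pkgC"))   # → datacenter
```

A chunk needing only the widely installed `pkgA` is processed at the edge, while one needing the rare `pkgC` falls back to the datacenter, which is why limited software availability erodes the EEBDN saving.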
Abstract—In this work, we study the impact of big data's velocity dimension on energy efficient IP over WDM core networks. We classify the processing velocity of big data into two modes: an expedited-data processing mode and a relaxed-data processing mode. Expedited data demands a higher amount of computational resources to reduce the execution time compared to relaxed data. We developed a MILP model to optimize the processing of both modes at strategic locations, dubbed processing nodes (PNs), built into the network along the path from the data source to the destination. During the processing of big data, the information extracted from the raw traffic is smaller in volume than the original big data traffic each time the data is processed, hence reducing network power consumption. Our results show that up to 60% network power saving is achieved when nearly 100% of the data in the network requires relaxed processing. In contrast, only 15% network power saving is gained when nearly 100% of the data requires expedited processing. We obtained around 33% power saving in the mixed mode (i.e., when approximately 50% of the data is processed in the relaxed mode and 50% in the expedited mode), compared to the classical approach where no PNs exist in the network and all the processing is performed inside the centralized datacenters.

Introduction

Velocity is data in motion: the speed at which data flows in and is processed in the data centers [1]. The flow rate can grow large for applications collecting information from wide spatial or temporal domains. For instance, the Square Kilometre Array [2] telescope combines signals received from thousands of small antennas spread over a distance of more than 3000 km, with a flow rate of 700 TB/second. In another example, five million trade events created each day are scrutinized in real time to identify potential fraud.
Five hundred million daily call detail records are analysed in real time to predict customer churn faster [3]. High-speed processing of such immense data volumes, produced by plentiful data sources, calls for new processing and communication methodologies in the big data era. In [4], the authors study the minimization of the overall cost of big data placement, processing, and movement across geo-distributed datacenters. In [5], the authors presented an optimization technique to execute a sequence of MapReduce jobs in geo-distributed datacenters to minimize the time and pecuniary cost. The authors in [6] introduced a technique to execute MapReduce jobs on multiple IoT nodes to locally process as much of the raw data as possible. The authors in [7] aimed to minimize the communication cost by satisfying as many big data queries as possible over a number of time slots. In-network processing is proposed in [8] to achieve network awareness and save bandwidth using custom routing, redundancy elimination, and on-path data reduction. In [9], the authors developed Mixed Integer Linear Programming models for energy efficient cloud computing services in IP over WDM core networks. We developed in [10] and [11] MILP models to investigate the impact of the big data's volume, variety, and veracity on gre...
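The progressive-processing idea underlying the works above can be sketched numerically: each PN along the path extracts knowledge from the raw stream and forwards a fraction of the traffic it received. The reduction ratios below are illustrative assumptions, not values from the MILP.

```python
import functools

def transported_volume(source_gb, reduction_ratios):
    """Volume (Gb) arriving at the datacenter after each PN on the path
    shrinks the traffic it relays by its reduction ratio."""
    return functools.reduce(lambda vol, ratio: vol * ratio,
                            reduction_ratios, source_gb)

raw = 100.0             # Gb emitted by the source (assumed)
hops = [0.5, 0.5, 0.5]  # each of three PNs halves the traffic (assumed)
print(transported_volume(raw, hops))   # → 12.5
```

Only 12.5 Gb of the original 100 Gb ever reaches the datacenter, which is the mechanism by which the downstream links and routers consume less power.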
The tremendous volumes generated by big data applications are starting to overwhelm data centers and networks. Traditional research efforts have focused on how to process these vast volumes of data inside datacenters. Nevertheless, little attention has been paid to the increase in power consumption resulting from transferring these gigantic volumes of data from the source to the destination (datacenters). An efficient way to address this challenge is to progressively process large volumes of data as close to the source as possible and transport the reduced volume of extracted knowledge to the destination. In this article, we examine the impact of processing different big data volumes on network power consumption in a progressive manner from source to datacenters. Accordingly, a noteworthy decrease in transferred data is achieved, which results in a generous reduction in network power consumption. We consider different volumes of big data chunks and introduce a Mixed Integer Linear Programming (MILP) model to optimize the processing locations of these volumes of data and the locations of two datacenters. The results show that serving big data volumes drawn from a uniform distribution yields higher power savings than chunks of fixed size: we obtain average network power savings of 57%, 48%, and 35% for chunk volumes of 10-220 Gb (uniform), 110 Gb, and 50 Gb, respectively, compared to the conventional approach where all chunks are processed inside datacenters only.
Classically, the data produced by big data applications is transferred through the access and core networks to be processed in data centers, where the resulting data is stored. In this work, we investigate improving the energy efficiency of transporting big data by processing the data in processing nodes of limited processing and storage capacity along its journey through the core network to the data center. The amount of data transported over the core network is significantly reduced each time the data is processed; we therefore refer to such a network as an Energy Efficient Tapered Data Network. The results of a Mixed Integer Linear Programming (MILP) model, developed to optimize the processing of big data in Energy Efficient Tapered Data Networks, show a significant reduction in network power consumption of up to 76%.
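The tapering effect on the network itself can be illustrated by counting bit-hops: the Gb carried on each link of the path, summed over the path. The hop count, raw volume, and PN reduction ratio below are illustrative assumptions, not results from the paper.

```python
# Minimal sketch: compare bit-hops when data is processed only at the
# datacenter (classical) versus tapered by a PN placed after the first hop.

def bit_hops(volume_per_hop):
    """Total Gb carried across the path, one entry per link."""
    return sum(volume_per_hop)

path_hops = 4
raw = 200.0  # Gb at the source (assumed)

# Classical: the raw volume crosses every hop to the datacenter.
classical = bit_hops([raw] * path_hops)

# Tapered: a PN after hop 1 reduces traffic to 25% of the raw volume.
tapered = bit_hops([raw] + [raw * 0.25] * (path_hops - 1))

saving = 1 - tapered / classical
print(f"{saving:.0%}")   # → 56%
```

Since network power consumption scales roughly with the traffic each link and router must carry, fewer bit-hops translate directly into the power savings reported above.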
This article introduces an energy efficient heuristic that performs resource provisioning and Virtual Machine (VM) migration in the Disaggregated Server (DS) scheme. The DS is a promising paradigm for future data centers in which server components are disaggregated at the hardware unit level and resources of similar type are combined in type-respective pools, such as processing pools, memory pools, and IO pools. We examined 1000 VM requests with various processing, memory, and IO requirements. Requests have exponentially distributed inter-arrival times and uniformly distributed service duration periods. Resources occupied by a VM are released when the VM finishes its service duration. The heuristic optimises VM allocation and dynamically migrates existing VMs to occupy newly released energy efficient resources. We assess the energy efficiency of the heuristic under increasing service duration periods. The results of the numerical simulation indicate that the power savings can reach up to 55% compared to our previous study, where VM service duration is infinite and resources are not released.

Keywords: Virtual Machine, VM Migration, Disaggregated Server, data center, energy consumption.

INTRODUCTION

Nowadays, virtualization within cloud computing is becoming a prevalent technology that greatly shapes our lives, and accordingly the energy consumption of the physical infrastructure that provides resources for the clouds is growing [1]. Thus, energy management is becoming a key challenge for data centers seeking to reduce their energy consumption and total costs.
A number of energy efficient data centre and inter-data-centre network architectures were proposed and studied in [2]-[7]. Serious concerns are currently being raised regarding traditional server and networking design, and significant research efforts have been dedicated to improving the design of switches and servers, leading to new paradigms for future data center designs [8]-[19]. Different disaggregated server architectures were introduced that present the memory and IO ports as blades separate from the server block, allowing resources to be disaggregated across a system and enabling vertical elasticity in data centers. In [17] and [18], the energy efficiency potential of the DS architecture was studied considering resource provisioning and VM allocation. In [19], a comprehensive review of the DS was introduced. A DS-based data centre, shown in Fig. 1, is a departure from the conventional single-box server towards disaggregated resources combined in resource pools. With DS, the different server resources are disaggregated from each other and combined with resources of the same type in resource pools, constructing CPU pools, memory pools, and IO pools. However, the communication among resources within the same pool (Intra Rack Fabric in Fig. 1) or between resources from different pools (Inter Rack Fabric in Fig. 1) must be managed efficiently. In this paper, we consider a DS-based data center as a new data cente...
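The request model described above (exponential inter-arrival, uniform service duration, resources released on departure) can be sketched as a small event-driven simulation. The pool capacity, per-VM demands, and the plain first-fit policy are illustrative assumptions standing in for the paper's heuristic, not its actual algorithm.

```python
import heapq
import random

# Toy sketch of the VM request model: 1000 requests, Exp(1) inter-arrival
# times, U(5, 15) service durations, resources released when a VM departs.
random.seed(42)
POOL_CPU = 64          # assumed CPU-pool capacity (cores)
used = 0               # cores currently allocated
departures = []        # min-heap of (finish_time, cores)
served = rejected = 0
t = 0.0

for _ in range(1000):
    t += random.expovariate(1.0)       # next arrival time
    cores = random.randint(1, 8)       # assumed per-VM CPU demand
    duration = random.uniform(5.0, 15.0)
    while departures and departures[0][0] <= t:
        _, freed = heapq.heappop(departures)   # release finished VMs
        used -= freed
    if used + cores <= POOL_CPU:       # first-fit admission
        used += cores
        heapq.heappush(departures, (t + duration, cores))
        served += 1
    else:
        rejected += 1

print(served, rejected)
```

Releasing resources on departure is exactly what distinguishes this setting from the earlier study with infinite service durations: without the `heappop` release step, the pool fills once and every later request is rejected.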