Re-Stream: Real-time and energy-efficient resource scheduling in big data stream computing environments

et al. 2018

Softw Pract Exp

Summary: With the rapid development of cloud computing, many distributed data centers have been deployed, which drives up their energy consumption. Reducing the cost of data centers has therefore received significant attention recently. Although several efforts have studied the energy consumption of data centers, very few have modeled and analyzed cost-aware job scheduling for the cloud data center. To address this emerging problem, we propose a systematic approach that considers both the basic elements of a cloud data center and their relationships. First, we present a formal language to describe the cloud data center, and propose a job scheduling net to formally model basic elements such as user requests, Web portals, data centers, and servers. Second, on the basis of the state space of the constructed model, we minimize the total cost of the cloud data center, taking into account multidimensional resources and local electricity prices. A dynamic job scheduling algorithm and its specific execution steps are proposed based on the alternating direction method of multipliers (ADMM). Third, we present the operational semantics and related Petri net theory that establish the correctness of the proposed method. Finally, a series of simulations illustrates that the proposed method guarantees the correct behavior of job scheduling in the cloud data center while meeting the required cost.
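The cost-minimization step can be illustrated with a much simpler stand-in. The sketch below is not the ADMM-based algorithm described in the abstract: it assumes a single resource dimension, a linear electricity cost, and hypothetical site names, prices, capacities, and energy-per-job figures. Under those assumptions, a cheapest-first (merit-order) fill of the sites is already optimal.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    name: str
    price_per_kwh: float  # local electricity price (hypothetical units)
    capacity_jobs: float  # jobs the site can accept in this scheduling slot
    kwh_per_job: float    # energy drawn per job (hypothetical, single resource)

    def cost_per_job(self) -> float:
        return self.price_per_kwh * self.kwh_per_job


def dispatch(demand: float, dcs: list[DataCenter]) -> dict[str, float]:
    """Assign `demand` jobs to sites, cheapest electricity cost per job first.

    With linear costs and per-site capacity limits this greedy fill is
    optimal. It is a simplified stand-in, not the paper's ADMM algorithm.
    """
    plan = {dc.name: 0.0 for dc in dcs}
    remaining = demand
    for dc in sorted(dcs, key=DataCenter.cost_per_job):
        if remaining <= 0:
            break
        take = min(remaining, dc.capacity_jobs)
        plan[dc.name] = take
        remaining -= take
    if remaining > 0:
        raise ValueError("demand exceeds total capacity")
    return plan


def total_cost(plan: dict[str, float], dcs: list[DataCenter]) -> float:
    by_name = {dc.name: dc for dc in dcs}
    return sum(jobs * by_name[name].cost_per_job() for name, jobs in plan.items())


sites = [
    DataCenter("east", 0.12, 100, 0.5),
    DataCenter("west", 0.08, 60, 0.5),
    DataCenter("north", 0.10, 80, 0.5),
]
plan = dispatch(150, sites)
print(plan, total_cost(plan, sites))
```

Here the 150 jobs fill "west" (cheapest) to capacity, then "north", and only the last 10 jobs land at "east". Multidimensional resources and time-varying prices, which the abstract does consider, are what make the real problem need a decomposition method such as ADMM.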

Section: Discussion (mentioning)

confidence: 91%

Formally modeling and analyzing cost‐aware job scheduling for cloud data center

Fan

et al. 2018

Softw Pract Exp

“…The power consumption model proposed in this paper is different from using a power analyzer to directly measure the power consumption of GPU, as we calculate the energy consumption of a computing node according to the work in [36]. The energy consumption […], as shown in Equation (2).…”
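The excerpt does not preserve the exact form of Equation (2) or of the model in [36]. As a generic illustration only: the energy a node consumes over an interval is the integral of its power draw, which can be approximated from sampled power readings with the trapezoidal rule. The sampling scheme, the readings, and the function name are assumptions.

```python
def node_energy_joules(timestamps_s: list[float], power_w: list[float]) -> float:
    """Approximate E = integral of P(t) dt between the first and last sample.

    timestamps_s: sample times in seconds, non-decreasing
    power_w: power draw of the node (in watts) at each sample time
    Trapezoidal rule: each interval contributes its average power * duration.
    """
    if len(timestamps_s) != len(power_w) or len(timestamps_s) < 2:
        raise ValueError("need two or more (time, power) samples of equal length")
    energy = 0.0
    for k in range(1, len(timestamps_s)):
        dt = timestamps_s[k] - timestamps_s[k - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        energy += 0.5 * (power_w[k] + power_w[k - 1]) * dt
    return energy


# A node sampled once per second while a GPU kernel runs (hypothetical readings).
print(node_energy_joules([0, 1, 2, 3], [100, 120, 120, 100]))  # 340.0 (joules)
```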

Section: Our Power Consumption Model (mentioning)

confidence: 99%

RGCA: a Reliable GPU Cluster Architecture for Large-Scale Internet of Things Computing Based on Effective Performance-Energy Optimization

Fang

Xiong

et al. 2017

Preprint

This paper aims to develop a low-cost, high-performance and high-reliability computing system that processes large-scale data with common data mining algorithms in Internet of Things (IoT) computing. Because IoT data processing has characteristics similar to mainstream high-performance computing, we use a GPU cluster to provide better IoT services. First, we present an energy consumption calculation method (ECCM) based on WSN. Then, using the CUDA programming model, we propose a Two-level Parallel Optimization Model (TLPOM), which exploits reasonable resource planning and common compiler optimization techniques to obtain the best configuration of blocks and threads under the resource constraints of each node. The key to this part is dynamically coupling Thread-Level Parallelism (TLP) and Instruction-Level Parallelism (ILP) to improve the performance of the algorithms without additional energy consumption. Finally, combining ECCM and TLPOM, we use the Reliable GPU Cluster Architecture (RGCA) to obtain a high-reliability computing system that accounts for node diversity, algorithm characteristics, and similar factors. The results show that, with TLPOM, the performance of the algorithms increased on average by 34.1%, 33.96% and 24.07% on Fermi, Kepler and Maxwell GPUs, respectively, and that RGCA enables our IoT computing system to provide low-cost, high-reliability services.
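How TLPOM searches for the best block and thread configuration is not spelled out in this abstract. A common, much simpler starting point is an occupancy-style calculation: for each candidate block size, count how many blocks fit on one streaming multiprocessor (SM) given its thread, block, register, and shared-memory limits. The limits below are illustrative placeholders, not the specification of any particular Fermi, Kepler, or Maxwell part. The model also ignores allocation granularity and ILP, so it sketches the idea rather than TLPOM itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    # Illustrative per-SM limits; real values depend on the GPU generation.
    max_threads: int = 2048
    max_blocks: int = 32
    registers: int = 65536
    shared_mem_bytes: int = 49152


def resident_blocks(tpb: int, regs_per_thread: int, smem_per_block: int, sm: SMLimits) -> int:
    """Blocks of `tpb` threads that fit on one SM at the same time."""
    limits = [sm.max_blocks, sm.max_threads // tpb]
    if regs_per_thread:
        limits.append(sm.registers // (regs_per_thread * tpb))
    if smem_per_block:
        limits.append(sm.shared_mem_bytes // smem_per_block)
    return min(limits)


def best_block_size(regs_per_thread: int, smem_per_block: int, sm: SMLimits = SMLimits()):
    """Return (threads_per_block, occupancy, blocks_per_sm) maximizing occupancy.

    Occupancy = resident threads / max threads per SM. Ties keep the smallest block.
    """
    best = None
    for tpb in range(32, 1025, 32):  # multiples of the 32-thread warp size
        blocks = resident_blocks(tpb, regs_per_thread, smem_per_block, sm)
        occupancy = blocks * tpb / sm.max_threads
        if best is None or occupancy > best[1]:
            best = (tpb, occupancy, blocks)
    return best


# A kernel using 32 registers per thread and 4 KiB of shared memory per block.
print(best_block_size(regs_per_thread=32, smem_per_block=4096))
```

With these numbers shared memory caps residency at 12 blocks, so small blocks leave the SM underused and 256-thread blocks are the first to reach full occupancy. High occupancy does not guarantee speed, which is why an approach like TLPOM also weighs ILP.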

“…For example, the authors of [35] and [36] propose a semantics-based approach for the management of fast data streams, aiming to provide a description and management layer to define and execute stream processing pipelines. Besides, to achieve high energy efficiency and low response time in big data stream computing environments, the authors of [2] and [3] propose a real-time and energy-efficient resource scheduling and optimization framework, termed the Re-Stream, which aids in calculating the energy consumption of a resource allocation scheme for a data stream graph. What is more, a partitioning-based data-intensive workflow optimization algorithm [37], [39] has been proposed to provide significantly reduced latency with increase in the throughput. However, the issue of VM allocation has not been properly addressed in geo-distributed DCs for streaming workflow.…”
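Re-Stream is credited here with helping to calculate the energy consumption of a resource allocation scheme for a data stream graph. A minimal sketch of that kind of calculation follows. It assumes a linear utilization-to-power model P(u) = P_idle + (P_max - P_idle) * u, hypothetical wattages and operator loads, and nodes that are switched off when they host no vertex. It ignores the graph's edges (communication energy) and is not Re-Stream's actual model.

```python
from collections import defaultdict


def allocation_energy(load, placement, duration_s, p_idle_w=100.0, p_max_w=250.0):
    """Energy in joules of one vertex->node placement of a stream graph.

    load: vertex -> CPU utilization it needs, as a fraction of one node
    placement: vertex -> node id
    Assumes P(u) = P_idle + (P_max - P_idle) * u for every active node and
    that nodes hosting no vertex are powered off (draw nothing).
    """
    util = defaultdict(float)
    for vertex, node in placement.items():
        util[node] += load[vertex]
    for node, u in util.items():
        if u > 1.0 + 1e-9:
            raise ValueError(f"node {node} is overloaded (u = {u:.2f})")
    power_w = sum(p_idle_w + (p_max_w - p_idle_w) * min(u, 1.0) for u in util.values())
    return power_w * duration_s


# Hypothetical four-vertex stream graph: source -> parse -> join -> sink.
load = {"source": 0.2, "parse": 0.5, "join": 0.4, "sink": 0.1}
spread = {"source": "n1", "parse": "n2", "join": "n3", "sink": "n4"}
packed = {"source": "n1", "parse": "n1", "join": "n2", "sink": "n2"}
print(allocation_energy(load, spread, 60), allocation_energy(load, packed, 60))
```

Both placements do the same work, but the packed one pays the idle baseline on two nodes instead of four (about 22.8 kJ versus 34.8 kJ per minute here). Consolidation of this kind trades energy against the response time Re-Stream also targets.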

Section: Streaming Workflow Optimization (mentioning)

confidence: 99%

“…The increased volume of streaming data and the demand for more complex real-time analytics require the execution of processing pipelines among heterogeneous event-processing engines as a workflow [2]. However, in contrast to traditional workflow execution, in which tasks execute once or several times in case of control flows like iterations, streaming workflows, which constantly respond to environmental conditions based on stream inputs, allow tasks in the workflow to be invoked multiple times continuously [3]; this involves the movement of huge amounts of data between execution nodes, which incurs large costs. One example is BigBench [4], in which the cross-datacenter traffic is about 706 GB/day and thus raises the cost of providing services.…”
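To put the 706 GB/day figure in perspective, it helps to convert it to a sustained bit rate and a monthly transfer volume. The per-GB price below is a hypothetical placeholder, not a quoted cloud tariff, and decimal units (1 GB = 1000 MB) are assumed.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_mbit_s(gb_per_day: float) -> float:
    """Average rate in megabits per second, using decimal units (1 GB = 1000 MB)."""
    return gb_per_day * 1000 * 8 / SECONDS_PER_DAY


def monthly_transfer_cost(gb_per_day: float, price_per_gb: float, days: int = 30) -> float:
    """Transfer bill for `days` days at a flat, hypothetical per-GB price."""
    return gb_per_day * days * price_per_gb


print(f"{average_rate_mbit_s(706):.1f} Mbit/s sustained")  # 65.4 Mbit/s sustained
print(f"{monthly_transfer_cost(706, 0.02):.2f} per month at a hypothetical 0.02/GB")
```

That is roughly 65 Mbit/s of continuous cross-datacenter traffic, or about 21 TB over a 30-day month, before any growth in stream volume.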

Section: Introduction (mentioning)

confidence: 99%

“…Unlike conventional workflow execution [2], [3], [5], streaming workflow is characterized by the feature that each task in the workflow is invoked multiple times to process continuous instances of data streams, and that the placement of the streams in geo-distributed DCs is also critical to the resulting cost. This provides a new opportunity, but also a challenge, to optimize the streaming workflow allocation for cost minimization.…”

Section: Introduction (mentioning)

confidence: 99%


Cost-Aware Streaming Workflow Allocation on Geo-Distributed Data Centers

Paik

2016

IEEE Trans. Comput.

The virtual machine (VM) allocation problem in cloud computing has been widely studied in recent years, and many algorithms have been proposed in the literature. Most of them have been successfully applied to batch processing models such as MapReduce; however, none of them suits streaming workflows well, because of the following weaknesses: 1) they fail to capture the characteristics of tasks in a streaming workflow, given the short life cycle of data streams; 2) most algorithms assume that the prices of VMs and of traffic among data centers (DCs) are static and fixed. In this paper, we propose a streaming workflow allocation algorithm that takes into account the characteristics of streaming workflows and the price diversity among geo-distributed DCs, to achieve cost minimization for streaming big data processing. First, we construct an extended streaming workflow graph (ESWG) based on the task semantics of streaming workflows and the price diversity of geo-distributed DCs, and formulate the streaming workflow allocation problem as a mixed integer linear program (MILP) based on the ESWG. Second, to meet the strict latency requirement, we propose two heuristic algorithms that reduce the computational space through task combination and DC combination. Finally, our experimental results demonstrate significant performance gains, with lower total cost and execution time.
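The allocation problem in this abstract is formulated as an MILP over the ESWG and then attacked with task- and DC-combination heuristics. For a toy workflow, the same kind of objective (VM cost at each DC's price plus inter-DC traffic cost) can simply be enumerated, which makes the effect of price diversity visible. Everything here is a hypothetical illustration: the task names, prices, and data volumes are invented, latency constraints are ignored, and exhaustive search (exponential in the number of tasks) stands in for an MILP solver.

```python
from itertools import product


def allocation_cost(assign, vm_hours, vm_price, edges, traffic_price):
    """Hourly cost of one task->DC assignment.

    vm_hours: task -> VM-hours it consumes per hour (tasks run continuously)
    vm_price: DC -> price per VM-hour at that DC
    edges: (upstream, downstream) -> GB per hour sent along that stream
    traffic_price: src DC -> dst DC -> price per GB (0 inside one DC)
    """
    compute = sum(h * vm_price[assign[t]] for t, h in vm_hours.items())
    transfer = sum(gb * traffic_price[assign[u]][assign[v]] for (u, v), gb in edges.items())
    return compute + transfer


def best_allocation(vm_hours, vm_price, edges, traffic_price, pinned=None):
    """Exhaustively try every task->DC assignment and keep the cheapest one."""
    pinned = pinned or {}
    tasks = list(vm_hours)
    best = None
    for choice in product(vm_price, repeat=len(tasks)):
        assign = dict(zip(tasks, choice))
        if any(assign[t] != dc for t, dc in pinned.items()):
            continue  # e.g. the ingest task must run where the data is produced
        cost = allocation_cost(assign, vm_hours, vm_price, edges, traffic_price)
        if best is None or cost < best[1]:
            best = (assign, cost)
    return best


vm_hours = {"ingest": 2, "filter": 4, "aggregate": 10}
vm_price = {"us": 0.10, "eu": 0.06}
edges = {("ingest", "filter"): 50, ("filter", "aggregate"): 1}
traffic_price = {"us": {"us": 0.0, "eu": 0.02}, "eu": {"us": 0.02, "eu": 0.0}}
assign, cost = best_allocation(vm_hours, vm_price, edges, traffic_price, pinned={"ingest": "us"})
print(assign, round(cost, 2))
```

The cheapest plan keeps the high-volume ingest-to-filter stream inside the "us" DC and ships only the 1 GB/h filtered stream to the cheaper "eu" DC for the compute-heavy aggregation. Neither "all in one DC" option wins, which is the price-diversity effect the abstract exploits.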