A New Data Layout Scheme for Energy-Efficient MapReduce Processing Tasks

Tran, Xuan Thi; Do, Tien Van; Rotter, Csaba; Hwang, Dosam

doi:10.1007/s10723-018-9433-7

Cited by 9 publications

(5 citation statements)

References 17 publications


“…However, the three algorithms are designed for MapReduce-related applications, which means that they can be adapted to other kinds of applications. Because the default HDFS data layout schemes under the new resource management and allocation framework (YARN) in Hadoop are not energy efficient, literature [121] proposed a new data layout scheme that exploits the heterogeneity of computing resource characteristics. Servers are sorted into three sets (termed the high-performance set, the energy-efficient set and the inefficient set).…”
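The quoted passage says only that servers are sorted into a high-performance set, an energy-efficient set and an inefficient set; it does not give the sorting criteria. The sketch below illustrates one possible partition, assuming each server is described by a hypothetical throughput figure and power draw, and using hypothetical thresholds (`hp_fraction`, `min_efficiency`). It is not the rule used by Tran et al.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    throughput_mbps: float  # hypothetical performance metric
    power_watts: float      # hypothetical average power draw

    @property
    def efficiency(self) -> float:
        # Work done per watt; higher means more energy efficient.
        return self.throughput_mbps / self.power_watts


def partition_servers(servers, hp_fraction=0.3, min_efficiency=1.0):
    """Split servers into (high_performance, energy_efficient, inefficient).

    Assumption: the fastest hp_fraction of servers form the high-performance
    set; of the remainder, those at or above min_efficiency are energy
    efficient and the rest are inefficient. Both thresholds are illustrative.
    """
    ranked = sorted(servers, key=lambda s: s.throughput_mbps, reverse=True)
    n_hp = max(1, int(len(ranked) * hp_fraction)) if ranked else 0
    high_perf = ranked[:n_hp]
    rest = ranked[n_hp:]
    energy_eff = [s for s in rest if s.efficiency >= min_efficiency]
    inefficient = [s for s in rest if s.efficiency < min_efficiency]
    return high_perf, energy_eff, inefficient


servers = [
    Server("a", 400, 200),
    Server("b", 300, 150),
    Server("c", 150, 100),
    Server("d", 120, 60),
    Server("e", 100, 200),
]
hp, ee, ie = partition_servers(servers, hp_fraction=0.4)
print([s.name for s in hp], [s.name for s in ee], [s.name for s in ie])
```

With these made-up numbers, servers `a` and `b` form the high-performance set, `c` and `d` (1.5 and 2.0 units per watt) the energy-efficient set, and `e` (0.5 units per watt) the inefficient set.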

Section: Energy Aware Data Layout Policies (mentioning)

confidence: 99%

A Survey and Taxonomy on Energy-Aware Data Management Strategies in Cloud Environment

You, Zhen, et al. (2020). IEEE Access.


During the past ten years, the energy consumption problem in cloud-related environments has attracted substantial attention from the research and industrial communities. Researchers have conducted many surveys on energy efficiency issues from different perspectives. These surveys can be classified into five categories: surveys on the energy efficiency of the whole cloud-related system, surveys on the energy efficiency of a particular level or component of the cloud, surveys on all energy-efficient strategies, surveys on a specific energy-efficiency technique, and other energy-efficiency-related surveys. However, to the best of our knowledge, no survey has addressed energy-aware data management strategies in cloud-related environments. In this paper, we conduct a comprehensive survey on energy-saving data management strategies in cloud-related environments, such as data classification, data placement and data replication strategies. Compared with existing reviews on energy efficiency in cloud-related environments, we are the first to survey the energy consumption problem from the data management perspective. Furthermore, we classify the energy-aware data management strategies from different perspectives. This survey and the taxonomy of energy-aware data management strategies demonstrate the potential for reducing energy consumption at the data management level of a cloud storage system, which opens up more room for energy reduction and ultimately helps achieve energy proportionality. Moreover, this survey and taxonomy of the energy efficiency issue from the data management perspective is an important supplement to existing surveys on energy efficiency in cloud-related environments.



“…Moreover, R. Yadav et al. published related articles on minimizing energy consumption and SLA violations in cloud computing 19‐22. Other data placement or data layout strategies for energy efficiency have been published in recent years, 15,19‐33 but the periodic access characteristics of data have not been extracted for clustered data storage.…”

Section: Related Work (mentioning)

confidence: 99%

K‐ear: Extracting data access periodic characteristics for energy‐aware data clustering and storing in cloud storage systems

You, Sun, et al. (2020). Concurrency and Computation.


The rapid increase in energy consumption is a serious problem in cloud storage systems. Data accessed in large‐scale storage systems usually exhibit temporal and spatial characteristics, which make it possible to reduce energy consumption by clustering data with similar access characteristics and storing them in the same zone of a cloud storage system. Existing works usually focus only on the frequency of data access. However, data access in cloud storage systems widely exhibits seasonal and tidal characteristics. These seasonal and tidal characteristics are extracted thoroughly in this paper. Based on the extracted characteristics, an energy‐aware data clustering approach using a machine learning algorithm (K‐ear) is proposed. K‐ear classifies data into five seasonal categories according to their seasonal access characteristics and then classifies every seasonal category into three tidal categories according to its tidal access characteristics. The resulting 15 categories are stored in different storage zones with different energy and performance modes. Simulation experiments using CloudSimDisk with the constructed mathematical models demonstrate that the proposed K‐ear algorithm is more energy‐efficient than the default data clustering algorithms in Hadoop and the classical frequency-based data clustering storage strategy (Striping‐Based Energy‐Aware Strategy).
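The abstract describes a two-level classification: five seasonal categories, each split into three tidal categories, giving 15 storage zones. The sketch below shows only that two-level mapping, assuming each data item is already summarized by a hypothetical seasonal feature (fraction of the year at which its accesses peak) and a hypothetical tidal feature (hour of day at which accesses peak). It uses nearest-centroid assignment with fixed, made-up centroids in place of K‐ear's learned categories, and, unlike K‐ear, it uses the same tidal centroids for every seasonal category.

```python
# Hypothetical centroids; K-ear learns its categories from access traces,
# which this sketch does not attempt.
SEASONAL_CENTROIDS = [0.1, 0.3, 0.5, 0.7, 0.9]  # five seasonal categories
TIDAL_CENTROIDS = [4.0, 12.0, 20.0]             # three tidal categories


def nearest(value, centroids):
    """Index of the centroid closest to value (linear distance, no wrap-around)."""
    return min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))


def storage_zone(seasonal_feature, tidal_feature):
    """Map an item to one of 5 * 3 = 15 zones: seasonal first, then tidal."""
    s = nearest(seasonal_feature, SEASONAL_CENTROIDS)
    t = nearest(tidal_feature, TIDAL_CENTROIDS)
    return s * len(TIDAL_CENTROIDS) + t  # zone index in 0..14


def cluster_files(files):
    """files: {name: (seasonal_feature, tidal_feature)} -> {zone: [names]}."""
    zones = {}
    for name, (sf, tf) in sorted(files.items()):
        zones.setdefault(storage_zone(sf, tf), []).append(name)
    return zones


print(cluster_files({"log": (0.88, 21.0), "report": (0.12, 3.0), "backup": (0.9, 19.5)}))
```

Items that land in the same zone share similar access timing, so a zone whose items are rarely accessed at a given time could be placed in a lower-power mode, which is the intuition the abstract relies on.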


“…Tran et al. [24] proposed a new data layout scheme that can be implemented in HDFS. The proposed layout algorithm assigns data blocks to the high-performance set and the energy-efficient set based on the data size, and keeps the replicas of data blocks on the inefficient servers.…”
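The placement rule in this excerpt can be sketched as follows, assuming the three server sets have already been formed and are given as lists of server names. The excerpt says primaries go to the high-performance or energy-efficient set "based on the data size" without stating the direction or the cut-off, so the choice here (sizes at or above a hypothetical `size_threshold_mb` go to the high-performance set) is an assumption, as is round-robin selection within each set. The default of two replicas mirrors HDFS's default replication factor of 3 (one primary plus two copies).

```python
from itertools import cycle


def place_blocks(blocks, high_perf, energy_eff, inefficient,
                 size_threshold_mb=128, replicas=2):
    """Return {block_id: (primary_server, [replica_servers])}.

    blocks: list of (block_id, data_size_mb) pairs.
    Assumption: items at or above size_threshold_mb get a primary in the
    high-performance set, smaller ones in the energy-efficient set.
    Replicas are always placed on the inefficient set.
    """
    if not (high_perf and energy_eff and inefficient):
        raise ValueError("all three server sets must be non-empty")
    hp_iter, ee_iter = cycle(high_perf), cycle(energy_eff)
    replica_iter = cycle(inefficient)
    layout = {}
    for block_id, size_mb in blocks:
        primary = next(hp_iter) if size_mb >= size_threshold_mb else next(ee_iter)
        # Consecutive picks from the cycle are distinct while the count does
        # not exceed the number of inefficient servers.
        reps = [next(replica_iter) for _ in range(min(replicas, len(inefficient)))]
        layout[block_id] = (primary, reps)
    return layout


layout = place_blocks(
    [("b1", 512), ("b2", 16), ("b3", 1024)],
    high_perf=["hp1", "hp2"],
    energy_eff=["ee1"],
    inefficient=["ie1", "ie2", "ie3"],
)
print(layout)
```

Keeping all replicas on the inefficient servers means those machines mainly serve as backup storage, while active processing concentrates on the other two sets.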

Section: Related Work (mentioning)

confidence: 99%

HybSMRP: a hybrid scheduling algorithm in Hadoop MapReduce framework

et al. 2019


Introduction: Distributed and parallel processing is one of the best ways to store and compute big data [1]. Most definitions characterize big data by the 3Vs: the extreme volume of data, the wide variety of data types and the velocity at which the data must be processed. MapReduce [2] is a programming model for big data processing. MapReduce programs are intrinsically parallel [3,4]. MapReduce executes programs in two phases, map and reduce, each defined by a function: the mapper and the reducer, respectively. A MapReduce framework consists of a master and multiple slaves. The master is responsible for managing the framework, including user interaction, job queue organization and task scheduling. Each slave has a fixed number of map and reduce slots for running tasks. The job scheduler located in the master assigns tasks according to the number of free task slots.

Abstract: Due to the advent of new technologies, devices, and communication tools such as social networking sites, the amount of data produced by mankind is growing rapidly every year. Big data is a collection of large datasets that cannot be processed using traditional computing techniques. MapReduce was introduced to solve large-data computational problems. It is specifically designed to run on commodity hardware and relies on the divide-and-conquer principle. Nowadays, the focus of researchers has shifted towards Hadoop MapReduce. One of the most outstanding characteristics of MapReduce is data locality-aware scheduling. A data locality-aware scheduler is an efficient way to optimize one or more performance metrics, such as data locality, energy consumption and job completion time. As in other settings, time and scheduling are among the most important aspects of the MapReduce framework. Therefore, many scheduling algorithms have been proposed in the past decades. The main ideas of these algorithms are increasing the data locality rate and decreasing response and completion times. In this paper, a new hybrid scheduling algorithm is proposed, which uses dynamic priority and localization ID techniques and focuses on increasing the data locality rate and decreasing completion time. The proposed algorithm was evaluated and compared with the Hadoop default schedulers (FIFO, Fair) by running concurrent workloads consisting of Wordcount and Terasort benchmarks. The experimental results show that the proposed algorithm is faster than FIFO and Fair scheduling, achieves a higher data locality rate and avoids wasting resources.
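The slot-based model described above (each slave has a fixed number of map slots, and the master assigns tasks to free slots) can be combined with data locality in a simple way: when a node reports free slots, fill them first with tasks whose input block is stored on that node, then with any remaining tasks. The sketch below shows that generic locality-first heuristic. It is not HybSMRP, whose dynamic-priority and localization-ID mechanisms are not described here, and `MapTask` and `assign_map_tasks` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MapTask:
    task_id: str
    block_hosts: frozenset  # nodes holding a replica of the task's input block


def assign_map_tasks(node, free_slots, pending):
    """Fill up to free_slots map slots on `node`, preferring data-local tasks.

    `pending` is a FIFO list of MapTask and is mutated: assigned tasks are
    removed. Returns a list of (task_id, is_local) pairs.
    """
    assigned = []
    # First pass: data-local tasks (input block already stored on this node).
    for task in list(pending):
        if len(assigned) == free_slots:
            break
        if node in task.block_hosts:
            pending.remove(task)
            assigned.append((task.task_id, True))
    # Second pass: fill any remaining slots with non-local tasks in queue order.
    while pending and len(assigned) < free_slots:
        task = pending.pop(0)
        assigned.append((task.task_id, False))
    return assigned


pending = [
    MapTask("t1", frozenset({"n2"})),
    MapTask("t2", frozenset({"n1", "n3"})),
    MapTask("t3", frozenset({"n3"})),
    MapTask("t4", frozenset({"n1"})),
]
result = assign_map_tasks("n1", 3, pending)
print(result)
```

Here node `n1` has three free slots: it takes the two tasks whose blocks it stores (`t2`, `t4`) and then the oldest non-local task (`t1`), leaving `t3` queued for a node that holds its block.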

