2017 19th International Conference on Transparent Optical Networks (ICTON) 2017
DOI: 10.1109/icton.2017.8024733

On the energy efficiency of MapReduce shuffling operations in data centers

Abstract: This paper aims to quantitatively measure the impact of different data center networking topologies on the performance and energy efficiency of shuffling operations in MapReduce. Mixed Integer Linear Programming (MILP) models are utilized to optimize the shuffling in several data center topologies with electronic, hybrid, and all-optical switching while maximizing the throughput and reducing the power consumption. The results indicate that the networking topology has a significant impact on the performance of…

Cited by 7 publications (12 citation statements)
References 31 publications
“…Indy GraySort benchmark [20] is selected as it is a representative workload for examining the congestion in DCNs due to routing intermediate data that is equal in size to the input data from the 10 map workers to the 6 reduce workers. The power consumption values of the selected electronic and optical equipment are as in [12], and [15]. Figure 2 provides the shuffling completion time results for a maximum server rate of 1000 Mbytes/s and input data ranging from 1 to 20 GBytes in different DCNs.…”
Section: Methods (mentioning)
confidence: 99%
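
The figures quoted in the statement above (10 map workers, 6 reduce workers, a maximum server rate of 1000 Mbytes/s, intermediate data equal in size to the input) allow a rough sanity check on shuffle completion time. The sketch below is not the MILP model from the paper. It computes only a topology-independent lower bound, under the assumptions that the intermediate data is split evenly across mappers and across reducers, that each worker runs on its own server, and that 1 GByte is taken as 1000 Mbytes. The function name `shuffle_time_lower_bound` is hypothetical.

```python
def shuffle_time_lower_bound(data_mb, mappers, reducers, server_rate_mb_s):
    """Lower bound in seconds on shuffle time when only server rates limit transfers.

    Assumes an even split of data across mappers and across reducers
    (an illustrative assumption, not taken from the paper).
    """
    if min(data_mb, mappers, reducers, server_rate_mb_s) <= 0:
        raise ValueError("all arguments must be positive")
    # Time for every mapper to push out its share at full server rate.
    send_time = data_mb / (mappers * server_rate_mb_s)
    # Time for every reducer to pull in its share at full server rate.
    recv_time = data_mb / (reducers * server_rate_mb_s)
    # The slower side bounds the whole shuffle.
    return max(send_time, recv_time)


# 20 GBytes of intermediate data, taken here as 20000 Mbytes.
bound_20gb = shuffle_time_lower_bound(20000, mappers=10, reducers=6,
                                      server_rate_mb_s=1000)
```

With fewer reducers than mappers, the receive side dominates, so the bound scales as the data size divided by six times the server rate. Completion times in any real topology, and especially under link failures, can only be equal to or higher than this value.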
“…Figure 2 provides the shuffling completion time results for a maximum server rate of 1000 Mbytes/s and input data ranging from 1 to 20 GBytes in different DCNs. The subfigures, show the results in [12], and [13] for the cases where all links are working, in addition to the completion time results under 2 or 3 cases of links failures in each DCN. For the Spine-and-Leaf architecture, the degradation due to links 1-3 disconnection is less than 2-6 because the latter is more utilized due to serving the flows to 3 reduce workers compared to 1 reduce worker by links 1-3.…”
Section: Methods (mentioning)
confidence: 99%
“…To this end, we choose a set of representative shuffle sizes, measure the shuffle overhead per reducer using the synthetic benchmark, and then apply linear regression to obtain two approximation functions: one for the average (using all reducers) and one for the maximum (using only the slowest reducer for each shuffle size). Note that we have chosen linear regression because it was confirmed by previous work as a good approximation for shuffle behavior [22], [33], [34] for a variety of network topologies with different performance characteristics. In our case, we assume that the weak link will slow data transfers uniformly, thus preserving the linear behavior.…”
Section: Calibration Using Synthetic Benchmarking (mentioning)
confidence: 99%
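
The calibration step described in the statement above (measure per-reducer shuffle overhead at several shuffle sizes, then fit one linear function to the average and one to the maximum) can be sketched as follows. This is a minimal illustration, not the citing authors' code: the names `fit_line` and `calibrate`, the units, and the sample measurements in `SAMPLE` are all hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y ~ slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def calibrate(measurements):
    """Fit average and maximum shuffle overhead as linear functions of size.

    measurements maps a shuffle size to the list of overheads observed
    on each reducer for that size.
    """
    sizes = sorted(measurements)
    # Average model: uses all reducers at each shuffle size.
    avg_fit = fit_line(sizes, [mean(measurements[s]) for s in sizes])
    # Maximum model: uses only the slowest reducer at each shuffle size.
    max_fit = fit_line(sizes, [max(measurements[s]) for s in sizes])
    return avg_fit, max_fit


# Hypothetical data: shuffle size (GB) -> per-reducer overhead (seconds).
SAMPLE = {1: [1.0, 1.2, 1.4], 2: [2.0, 2.4, 2.8], 4: [4.0, 4.8, 5.6]}
avg_fit, max_fit = calibrate(SAMPLE)
```

Each fit is returned as a `(slope, intercept)` pair, so the predicted overhead for a new shuffle size `s` is `slope * s + intercept`. Keeping two separate fits mirrors the distinction the statement draws between typical behavior and the slowest reducer.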
“…As with cloud networking infrastructures, the increasing demands of data processing are also challenging the DCN because most of big data processing applications require extensive all-to-all server communications due to their distributed nature [14]. The impact of state-of-the-art DCN topologies on the performance and/or the energy efficiency of big data applications have been considered in [15]- [18]. To overcome the increasing power consumption and congestions in current data centers, and to meet the heterogeneous performance requirements of big data applications, an increasing number of hybrid and all-optical DCNs are proposed.…”
Section: Introduction (mentioning)
confidence: 99%
“…In this paper, we compare the energy consumption and completion time for MapReduce sort workloads as in our previous work in [18] while additionally considering the server-centric PON-based DCN in [24]. The rest of this paper is organized as the following.…”
Section: Introduction (mentioning)
confidence: 99%