Energy-Driven Straggler Mitigation in MapReduce

Phan, Tien-Dat; Ibrahim, Shadi; Zhou, Amelie Chi; Aupy, Guillaume; Antoniu, Gabriel

doi:10.1007/978-3-319-64203-1_28

Cited by 7 publications

(8 citation statements)

References 16 publications


“…In certain case, Precision is low and only 55% of those detected are actual stragglers and the Recall is also relatively low at 56%. For the same case, the hierarchical approach [14], i.e., a green-driven straggler detection mechanism, achieves a Precision of 99% and a Recall of 29%. This increase in precision can be translated to achieve lower execution time and energy consumption, and thus higher performance and energy efficiency; compared to the default Hadoop mechanism, execution time and energy consumption are reduced by almost 32% and 31%, respectively.…”

Section: Introduction (mentioning)

confidence: 90%

“…Throughout our experiments, we examined three straggler detection mechanisms, two from the literature: Default [7] and LATE [26] mechanisms. The third mechanism we consider is Hierarchical [14], which is a green straggler detection scheme that is applied hierarchically on the top of Default. Hereafter, we provide brief descriptions of the three mechanisms.…”

Section: Straggler Detection Mechanisms (mentioning)

confidence: 99%

“…Hierarchical. The main goal of the Hierarchical detector [14] is to reduce the energy consumption. Therefore, following the discussion in Section 4.2, we considered the following objectives: (i) to improve the Precision of an existing detection mechanism (by detecting less wrong stragglers), and (ii) to improve the Undetected Time by focusing on the stragglers on potentially very slow machines when the number of re-execution is limited.…”

Section: Ps = (mentioning)

confidence: 99%


A New Framework for Evaluating Straggler Detection Mechanisms in MapReduce

Phan, Pallez, Ibrahim, et al. 2019

ACM Trans. Model. Perform. Eval. Comput. Syst.

Self Cite

Big Data systems (e.g., Google MapReduce, Apache Hadoop, Apache Spark) rely increasingly on speculative execution to mask slow tasks, also known as stragglers, because a job's execution time is dominated by the slowest task instance. Big Data systems typically identify stragglers and speculatively run copies of those tasks with the expectation that a copy may complete faster to shorten job execution times. There is a rich body of recent results on straggler mitigation in MapReduce. However, the majority of these do not consider the problem of accurately detecting stragglers. Instead, they adopt a particular straggler detection approach and then study its effectiveness in terms of performance, e.g., reduction in job completion time, or efficiency, e.g., high resource utilization. In this paper, we consider a complete framework for straggler detection and mitigation. We start with a set of metrics that can be used to characterize and detect stragglers including Precision, Recall, Detection Latency, Undetected Time and Fake Positive. We then develop an architectural model by which these metrics can be linked to measures of performance including execution time and system energy overheads. We further conduct a series of experiments to demonstrate which metrics and approaches are more effective in detecting stragglers and are also predictive of effectiveness in terms of performance and energy efficiencies. For example, our results indicate that the default Hadoop straggler detector could be made more effective. In certain cases, Precision is low and only 55% of those detected are actual stragglers and the Recall, i.e., the percentage of actual stragglers that are detected, is also relatively low at 56%. For the same case, the hierarchical approach (i.e., a green-driven detector based on the default one) achieves a Precision of 99% and a Recall of 29%. This increase in Precision can be translated to achieve lower execution time and energy consumption, and thus higher performance and energy efficiency; compared to the default Hadoop mechanism, the energy consumption is reduced by almost 31%. These results demonstrate how our framework can offer useful insights and be applied in practical settings to characterize and design new straggler detection mechanisms for MapReduce systems.

This work is supported by the ANR KerStream project (ANR-16-CE25-0014-01) and the Stack/Apollo connect talent project. The experiments presented in this paper were carried out using the Grid'5000/ALADDIN-G5K experimental testbed, an initiative from the French Ministry of Research through the ACI GRID incentive action, INRIA, CNRS and RENATER and other contributing partners (see http://www.grid5000.fr/ for details).
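The Precision and Recall figures quoted in the abstract above can be read as simple set ratios: Precision is the share of detected tasks that are actual stragglers, and Recall is the share of actual stragglers that were detected. The following is a minimal sketch of that reading only, not the authors' framework; the function name `precision_recall` and the task IDs are hypothetical and chosen for illustration.

```python
def precision_recall(detected, actual):
    """Return (precision, recall) for detected task IDs vs. actual stragglers.

    Hypothetical helper: both arguments are iterables of task IDs.
    """
    detected, actual = set(detected), set(actual)
    true_positives = len(detected & actual)
    # Precision: fraction of detected tasks that really are stragglers.
    precision = true_positives / len(detected) if detected else 0.0
    # Recall: fraction of actual stragglers that the detector flagged.
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Made-up example data, not taken from the paper's experiments.
actual_stragglers = {"t1", "t2", "t3", "t4"}
detected_tasks = {"t1", "t2", "t9"}

p, r = precision_recall(detected_tasks, actual_stragglers)
print(f"precision={p:.2f} recall={r:.2f}")
```

Under this reading, a detector with high Precision but low Recall (as reported for the hierarchical approach) launches few wasted speculative copies but leaves many stragglers unflagged.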


“…We find that the difference between the execution time of two copies in the same job can be more than 10x. Regarding the energy consumption, previous studies [14,15] have shown that there exists a trade-off between the performance and energy consumption when allocating speculative copies to nodes with different numbers of running tasks.…”

Section: Where To Launch? Heterogeneity Has To Be Considered (mentioning)

confidence: 99%

“…In conclusion, it is important to consider the impact of heterogeneity on performance and energy consumption when making speculative copy allocation decisions. However, this might not be effective if it is done passively as shown in [15]. Hence, this motivates our window-based reservation technique.…”

Section: Where To Launch? Heterogeneity Has To Be Considered (mentioning)

confidence: 99%

Energy-Efficient Speculative Execution using Advanced Reservation for Heterogeneous Clusters

Zhou, Phan, Ibrahim, et al. 2018

Proceedings of the 47th International Conference on Parallel Processing

Self Cite

Many Big Data processing applications nowadays run on large-scale multi-tenant clusters. Due to hardware heterogeneity and resource contention, the straggler problem has become the norm rather than the exception in such clusters. To handle the straggler problem, speculative execution has emerged as one of the most widely used straggler mitigation techniques. Although a number of speculative execution mechanisms have been proposed, as we have observed from real-world traces, the questions of "when" and "where" to launch speculative copies have not been fully discussed and hence cause inefficiencies in the performance and energy consumption of Big Data applications. In this paper, we propose a performance model and an energy consumption model to reveal the performance and energy variations with different speculative execution solutions. We further propose a window-based dynamic resource reservation and a heterogeneity-aware copy allocation technique to answer the "when" and "where" questions for speculative executions. Evaluations using real-world traces show that our proposed technique can improve the performance of Big Data applications by up to 30% and reduce the overall energy consumption by up to 34%.
