2015 IEEE 34th International Performance Computing and Communications Conference (IPCCC) 2015
DOI: 10.1109/pccc.2015.7410316

ATLAS: An AdapTive faiLure-Aware Scheduler for Hadoop

Abstract: Hadoop has become the de facto standard for processing large data in today's cloud environment. The performance of Hadoop in the cloud has a direct impact on many important applications, ranging from web analytics, web indexing, and image and document processing to high-performance scientific computing. However, because of the scale, complexity, and dynamic nature of the cloud, failures are common, and these failures often impact the performance of jobs running in Hadoop. Although Hadoop possesses built-in failure det…

Cited by 11 publications (4 citation statements)
References 25 publications
“…In another study, Memishi et al [ 9 ] also proposed an approach that estimates the completion time of the workload and calculates the progress rate of each task to adjust the timeout value dynamically. Other studies by [ 17 , 18 , 19 , 20 ] provided predictive models based on machine learning and AI algorithms to estimate and set an optimal heartbeat timeout on the fly or to predict the failures before they occur. These approaches reduce the task fault occurrences and improve their overall performance with low latency in fault detection.…”
Section: Related Work (mentioning)
confidence: 99%
“…In [28], the authors formally define the scheduling outcome of an executed job composed of $X$ map tasks and $Y$ reduce tasks. $S(\mathit{MapAtt}_{ip})$ is the status of a map $i$ after the $p$-th attempt, and $S(\mathit{ReduceAtt}_{jq})$ the status of reduce $j$ after the $q$-th attempt.…”
Section: Problem Formulation Example (mentioning)
confidence: 99%
“…K and L represent the maximum numbers of scheduling attempts allowed for map and reduce tasks respectively. The authors [28] model the scheduling outcome of a scheduled job as follows:…”
Section: Problem Formulation Example (mentioning)
confidence: 99%