2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium 2015
DOI: 10.1109/hpcc-css-icess.2015.170

Predicting Scheduling Failures in the Cloud: A Case Study with Google Clusters and Hadoop on Amazon EMR

Cited by 24 publications (16 citation statements)
References 11 publications
“…A failure prediction model has been proposed in our previous work [42], and we improved the model's accuracy by using multiple feature selection techniques. Our failure prediction model outperformed others in previous studies [8,33]. Table 1 summarizes the most current studies on the analysis and prediction of job and task failures.…”
Section: Failure Prediction (mentioning)
confidence: 73%
“…Earlier research on job failure has mostly focused on the study and characterization of failures. However, little research has been published on the prediction of job/task failure [1,8,10,[31][32][33][34]. Samak et al [35] have applied the Naive Bayes classification algorithm to the execution logs of scientific processes to predict the failure of tasks.…”
Section: Failure Prediction (mentioning)
confidence: 99%
“…RNN achieved an accuracy of around 84%. Soualhia et al [13] explored possibility of predicting application task failure in cloud platform to enhance performance of resources that are used to execute tasks. The authors applied a set of statistical and machine learning models such as Decision Tree (DT), Boost, and Random Forest (RF) to predict task failure.…”
Section: A Task Failure Prediction (mentioning)
confidence: 99%
“…The public Google Traces provides data about task and job failures in real world Google clusters. We found that more than 40% of the tasks and jobs can be failed [7]. Therefore, in our case study, we performed different simulations of varying the injected failure rates, with a maximum failure rate of 40%.…”
Section: Setup Of the Case Study (mentioning)
confidence: 99%
“…In addition, we used AnarchyApe [18] to inject different types of failures in Hadoop machines. We relied on rate of failures observed in Google clusters [7]. It is possible that the majority of Hadoop clusters do not experience such high rate of failures.…”
Section: Threats To Validity (mentioning)
confidence: 99%