2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2020
DOI: 10.1109/ipdps47924.2020.00115

Aarohi: Making Real-Time Node Failure Prediction Feasible

Cited by 22 publications (6 citation statements)
References 36 publications
“…Desh [18] is a deep Learning based approach obtaining high accuracy in HPC nodes failure prediction. Das et al [16] proposed Aarohi, which is an extension to Desh with higher inference performance but it still suffers as inferior failure prediction capability and long training time, because it focuses only the inference stage. Moreover, Aarohi needs re-training and re-generation if any new failure patterns occur.…”
Section: Related Work (mentioning)
confidence: 99%
“…This plan makes areas of strength for a standard with an unsettling influence dismissal term for every specialist, trailed by a further developed MAS control framework with no aggravation term. Making a time span for Node Failure Prediction is accessible with Aarohi [6]. Aarohi is a structure that offers a successful method for estimating disappointments on the web.…”
Section: Related Work (mentioning)
confidence: 99%
“…(2) Then it re-trains chain recognition of events augmented with expected lead times to failure; (3) Finally Desh predicts lead times during testing/inference deployment to predict which specific node fails in how many minutes. To speedup the recognition of the failure chain from log events, Das et al [34] proposed a new node failure predicting method called Aarohi to extend their previous work to online learning setting. Aarohi first trains an offline deep learning model with log parsing, then utilizes grammar-based rules to provide online testing.…”
Section: Homogeneous Systems (mentioning)
confidence: 99%