4th IEEE International Conference on Cloud Computing Technology and Science Proceedings 2012
DOI: 10.1109/cloudcom.2012.6427566

Online failure prediction in cloud datacenters by real-time message pattern learning

Cited by 43 publications (24 citation statements). References: 8 publications.

“…They filter and clean data to improve a prediction model's performance. There has also been a whole array of techniques applied to predicting failures in distributed system and computing clusters, where the works of Watanabe et al [2] [3], who show a method of pattern learning for the prediction of failures, Salfner et al [6][11], who model a system using Semihidden Markov Models and add fuzzy logic to the OFP scenario, and, specially, Zheng et al [5] [7] are the main ones. The latter authors have worked for several years with the IBM Blue Gene supercomputer and have a long streak of papers related to the issue at hand.…”
Section: Related Work (mentioning; confidence: 99%)
“…In 2006, Liang et al [6] propose a series of three ad-hoc created predictors for the Blue Gene supercomputer, based on the analysis of the failure characteristics found on its logs. Watanabe et al propose a method quite similar to association rules in [7]: the creation of an event dictionary with the associated probabilities for each entry to precede a failure. On the other hand, in [8], Sonoda, Watanabe and Matsumoto show how to identify patterns that lead to system failures through Bayesian learning, achieving a precision of over 0.8 and recall over 0.7.…”
Section: Online Failure Prediction (mentioning; confidence: 99%)
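The event-dictionary idea described in the statement above (each message type paired with the probability that it precedes a failure) can be illustrated with a minimal sketch. This is not Watanabe et al.'s published algorithm, and it uses plain frequency counting rather than the Bayesian learning of [8]. It assumes that messages are (timestamp, message_type) pairs, that a message "precedes" a failure when a failure follows within a fixed window, and that a warning is raised when a message's probability meets a threshold. The function names, the window, the threshold, and the sample data are all hypothetical. It also computes precision (true warnings / all warnings) and recall (predicted failures / all failures), the two metrics quoted for [8].

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical input format (an assumption, not from the paper):
# messages are (timestamp_seconds, message_type) pairs, failures are timestamps.


def next_failure_within(t, failure_times, window):
    """True if a failure occurs strictly after t and no later than t + window.

    `failure_times` must be sorted in ascending order.
    """
    i = bisect_right(failure_times, t)  # index of the first failure strictly after t
    return i < len(failure_times) and failure_times[i] - t <= window


def build_event_dictionary(messages, failure_times, window):
    """Map each message type to the fraction of its occurrences that precede a failure."""
    failures = sorted(failure_times)
    seen = defaultdict(int)
    preceded = defaultdict(int)
    for t, msg_type in messages:
        seen[msg_type] += 1
        if next_failure_within(t, failures, window):
            preceded[msg_type] += 1
    return {m: preceded[m] / seen[m] for m in seen}


def predict(messages, dictionary, threshold):
    """Return a warning timestamp for each message whose probability meets the threshold."""
    return [t for t, m in messages if dictionary.get(m, 0.0) >= threshold]


def precision_recall(alerts, failure_times, window):
    """Precision = true warnings / all warnings; recall = caught failures / all failures."""
    failures = sorted(failure_times)
    true_alerts = sum(1 for t in alerts if next_failure_within(t, failures, window))
    precision = true_alerts / len(alerts) if alerts else 0.0
    caught = sum(1 for f in failures if any(0 < f - t <= window for t in alerts))
    recall = caught / len(failures) if failures else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical training log: "disk_warn" always precedes a failure within 10 s.
    train = [(0, "disk_warn"), (5, "heartbeat"), (8, "disk_warn"),
             (20, "heartbeat"), (30, "disk_warn"), (40, "heartbeat")]
    d = build_event_dictionary(train, failure_times=[10, 35], window=10)
    print(d)  # {'disk_warn': 1.0, 'heartbeat': 0.3333333333333333}

    # Hypothetical held-out log: one warning is correct, one is a false alarm.
    test = [(100, "disk_warn"), (103, "heartbeat"), (130, "disk_warn"), (150, "heartbeat")]
    alerts = predict(test, d, threshold=0.5)
    print(alerts)  # [100, 130]
    print(precision_recall(alerts, [105, 160], window=10))  # (0.5, 0.5)
```

Raising the threshold trades recall for precision. The window length determines what "precedes a failure" means, so it must match the lead time the operator needs to react.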
“…There are many different design principles such as 'Eliminating Single Point of Failure' [27,38], 'Disaster Recovery' [39,40] and 'Real-Time and Fast Failure Detection' [41][42][43] that can help achieve high availability and reliability in cloud computing environments. The single point of failure (SPOF) in cloud computing datacenters can occur in both software and hardware level.…”
Section: Research Background (mentioning; confidence: 99%)
“…The time taken to detect a failure is one of the key factors in the cloud computing environments. So, fast and real-time failure detection to identify or predict a failure in the early stages is one of the most important principles to achieving high availability and reliability in cloud systems [42,43]. Moreover, there are some new trends in cloud computing such as SDN-based technology like Espresso that makes cloud infrastructures more reliable and available in the network level [44].…”
Section: Research Background (mentioning; confidence: 99%)