2015
DOI: 10.1109/tpds.2014.2311814

Adaptive Algorithms for Diagnosing Large-Scale Failures in Computer Networks

Abstract: In this paper, we propose an algorithm to efficiently diagnose large-scale clustered failures. The algorithm, Cluster-MAX-COVERAGE (CMC), is based on a greedy approach. We address the challenge of determining faults with incomplete symptoms. CMC makes novel use of both positive and negative symptoms to quickly output a hypothesis list with few false negatives and false positives. CMC requires reports from about half as many nodes as other existing algorithms to determine failures with 100% a…
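
The abstract describes a greedy diagnosis that combines positive symptoms (failed observations) with negative symptoms (successful observations). A minimal sketch of that general idea follows, assuming a path-based model in which each end-to-end probe path traverses a known set of network objects and a failed object breaks every path through it. This is a generic greedy max-coverage heuristic, not the authors' CMC; the clustering-specific logic of CMC is not described in the abstract and is not reproduced here. The names `greedy_diagnose`, `paths`, `failed_paths` and `working_paths` are hypothetical.

```python
from typing import Dict, List, Set


def greedy_diagnose(
    paths: Dict[str, Set[str]],
    failed_paths: Set[str],
    working_paths: Set[str],
) -> List[str]:
    """Hypothetical helper: generic greedy max-coverage diagnosis (not CMC itself).

    paths maps a path id to the set of network objects it traverses.
    failed_paths are positive symptoms; working_paths are negative symptoms.
    """
    # Negative symptoms (assumption of this model): any object on a working
    # path is treated as healthy, since a failed object breaks every path through it.
    healthy: Set[str] = set()
    for p in working_paths:
        healthy |= paths[p]

    # Candidate objects: on at least one failed path and not exonerated.
    candidates: Dict[str, Set[str]] = {}
    for p in failed_paths:
        for obj in paths[p] - healthy:
            candidates.setdefault(obj, set()).add(p)

    unexplained = set(failed_paths)
    hypothesis: List[str] = []
    while unexplained and candidates:
        # Greedy step: pick the object that explains the most unexplained failures.
        # Iterating in sorted order makes ties resolve to the alphabetically first object.
        best = max(sorted(candidates), key=lambda o: len(candidates[o] & unexplained))
        gain = candidates[best] & unexplained
        if not gain:
            break  # no remaining candidate explains any remaining failure
        hypothesis.append(best)
        unexplained -= gain
        del candidates[best]
    return hypothesis


if __name__ == "__main__":
    # Toy topology: three probe paths that all share object "a".
    paths = {"p1": {"a", "b"}, "p2": {"a", "c"}, "p5": {"a", "d"}}
    print(greedy_diagnose(paths, {"p1", "p2"}, set()))   # ['a']
    print(greedy_diagnose(paths, {"p1", "p2"}, {"p5"}))  # ['b', 'c']
```

The toy run illustrates why negative symptoms matter in this model: with only positive symptoms, the shared object `a` explains both failures, but once path `p5` reports success, `a` is exonerated and the sketch blames `b` and `c` instead.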

Cited by 8 publications (8 citation statements) · References 13 publications

“…7, we plot $FP_{CRGT}$ and $FN_{CRGT}$, with $OPT2_{th} = 15$ for random network, and with $OPT2_{th} = 7$ for network with … [Fig. 7. False positives and false negatives] … due to this correlation netCSI overestimates the number of failed objects. Using this insight, we propose an adaptive algorithm that works for both independent and large-scale failures in our subsequent work [25].…”
Section: Effect of Varying Number of Reporting Nodes (mentioning)
confidence: 99%
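
The snippet above evaluates diagnosis quality by counting false positives and false negatives. The snippet does not define the $CRGT$ subscript, so the sketch below assumes the plain set-based definitions: a false positive is an object in the hypothesis that did not actually fail, and a false negative is an actually failed object missing from the hypothesis. The function name `fp_fn` is hypothetical.

```python
from typing import Iterable, Tuple


def fp_fn(hypothesis: Iterable[str], actual_failed: Iterable[str]) -> Tuple[int, int]:
    """Hypothetical helper: count false positives and false negatives of a diagnosis.

    Assumes plain set differences against the ground-truth set of failed objects.
    """
    h, a = set(hypothesis), set(actual_failed)
    false_positives = len(h - a)  # blamed, but actually healthy
    false_negatives = len(a - h)  # actually failed, but missed
    return false_positives, false_negatives


if __name__ == "__main__":
    print(fp_fn(["a", "b", "c"], {"b", "d"}))  # (2, 1)
```

Under this definition, a diagnoser that overestimates the number of failed objects, as the snippet reports for netCSI, shows up directly as a larger false-positive count.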
“…Tati et al. [27] develop algorithms to diagnose large-scale, clustered failures from incomplete symptoms and to more accurately diagnose both independent and clustered failures. Lim et al. [21] mine logs of large enterprise telephony systems.…”
Section: Related Work (mentioning)
confidence: 99%
“…Several basic ideas are based on time windows [13], frequent patterns [8], and call graphs [9]. Some work combined these ideas, pointed out that the system may change often, and provided probability-based results [10].…”
Section: Related Work (mentioning)
confidence: 99%