2005
DOI: 10.1145/1095809.1095821

Capturing, indexing, clustering, and retrieving system history

Abstract: In operating today's complex systems, the lack of a systematic way to capture and query the essential system state characterizing an incident of performance failure or unavailability makes it difficult for operators to distinguish recurring problems from new ones, to leverage previous diagnostic efforts, or to establish whether problems seen at different installations of the same site are similar or distinct. We present a method for automatically extracting from a running system an indexable signature that dis…
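The abstract is truncated above, so the paper's actual signature construction is not shown here. As a rough illustration of the general idea only (indexing past incidents by a signature and querying that index to tell a recurring problem from a new one), here is a minimal sketch. It assumes each signature is a vector over a fixed list of metric names, with 1 meaning "this metric was implicated in the incident" and 0 otherwise. That encoding, the Euclidean nearest-neighbour retrieval, and the names `METRICS`, `Incident`, `distance`, `retrieve`, `classify`, and `threshold` are all hypothetical choices for this sketch, not the authors' method.

```python
import math
from dataclasses import dataclass

# Hypothetical metric names; a real system would use whatever metrics it collects.
METRICS = ["cpu_util", "db_latency", "queue_len", "gc_time"]


@dataclass
class Incident:
    label: str              # operator's diagnosis (illustrative)
    signature: list[float]  # one entry per name in METRICS (assumed 0/1 encoding)


def distance(a, b):
    """Euclidean distance between two signature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("signatures must cover the same metrics")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(history, query, k=1):
    """Return the k past incidents whose signatures are closest to query."""
    return sorted(history, key=lambda inc: distance(inc.signature, query))[:k]


def classify(history, query, threshold=0.5):
    """Return the nearest past label if it is within threshold, else None (a new problem)."""
    if not history:
        return None
    nearest = retrieve(history, query, k=1)[0]
    if distance(nearest.signature, query) <= threshold:
        return nearest.label
    return None


# Illustrative, made-up incident history.
history = [
    Incident("db overload", [0, 1, 1, 0]),
    Incident("memory leak", [1, 0, 0, 1]),
    Incident("network stall", [0, 0, 1, 0]),
]

new_signature = [0, 1, 1, 0]
for inc in retrieve(history, new_signature, k=2):
    print(inc.label, round(distance(inc.signature, new_signature), 3))
print(classify(history, new_signature))
```

Here the query exactly matches the "db overload" signature, so it is reported as a recurrence. A query whose nearest neighbour lies farther away than `threshold` yields `None`, which is one simple way to flag a previously unseen problem. The threshold value is arbitrary and would need tuning on real data.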

Cited by 94 publications (59 citation statements) · References: 12 publications
“…Many research ideas in production system monitoring may be applicable for load testing analysis. For example, approaches (e.g., [20], [193], [194], [195]) have been proposed to build performance signatures based on the past failures, so that whenever such symptoms occur in the field, the problems can be detected and notified right away. Analogously, we can formulate our performance signature based on mining the past load testing history and use these performance signatures to detect recurrent problems in load tests.…”
Section: Summary and Open Problems (mentioning, confidence: 99%)
“…Xu et al [5] attempt to identify problems with production logs of distributed systems, and suggest methodologies to enhance the performance of mining the logs by automatic matching of log statements. Cohen et al [12] describe how failure prediction models are built to identify and study the root-causes of failures. They propose techniques to categorize the faulty execution results found in the logs, before building failure prediction models based on them.…”
Section: B. Online Service Failure Prediction (mentioning, confidence: 99%)
“…We do not implement these predictors, we use commercially available software that implement them. We choose these algorithms because of their wide use [9], [12], [5], [14] and because they represent different approaches to data mining and machine learning; they represent Artificial Intelligence, Clustering Analysis, Statistical Methods, and Decision Rules respectively [2], [3], [22].…”
Section: Performance Metrics (mentioning, confidence: 99%)
“…All of these projects compute dependencies, and therefore cannot deal well with missing dependency information or resource contention. Much of this dependency modeling work requires that the system be actively perturbed by instrumentation or by probing [5,6,9,10,19]. Unfortunately, for many important systems, no such modifications are possible (for reasons of performance, administration, or cost).…”
Section: Related Work (mentioning, confidence: 99%)