2008
DOI: 10.1007/978-3-540-87353-2_13

Adaptive Monitoring with Dynamic Differential Tracing-Based Diagnosis

Abstract: Ensuring high availability, adequate performance, and proper operation of enterprise software systems requires continuous monitoring. Today, most systems operate with minimal monitoring, typically based on service-level objectives (SLOs). Detailed metric-based monitoring is often too costly to use in production, while tracing is prohibitively expensive. Configuring monitoring when problems occur is a manual process. In this paper we propose an alternative: Minimal monitoring with SLOs is used to detec…

Cited by 7 publications (6 citation statements)
References 15 publications
“…In this work we focus on failures that have a clear manifestation in log files rather than failures that do not generate events in log files. In particular, failures that are performance-related can be detected and diagnosed by other means as we have shown in our prior work [21]. We describe in Section 3.6, how to identify fault classes that do not have a clear manifestation in log files and merge those fault classes to improve the classification performance.…”
Section: Extracting Relevant Data (mentioning)
confidence: 96%
“…Once a failure is validated (e.g., as described in previous work [12,13,21]), we seek to identify recurrent faults that underlie the observed failure. If the fault that triggered a failure has been seen before, we may be able to fast-track resolution by retrieving information about past actions to restore the failed component.…”
Section: Learning Fault Manifestations (mentioning)
confidence: 99%
“…Collecting additional metrics or joining managed resources is addressed in [2,3,5] either to adapt monitoring to meet SLA modifications, or to deal with the managed scope changes, or even to operate a "minimal" monitoring that is able to be extended in case of SLA violations. Indeed, the capability of scaling up/down the monitored metrics and resources is important as an adaptation action.…”
Section: Related Work (mentioning)
confidence: 99%
“…This section enumerates some existing trends focusing on (i) adapting the QoS monitoring in autonomic systems [2,3,4,5,6], and (ii) designing patterns regarding the distributed deployment as well as the adaptation of MAPE loop modules [7,8,9].…”
Section: Related Work (mentioning)
confidence: 99%
“…The diagnostic approaches differ by monitoring data and overhead. To localize abnormal components from application traces we refer the reader to [16], for identifying recurrent faults from log-files to [13], [14], and finally for localizing faulty components the reader is referred to [15], [17]-[19].…”
Section: Incorporating Symptoms Of Known Faults (mentioning)
confidence: 99%