2008 Symposium on Reliable Distributed Systems (2008)
DOI: 10.1109/srds.2008.35

Gumshoe: Diagnosing Performance Problems in Replicated File-Systems

Abstract: Replicated file-systems can experience degraded performance that might not be adequately handled by the underlying fault-tolerant protocols. We describe the design and implementation of Gumshoe, a system that aims to diagnose performance problems in replicated file-systems. Gumshoe periodically gathers OS and protocol metrics and then analyzes these metrics to automatically localize the performance problem to the culprit node(s). We describe our results and experiences with problem diagnosis in two replicated …
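The abstract describes the pipeline only at a high level: periodic collection of OS and protocol metrics, then automatic localization of the problem to the culprit node(s). The citing statements below characterize this family of approaches as relying on a visible difference in behavior between fault-free and faulty peers. As a minimal sketch of that peer-comparison idea, assuming replicas that see similar workloads on similar hardware, the Python below flags a node when one of its metrics lies far from the median of its peers. The metric names, the median/MAD robust z-score, the 3.0 threshold, and the helper names `robust_outliers` and `localize_culprits` are all illustrative assumptions, not Gumshoe's actual analysis.

```python
from statistics import median

# Scale factor that makes the MAD a consistent estimator of the standard
# deviation for normally distributed data.
MAD_TO_SIGMA = 1.4826


def robust_outliers(values, threshold=3.0):
    """Return the sorted keys of `values` whose robust z-score exceeds `threshold`.

    `values` maps node name -> one metric sample (e.g. mean latency over a
    sampling window). Nodes are compared against the median of all peers, so
    a minority of misbehaving nodes cannot drag the baseline toward themselves.
    This is a hypothetical helper, not part of Gumshoe.
    """
    med = median(values.values())
    deviations = {node: abs(v - med) for node, v in values.items()}
    mad = median(deviations.values())
    if mad == 0:
        # At least half of the peers sit exactly at the median; treat any
        # deviation at all as anomalous instead of dividing by zero.
        return sorted(node for node, d in deviations.items() if d > 0)
    scale = MAD_TO_SIGMA * mad
    return sorted(node for node, d in deviations.items() if d / scale > threshold)


def localize_culprits(metrics, threshold=3.0):
    """Map each flagged node to the metrics on which it stands out from its peers.

    `metrics` maps metric name -> {node name -> value}, e.g. one sampling
    window of OS counters and protocol-level latencies (hypothetical names).
    """
    culprits = {}
    for metric, per_node in sorted(metrics.items()):
        for node in robust_outliers(per_node, threshold):
            culprits.setdefault(node, []).append(metric)
    return culprits


if __name__ == "__main__":
    # One hypothetical sampling window from a four-replica cluster.
    window = {
        "cpu_user_pct": {"n0": 22.0, "n1": 25.0, "n2": 24.0, "n3": 23.0},
        "req_latency_ms": {"n0": 5.1, "n1": 4.9, "n2": 5.0, "n3": 19.7},
    }
    print(localize_culprits(window))  # {'n3': ['req_latency_ms']}
```

The median and MAD are used instead of the mean and standard deviation because, with only a handful of replicas, a single slow node would otherwise inflate the spread enough to hide itself. The sketch also makes the homogeneity assumption explicit: a healthy node that is simply built from slower hardware would be flagged just the same.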

Cited by 4 publications (4 citation statements)
References 19 publications
“…The rationale of their approach is based on the observation that there is an obvious difference between the behaviors of fault-free and faulty nodes. Kavulya et al [9] and Lan et al [8] proposed the similar approaches to detect performance problems in replicated file systems and a cluster system, respectively. However, the validation of these approaches is based on a strong assumption of homogeneous hardware and workloads, which may only hold in a few cases.…”
Section: Related Work | mentioning | confidence: 99%
“…The rationale of their approach is based on the observation that there is an obvious difference between the behaviors of fault-free and faulty nodes. Kavulya et al [10] and Lan et al [9] proposed the similar approaches to detect performance problems in replicated file systems and a cluster system, respectively. However, the validation of these approaches is based on a strong assumption of homogeneous hardware and workloads, which may only hold in a few cases.…”
Section: Related Work | mentioning | confidence: 99%
“…Textual console log analysis in high performance computing [6], [7] is more flexible, but maintaining such logs may be impractical in high-volume systems where transactions are very short, time-sensitive, and rapid; textual logs would be immense: difficult to output, store and retrieve. Finally, some unsupervised approaches [8], [9] rely on domain insights and system knowledge, and therefore have limited applicability.…”
Section: Introduction | mentioning | confidence: 99%
“…Recent approaches to the monitoring problem [3], [9], [10] focus on early detection and handling of performance problems, or latent faults. These are outliers: machine behaviors that could indicate a fault yet fly under the radar of monitoring systems because they are not acute enough, or were not anticipated by maintenance engineers.…”
Section: Introduction | mentioning | confidence: 99%