27th International Conference on Distributed Computing Systems (ICDCS '07) 2007
DOI: 10.1109/icdcs.2007.107

Embedded Gossip: Lightweight Online Measurement for Large-Scale Applications

Abstract: For large-scale parallel applications, lightweight online monitoring can enable a wide range of online adaptations, including load balancing, power management, and progress monitoring. The processing and monitoring overhead of centralized global tracing techniques make them unsuitable for such tasks. Purely local tools, on the other hand, fail to provide the global information necessary for many desirable online adaptations of large-scale applications. In this paper, we describe a novel distributed online measu…

Cited by 5 publications (3 citation statements)
References 35 publications
“…The Distributed Performance Consultant in Paradyn [12] in conjunction with MRNet [13] provides a scalable framework for online automated performance diagnosis. Embedded Gossip [14] spreads performance data among processes including it transparently into the MPI messages sent by the application. Although these tools manage performance data online, only a few of them such as TAUg or TAUoverSupermon provide online access to this performance data as the Performance Introspection API does.…”
Section: Related Work (mentioning)
confidence: 99%
“…Decentralized methods for consensus problem that are known to work well in a relatively static environment, e.g., for parallel applications [25], are studied in [5], [19], [20]. One approach to improving fault tolerance in dynamic distributed systems, which is measured by the maximum radius of impact caused by a given fault, is presented by S. Pike [21].…”
Section: Related Work (mentioning)
confidence: 99%