2010 IEEE 12th International Conference on High Performance Computing and Communications (HPCC) 2010
DOI: 10.1109/hpcc.2010.32

Aggregation of Real-Time System Monitoring Data for Analyzing Large-Scale Parallel and Distributed Computing Environments

Abstract: We present a monitoring system for large-scale parallel and distributed computing environments that allows accuracy to be traded off in a tunable fashion to gain scalability without compromising fidelity. The approach relies on classifying each gathered monitoring metric based on individual needs and on aggregating messages containing classes of individual monitoring metrics using a tree-based overlay network. The MRNet-based prototype is able to significantly reduce the amount of gathered and stored monitoring da…
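
The classification-and-aggregation idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not use the MRNet API: the `Node` type, the `classify` and `aggregate` functions, the `cpu_temp` metric, and the class boundaries are all hypothetical names and values chosen for illustration. The sketch assumes that each raw reading is mapped to a coarse class and that inner tree nodes merge per-class counts from their children, so only the class counts travel toward the root.

```python
from collections import Counter
from dataclasses import dataclass, field


def classify(value, boundaries):
    """Map a raw value to a class index: the number of boundaries it meets or exceeds."""
    return sum(1 for b in boundaries if value >= b)


@dataclass
class Node:
    """Hypothetical overlay-tree node; only leaves carry raw readings."""
    name: str
    children: list = field(default_factory=list)
    readings: dict = field(default_factory=dict)  # metric name -> raw value


def aggregate(node, boundaries):
    """Return {metric: Counter(class index -> number of nodes)} for the subtree."""
    summary = {}
    # Classify this node's own readings (empty for inner nodes).
    for metric, value in node.readings.items():
        cls = classify(value, boundaries[metric])
        summary.setdefault(metric, Counter())[cls] += 1
    # Merge the per-class counts reported by each child subtree.
    for child in node.children:
        for metric, counts in aggregate(child, boundaries).items():
            summary.setdefault(metric, Counter()).update(counts)
    return summary


# Assumed example: three compute nodes under two intermediate aggregators.
leaves = [
    Node("n0", readings={"cpu_temp": 41.0}),
    Node("n1", readings={"cpu_temp": 67.5}),
    Node("n2", readings={"cpu_temp": 43.0}),
]
root = Node("root", children=[
    Node("agg0", children=leaves[:2]),
    Node("agg1", children=leaves[2:]),
])
boundaries = {"cpu_temp": [50.0, 70.0]}  # assumed class boundaries
result = aggregate(root, boundaries)
```

In this sketch the number of boundaries per metric is the tunable part: fewer boundaries give fewer classes and smaller summaries at the cost of coarser readings, which is one plausible reading of the accuracy-for-scalability trade-off the abstract describes.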

Cited by 11 publications (12 citation statements)
References 11 publications
“…It has been used for scalable parallel performance-monitoring and profiling of high-performance computing applications [8,20]. MRNet can pass filtered information up and down the tree, which can be processed at each tree node.…”
Section: Average Local Window Size and Vector Age
Citation type: mentioning
confidence: 99%
“…This size is slightly larger than the 64 bytes used by Bohm et al [8] for monitoring and the 84 bytes used in MOSIX [9], a decentralized cluster management system that uses process migration for load balancing. Recall that the global information must also be included.…”
Section: Gossip Parameters Using a Single Master
Citation type: mentioning
confidence: 99%