2015 IEEE International Parallel and Distributed Processing Symposium
DOI: 10.1109/ipdps.2015.92

Identifying the Culprits Behind Network Congestion

Abstract: Network congestion is one of the primary causes of performance degradation, performance variability and poor scaling in communication-heavy parallel applications. However, the causes and mechanisms of network congestion on modern interconnection networks are not well understood. We need new approaches to analyze, model and predict this critical behavior in order to improve the performance of large-scale parallel applications. This paper applies supervised learning algorithms, such as forests of extrem…

Cited by 38 publications (22 citation statements)
References 17 publications

“…Of particular relevance to our work, in [4], Bhatele et al. used machine learning to identify sources of network congestion and their success inspired us to employ machine learning techniques in this work, though we are exploring different phenomena.…”
Section: Random Forests | mentioning | confidence: 73%
“…We used 100 estimators⁴ (trees in the forest), with separate runs for each feature set. That is, we ran the regression for each feature set in isolation.…”
Section: Methods | mentioning | confidence: 99%
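The procedure quoted above (a forest of 100 trees, with one regression run per feature set in isolation) can be sketched as follows. This is a minimal, dependency-free Python illustration, not the citing authors' or Bhatele et al.'s code. It assumes a from-scratch random-forest regressor (bootstrap samples, depth-limited regression trees, a random feature subset of size p/3 at each split) and scores each feature set by held-out R². The feature-set names (`link_traffic`, `stall_counts`, `hop_count`) and the synthetic data are hypothetical.

```python
import random

def best_split(X, y, idx, features, min_leaf):
    """Return the (sse, feature, threshold, left, right) split minimising squared error, or None."""
    best = None
    n = len(idx)
    for f in features:
        order = sorted(idx, key=lambda i: X[i][f])
        tot_s = sum(y[i] for i in order)
        tot_sq = sum(y[i] * y[i] for i in order)
        s = sq = 0.0
        for k in range(1, n):
            yi = y[order[k - 1]]
            s, sq = s + yi, sq + yi * yi  # running sums for the left child
            lo, hi = X[order[k - 1]][f], X[order[k]][f]
            if lo == hi or k < min_leaf or n - k < min_leaf:
                continue  # equal values cannot be separated, or a child would be too small
            rs, rsq = tot_s - s, tot_sq - sq
            sse = (sq - s * s / k) + (rsq - rs * rs / (n - k))
            if best is None or sse < best[0]:
                best = (sse, f, (lo + hi) / 2, order[:k], order[k:])
    return best

def build_tree(X, y, idx, depth, rng, max_features, min_leaf):
    """Grow a regression tree: leaves are floats, internal nodes are (feature, threshold, left, right)."""
    leaf = sum(y[i] for i in idx) / len(idx)
    if depth == 0 or len(idx) < 2 * min_leaf:
        return leaf
    features = rng.sample(range(len(X[0])), max_features)  # random feature subset per split
    split = best_split(X, y, idx, features, min_leaf)
    if split is None:
        return leaf
    _, f, thr, left, right = split
    return (f, thr,
            build_tree(X, y, left, depth - 1, rng, max_features, min_leaf),
            build_tree(X, y, right, depth - 1, rng, max_features, min_leaf))

def fit_forest(X, y, n_estimators=100, max_depth=6, min_leaf=2, seed=0):
    """Random forest: each of the n_estimators trees is grown on a bootstrap sample of the rows."""
    rng = random.Random(seed)
    max_features = max(1, len(X[0]) // 3)  # p/3 heuristic commonly used for regression forests
    trees = []
    for _ in range(n_estimators):
        boot = [rng.randrange(len(X)) for _ in range(len(X))]
        trees.append(build_tree(X, y, boot, max_depth, rng, max_features, min_leaf))
    return trees

def predict(trees, x):
    """Average the predictions of all trees for one sample x."""
    total = 0.0
    for node in trees:
        while isinstance(node, tuple):
            f, thr, left, right = node
            node = left if x[f] <= thr else right
        total += node
    return total / len(trees)

def r2_score(y_true, y_pred):
    mu = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mu) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

def project(rows, cols):
    return [[row[c] for c in cols] for row in rows]  # keep only one feature set's columns

def score_feature_sets(X_train, y_train, X_test, y_test, feature_sets, n_estimators=100):
    """Fit a separate forest for each feature set in isolation and report held-out R^2."""
    scores = {}
    for name, cols in feature_sets.items():
        trees = fit_forest(project(X_train, cols), y_train, n_estimators=n_estimators)
        preds = [predict(trees, x) for x in project(X_test, cols)]
        scores[name] = r2_score(y_test, preds)
    return scores

# Hypothetical synthetic data: the target depends strongly on column 0,
# weakly on column 1, and not at all on column 2.
data_rng = random.Random(42)
rows, targets = [], []
for _ in range(200):
    link, stalls, hops = data_rng.random(), data_rng.random(), data_rng.random()
    rows.append([link, stalls, hops])
    targets.append(3.0 * link + 0.5 * stalls + data_rng.gauss(0.0, 0.1))

feature_sets = {"link_traffic": [0], "stall_counts": [1], "hop_count": [2]}  # hypothetical names
scores = score_feature_sets(rows[:150], targets[:150], rows[150:], targets[150:], feature_sets)
for name, score in scores.items():
    print(f"{name}: held-out R^2 = {score:.3f}")
```

Running one forest per feature set makes each set's predictive power directly comparable. In this synthetic setup, the informative `link_traffic` set scores far higher than the unrelated `hop_count` set. In practice a library regressor, such as scikit-learn's `RandomForestRegressor(n_estimators=100)`, would likely replace the hand-rolled trees. The sketch only makes the per-feature-set loop explicit.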
“…It is mostly getting more common to adopt corresponding tools (scikit-learn, caffe, TensorFlow, and so on) from machine learning (ML) for intelligence analysis and pattern invention. This has enabled researchers to build predictive models, conclude complex correlation patterns, determine fields of interest for performance optimization, and detect irregular patterns (Bhowmick, S., Eijkhout, V., Freund, Y., Fuentes, E., & Keyes, D., 2006;Sukhija, N., Malone, B., Srivastava, S., Banicescu, I., & Ciorba, F. M., 2014;Bhatele, A., Titus, A. R., et al, 2015, May;Yeom, J. S., Thiagarajan, J. J., et al, 2016;Islam, T. Z., Thiagarajan, J. J., et al, 2016, November).…”
Section: Introduction | mentioning | confidence: 99%
“…Public cloud providers such as Amazon's EC2 and Microsoft's Azure provide a large number of cloud instance types with different numbers of cores, processing speeds, and network interconnections [18]. Research in this area focuses mostly on porting applications to the cloud [12], evaluating their performance and cost efficiency [13,18], and improving communication performance [4,3,2].…”
Section: Introduction | mentioning | confidence: 99%