2015 IEEE 35th International Conference on Distributed Computing Systems (2015)
DOI: 10.1109/icdcs.2015.43

FLOWPROPHET: Generic and Accurate Traffic Prediction for Data-Parallel Cluster Computing

Abstract: Data-parallel computing frameworks (DCFs) such as MapReduce, Spark, and Dryad are widely used in big data and cloud computing, and they inject large numbers of flows into data center networks. In this paper, we design and implement FLOWPROPHET, a general framework to predict traffic flows for DCFs. To this end, we analyze and summarize the common features of popular DCFs, and gain a key insight: since application logic in DCFs is naturally expressed by directed acyclic graphs (DAGs), the DAG contains nece…

Cited by 18 publications (2 citation statements)
References 26 publications
“…Note that, physical hosts in a data center network can be heterogeneous and available bandwidth on each path can be different because some of the bandwidth can be occupied by other applications [14]. Flow size and flow length can be extracted from the log and metadata files as coflow information [42], [43].…”
Section: System Model (mentioning)
confidence: 99%
“…c-Through [21] increases the socket buffer and uses large buffer occupancy to indicate the optical link demand. Moreover, researchers have started to forecast traffic demands of scientific and data-intensive parallel applications from diverse layers (e.g., application layer [22], compiler layer [7,14]). And for clusters that are orchestrated by centralized schedulers (e.g., MPICH2 Hydra, Hadoop YARN), the schedulers orchestrate jobs to compute, storage nodes, and make traffic demand visible.…”
Section: Traffic Demand Estimation (mentioning)
confidence: 99%