A data throughput prediction and optimization service for widely distributed many-task computing

Yin, Dengpan; Yildirim, Esma; Kosar, Tevfik

doi:10.1145/1646468.1646472

Cited by 18 publications

(18 citation statements)

References 20 publications

“…the performance degradation due to the overhead of opening too many parallel streams). In our previous work, we have developed two highly-accurate models based on Full Second-order [17] and Partial C-order [16]. These models would require as few as three sampling points in the best case to provide very accurate predictions, but in the worst case they could require up to six or seven sampling points for accurate results.…”

Section: Related Work (mentioning)

confidence: 99%

“…These models lay the foundations of our current work for a highly-accurate and low-overhead prediction model for transfer throughput optimization. The existing prediction models (Partial Second-order [23], Full Second-order [17] and Partial C-order [16]) worked with as few as two or three sampling points, but choosing the best two or three sampling points was a major challenge in those models. If we randomly choose these two or three data points, the resulting approximation may be highly inaccurate (see Figure 1), since there is a high possibility that these random points may not reflect the characteristics of the actual throughput curve.…”

Section: Related Work (mentioning)

confidence: 99%
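
The two statements above describe fitting a throughput curve from a few sampling points. As a rough illustration only, the sketch below assumes the Full Second-order model has the form Th(n) = n / sqrt(a*n^2 + b*n + c), where n is the number of parallel streams. That form is not given on this page, and the function names and sample values are hypothetical. Under that assumption, three (n, throughput) samples determine a, b, and c through a linear system, and the predicted peak lies at n = -2c/b.

```python
import math


def solve_linear(aug):
    """Solve a small square system given as an augmented matrix (Gauss-Jordan)."""
    size = len(aug)
    m = [row[:] for row in aug]
    for col in range(size):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("sample points do not determine the model")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(size):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][size] / m[i][i] for i in range(size)]


def fit_full_second_order(samples):
    """Fit (a, b, c) from three (streams, throughput) samples.

    Assumed model: Th(n) = n / sqrt(a*n^2 + b*n + c), which rearranges to the
    linear relation n^2 / Th^2 = a*n^2 + b*n + c.
    """
    aug = [[n * n, n, 1.0, (n * n) / (th * th)] for n, th in samples]
    return tuple(solve_linear(aug))


def predict(n, coeffs):
    """Predicted throughput for n parallel streams under the assumed model."""
    a, b, c = coeffs
    return n / math.sqrt(a * n * n + b * n + c)


def optimal_streams(coeffs):
    """Stream count at the predicted peak, or None if the curve has no peak.

    Setting dTh/dn = 0 in the assumed model gives b*n + 2*c = 0.
    """
    _, b, c = coeffs
    if b >= 0 or c <= 0:
        return None
    return -2.0 * c / b


if __name__ == "__main__":
    # Synthetic samples generated from made-up coefficients, not measured data.
    true_coeffs = (0.01, -0.02, 0.1)
    samples = [(n, predict(n, true_coeffs)) for n in (1, 2, 4)]
    fitted = fit_full_second_order(samples)
    print("fitted coefficients:", fitted)
    print("predicted optimal stream count:", optimal_streams(fitted))
```

With noisy measurements the three samples may yield coefficients with no interior peak, which is one way to see why the choice of sampling points matters in the quoted discussion.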

“…Parallel streams achieve high throughput by mimicking the behavior of individual streams and get a higher share of the available bandwidth [7][8][9][10][11][12][13][14][15]. In our previous study, we presented two new theoretical models (Partial C-order and Full Second-order) that are used to predict the behavior of parallel streams and discussed their assumptions in application [16,17]. We also applied those models and measured their accuracy against actual GridFTP [18] transfers with the improvements we have made [19].…”

Section: Introduction (mentioning)

confidence: 99%

A Highly-Accurate and Low-Overhead Prediction Model for Transfer Throughput Optimization

Kim

Yildirim

Kosar

2012

2012 SC Companion: High Performance Computing, Networking Storage and Analysis

Self Cite

An important bottleneck for data-intensive scalable computing systems is efficient utilization of the network links that connect the collaborating institutions with their remote partners, data sources, and computational sites. To alleviate this bottleneck, we propose an application-layer throughput optimization model based on parallel stream number prediction. This new model extends our two previous models (Partial C-order and Full Second-order) to achieve higher accuracy and lower overhead predictions. Our new model, called Full C-order, outperforms both of our previous models as well as the most relevant model by others, the Partial Second-order, in terms of both accuracy and efficiency. We test and compare these four models on emulated testbeds and on production environments using a wide variety of data set sizes, RTT, and bandwidth combinations. Our comprehensive experiments confirm the superiority of our new model to the other three models.

“…The systems probe and measurements with external profilers are needed. Complex models are used to calculate the optimum number of multiple streams with the help of sample measurements in order to make a prediction [23,25,26]. Further, network conditions may change over time in the shared environments, and the estimated value might not reflect the most recent state of the system.…”

Section: Application-level Dynamic Tuning (mentioning)

confidence: 99%

“…The proposed methodology operates without depending on any historical measurements and does not use external profiles for measurement. Instead of using predictive sampling as proposed in [17,25,26], we make use of the instant throughput information gathered from the actual data transfer operations that are currently active. The number of multiple streams is set dynamically in an adaptive manner by gradually increasing the number of concurrent connections up to an optimal point.…”

Section: Application-level Dynamic Tuning (mentioning)

confidence: 99%
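
The adaptive approach quoted above, raising the number of concurrent connections step by step until throughput stops improving, can be sketched roughly as follows. This is an illustrative sketch, not the cited paper's algorithm: the `measure` callback, the `max_streams` cap, and the `min_gain` threshold are all hypothetical, and the demo throughput function is synthetic.

```python
def tune_streams(measure, max_streams=32, min_gain=0.05):
    """Increase the stream count while each step still pays off.

    measure(n) is a hypothetical callback returning the throughput observed
    with n concurrent streams. The search stops at the first step whose
    throughput is not at least min_gain (as a fraction) above the best so far.
    """
    best_n = 1
    best_throughput = measure(1)
    for n in range(2, max_streams + 1):
        throughput = measure(n)
        if throughput < best_throughput * (1.0 + min_gain):
            break  # extra streams no longer help enough
        best_n, best_throughput = n, throughput
    return best_n, best_throughput


if __name__ == "__main__":
    # Synthetic link: capacity saturates at 450 units, each stream costs 5.
    def fake_measure(n):
        return min(100 * n, 450) - 5 * n

    print(tune_streams(fake_measure))
```

Because a sketch like this relies only on throughput observed during the transfer, it needs no prior sampling phase, at the cost of spending part of the transfer at suboptimal settings.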

Bulk Data Movement for Climate Dataset: Efficient Data Transfer Management with Dynamic Transfer Adjustment

Sim

Balman

Williams

et al. 2010

Many scientific applications and experiments, such as high energy and nuclear physics, astrophysics, climate observation and modeling, combustion, nano-scale material sciences, and computational biology, generate extreme volumes of data with a large number of files. These data sources are distributed among national and international data repositories, and are shared by large numbers of geographically distributed scientists. A large portion of data is frequently accessed, and a large volume of data is moved from one place to another for analysis and storage. One challenging issue in such efforts is the limited network capacity for moving large datasets to explore and manage. The Bulk Data Mover (BDM), a data transfer management tool in the Earth System Grid (ESG) community, has been managing the massive dataset transfers efficiently with the pre-configured transfer properties in the environment where the network bandwidth is limited. Dynamic transfer adjustment was studied to enhance the BDM to handle significant end-to-end performance changes in the dynamic network environment as well as to control the data transfers for the desired transfer performance. We describe the results from the BDM transfer management for the climate datasets. We also describe the transfer estimation model and results from the dynamic transfer adjustment.

Dynamic Protocol Tuning Algorithms for High Performance Data Transfers

Arslan

Ross

Kosar

2013

Euro-Par 2013 Parallel Processing

Self Cite

Obtaining optimal data transfer performance is of utmost importance to today's data-intensive distributed applications and wide-area data replication services. Doing so necessitates effectively utilizing available network bandwidth and resources, yet in practice transfers seldom reach the levels of utilization they potentially could. Tuning protocol parameters such as pipelining, parallelism, and concurrency can significantly increase utilization and performance; however, determining the best settings for these parameters is a difficult problem, as network conditions can vary greatly between sites and over time. Nevertheless, it is an important problem, since poor tuning can cause either under- or over-utilization of network resources and thus degrade transfer performance. In this paper, we present three algorithms for application-level tuning of different protocol parameters for maximizing transfer throughput in wide-area networks. Our algorithms dynamically tune the number of parallel data streams per file (for large file optimization), the level of control channel pipelining (for small file optimization), and the number of concurrent file transfers to increase I/O throughput (a technique useful for all types of files). The proposed heuristic algorithms improve the transfer throughput up to 10x compared to the baseline and 7x compared to the state-of-the-art solutions.
