In this paper, we present the design and implementation of a network throughput prediction and optimization service for many-task computing in widely distributed environments. This service uses multiple parallel TCP streams to improve the end-to-end throughput of data transfers. A novel mathematical model is used to decide the number of parallel streams needed to achieve the best performance. This model can predict the optimal number of parallel streams with as few as three prediction points. We implement this new service in the Stork data scheduler, where the prediction points can be obtained using Iperf and GridFTP sampling. Our results show that the prediction cost plus the optimized transfer time is much less than the unoptimized transfer time in most cases.
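As a hedged illustration of why three prediction points can suffice: if throughput as a function of stream count is modeled as Th(n) = n / sqrt(a·n² + b·n + c), the fit linearizes to n²/Th² = a·n² + b·n + c, an exact 3×3 linear system. The specific model form, the sample stream counts, and the function names below are assumptions for this sketch, not taken verbatim from the paper.

```python
def fit_model(samples):
    """Fit Th(n) = n / sqrt(a*n^2 + b*n + c) to three (n, Th) samples.

    Linearizing gives n^2 / Th^2 = a*n^2 + b*n + c, a 3x3 linear
    system in (a, b, c), solved here by Gaussian elimination.
    """
    rows = [[n * n, n, 1.0, n * n / (t * t)] for n, t in samples]
    # forward elimination
    for i in range(3):
        piv = rows[i][i]
        for j in range(i + 1, 3):
            f = rows[j][i] / piv
            rows[j] = [x - f * y for x, y in zip(rows[j], rows[i])]
    # back substitution
    c = rows[2][3] / rows[2][2]
    b = (rows[1][3] - rows[1][2] * c) / rows[1][1]
    a = (rows[0][3] - rows[0][1] * b - rows[0][2] * c) / rows[0][0]
    return a, b, c

def optimal_streams(a, b, c):
    """Setting d/dn of n^2 / (a*n^2 + b*n + c) to zero gives
    n* = -2c / b (valid when b < 0 and c > 0)."""
    return round(-2.0 * c / b)
```

Because the linearized system is exact, three distinct sample points pin down (a, b, c) completely; more samples would only be needed to average out measurement noise.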
The end-to-end performance of TCP over wide-area networks may be a major bottleneck for large-scale network-based applications. Two practical ways of increasing TCP performance at the application layer are using multiple parallel streams and tuning the buffer size. Tuning the buffer size can lead to a significant increase in application throughput. However, using multiple parallel streams generally gives better results than an optimized buffer size with a single stream. Parallel streams tend to recover from failures more quickly and are more likely to steal bandwidth from the other streams sharing the network. Moreover, our experiments show that proper use of a tuned buffer size together with parallel streams can increase the throughput beyond the cases where only tuned buffers or only parallel streams are used. In that sense, balancing a tuned buffer size against the number of parallel streams, and defining the optimal values for those parameters, is very important. In this paper, we analyze the results of different techniques for balancing the TCP buffer and parallel streams at the same time, and present the initial steps toward a balanced model of throughput based on these optimized parameters.
Data placement in complex scientific workflows has gradually attracted more attention, since the large amounts of data generated by these workflows significantly increase the turnaround time of the end-to-end application. It is almost impossible to produce an optimal schedule for the end-to-end workflow without considering the intermediate data movement. To reduce the complexity of the workflow-scheduling problem, most existing work constrains the problem space with unrealistic assumptions, which result in non-optimal schedules in practice. In this study, we propose a genetic, data-aware algorithm for the end-to-end workflow-scheduling problem, which performs very close to the optimal solution.
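To make the idea of a genetic, data-aware scheduler concrete, here is a deliberately toy sketch: a chromosome assigns each task to a site, and the fitness adds a transfer penalty whenever dependent tasks land on different sites. The chain-shaped dependency structure, cost tables, and all parameter values are assumptions for illustration only, not the paper's actual algorithm.

```python
import random

def evolve(n_tasks, n_sites, cost, transfer, generations=200, pop_size=30):
    """Toy genetic scheduler: chromosome = task-to-site assignment.

    cost[t][s]  : compute cost of task t on site s (assumed table)
    transfer[t] : data-movement penalty if task t and its predecessor
                  t-1 run on different sites (assumes a chain workflow)
    """
    def fitness(assign):
        total = sum(cost[t][assign[t]] for t in range(n_tasks))
        total += sum(transfer[t] for t in range(1, n_tasks)
                     if assign[t] != assign[t - 1])
        return total

    pop = [[random.randrange(n_sites) for _ in range(n_tasks)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)                    # elitist selection
        survivors = pop[:pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = random.sample(survivors, 2)
            cut = random.randrange(1, n_tasks)
            child = p1[:cut] + p2[cut:]          # one-point crossover
            m = random.randrange(n_tasks)        # point mutation
            child[m] = random.randrange(n_sites)
            children.append(child)
        pop = survivors + children
    best = min(pop, key=fitness)
    return best, fitness(best)
```

The data-aware aspect lives entirely in the fitness function: because transfer penalties are evaluated alongside compute costs, the search naturally co-locates tasks that exchange large intermediate datasets.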