Using multiple parallel streams for wide-area data transfers may yield much better performance than using a single stream, but overwhelming the network by opening too many streams may have the opposite effect. The congestion created by an excessive number of streams can cause the achieved throughput to drop. Hence, it is important to choose the optimal number of streams without congesting the network. Predicting this 'magic' number is not straightforward, since it depends on many parameters specific to each individual transfer. Generic models that try to predict this number either rely too heavily on historical information or fail to make accurate predictions. In this paper, we present a set of new models that aim to approximate the optimal number with the least historical information and the lowest prediction overhead. We measure the feasibility and accuracy of these models by comparing their predictions against actual GridFTP data transfers. We also discuss how these models can be used by a data scheduler to increase the overall performance of incoming transfer requests.
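The abstract does not give the models' form; as an illustrative sketch only, assume the three-parameter throughput curve Th(n) = n / sqrt(a·n² + b·n + c) common in the parallel-stream literature, which three sampled transfers pin down exactly (all sample values below are made up):

```python
import numpy as np

def fit_stream_model(samples):
    """Fit Th(n) = n / sqrt(a*n**2 + b*n + c) to (streams, throughput) samples.
    Rearranging gives n**2 / Th**2 = a*n**2 + b*n + c, which is linear in
    (a, b, c), so three sampled transfers suffice for an exact solve."""
    n = np.array([s[0] for s in samples], dtype=float)
    th = np.array([s[1] for s in samples], dtype=float)
    A = np.column_stack([n**2, n, np.ones_like(n)])
    return np.linalg.solve(A, n**2 / th**2)  # -> a, b, c

def optimal_streams(a, b, c, max_streams=64):
    """Return the integer stream count that maximizes the fitted curve."""
    n = np.arange(1, max_streams + 1, dtype=float)
    th = n / np.sqrt(a * n**2 + b * n + c)
    return int(n[np.nanargmax(th)])

# Three sampled transfers: (parallel streams, measured throughput in Mbps)
a, b, c = fit_stream_model([(1, 95.0), (4, 310.0), (16, 420.0)])
print(optimal_streams(a, b, c))  # peak of the fitted curve (21 here)
```

Beyond this point the fitted curve flattens and then declines, matching the congestion effect the abstract describes.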
In end-to-end data transfers, several factors affect the data transfer throughput, such as the network characteristics (e.g. network bandwidth, round-trip time, background traffic); the end-system characteristics (e.g. NIC capacity, number of CPU cores and their clock rate, number of disk drives and their I/O rate); and the dataset characteristics (e.g. average file size, dataset size, file size distribution). Optimizing big data transfers over inter-cloud and intra-cloud networks is a challenging task that requires joint consideration of all of these parameters. This optimization task becomes even more challenging when transferring datasets comprised of heterogeneous file sizes (i.e. a mix of large and small files). Previous work in this area focuses only on the end-system and network characteristics and does not provide models for the dataset characteristics. In this study, we analyze the effects of the three most important transfer parameters used to enhance data transfer throughput: pipelining, parallelism, and concurrency. We provide models and guidelines for setting the best values for these parameters and present two different transfer optimization algorithms that use the developed models. Tests conducted over high-speed networking and cloud testbeds show that our algorithms outperform the most popular data transfer tools, such as Globus Online and UDT, in the majority of cases.
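The paper's actual models are not reproduced here; the sketch below only illustrates, with made-up heuristics, how pipelining, parallelism, and concurrency values might be derived from dataset and network characteristics. (GridFTP clients such as globus-url-copy expose corresponding -pp, -p, and -cc options, though exact semantics vary by version.)

```python
def choose_transfer_params(avg_file_size, file_count, bandwidth_bps, rtt_s,
                           tcp_buffer=4 * 2**20):
    """Illustrative heuristics, not the paper's models: pipelining hides
    per-file protocol latency for small files, parallelism splits large
    files across streams, and concurrency moves several files at once."""
    bdp = bandwidth_bps / 8 * rtt_s  # bandwidth-delay product in bytes
    # Pipelining: keep enough files in flight to cover one BDP.
    pipelining = max(1, int(bdp // max(avg_file_size, 1)) + 1)
    # Parallelism: enough buffer-limited streams to fill the pipe (ceil).
    parallelism = max(1, int(-(-bdp // tcp_buffer)))
    # Concurrency: several files at once when the dataset has many files.
    concurrency = min(8, max(1, file_count // 100))
    return pipelining, parallelism, concurrency

# 10 Gbps link, 50 ms RTT, 1000 files averaging 8 MiB each
pp, p, cc = choose_transfer_params(avg_file_size=8 * 2**20, file_count=1000,
                                   bandwidth_bps=10**10, rtt_s=0.05)
print(pp, p, cc)  # -> 8 15 8 for these inputs
```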
In this paper, we present the design and implementation of a network throughput prediction and optimization service for many-task computing in widely distributed environments. This service uses multiple parallel TCP streams to improve the end-to-end throughput of data transfers. A novel mathematical model is used to decide the number of parallel streams needed to achieve the best performance. This model can predict the optimal number of parallel streams with as few as three prediction points. We implement this new service in the Stork data scheduler, where the prediction points can be obtained using Iperf and GridFTP samplings. Our results show that the prediction cost plus the optimized transfer time is much less than the unoptimized transfer time in most cases.
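The claim in the last sentence can be checked per transfer by comparing the two totals directly; the sketch below does exactly that, with hypothetical numbers rather than figures from the paper:

```python
def optimization_pays_off(data_bytes, th_default, th_opt, sampling_cost_s):
    """True when sampling overhead plus the optimized transfer time beats
    the unoptimized transfer time (throughputs in bytes per second)."""
    return sampling_cost_s + data_bytes / th_opt < data_bytes / th_default

# Hypothetical: 100 GiB transfer, 400 MiB/s unoptimized, 900 MiB/s with the
# tuned stream count, and 30 s spent on Iperf/GridFTP sampling.
print(optimization_pays_off(100 * 2**30, 400 * 2**20, 900 * 2**20, 30.0))
# -> True: 30 s + ~114 s optimized vs. ~256 s unoptimized
```

For very small transfers the sampling cost can dominate, which is why the abstract hedges with "in most cases".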