Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing 2010
DOI: 10.1145/1851476.1851582

A data transfer framework for large-scale science experiments

Abstract: Modern scientific experiments can generate hundreds of gigabytes to terabytes or even petabytes of data that may furthermore be maintained in large numbers of relatively small files. Frequently, this data must be disseminated to remote collaborators or computational centers for data analysis. Moving this data with high performance and strong robustness and providing a simple interface for users are challenging tasks. We present a data transfer framework comprising a high-performance data transfer library based…

Cited by 42 publications (29 citation statements); references 10 publications. Citing publications span 2012 to 2021.

“…However, Globus Online only performs transfers between GridFTP instances, remains unaware of the environment and therefore its transfer optimizations are mostly done statically. Several extensions brought to GridFTP allow users to enhance transfer performance by tuning some key parameters: threading in [11] or overlays in [12]. Still, these works only focus on optimizing some specific constraints and ignore others (e.g.…”
Section: The Geographically Distributed Data Management Ecosystem
Citation type: mentioning
confidence: 99%
“…On the other hand, endsystem parallelism can be exploited to improve utilization of a single path. This can be achieved by means of parallel streams [14] or concurrent transfers [15]. Although using parallelism may improve throughput in certain cases, one should also consider system configuration since specific local constraints (e.g., low disk I/O speeds or over-tasked CPUs) may introduce bottlenecks.…”
Section: The Geographically Distributed Data Management Ecosystem
Citation type: mentioning
confidence: 99%
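
The contrast drawn above between parallel streams on a single path and concurrent transfers of several files can be illustrated with a short sketch. The following Python fragment is an assumed, minimal illustration of concurrent transfers using only the standard library; it is not code from the cited works, and the paths and worker count in the usage note are hypothetical. As the quotation cautions, local constraints such as disk I/O speed still bound the achievable aggregate rate.

# Minimal sketch of end-system parallelism via concurrent transfers:
# several files move at once so per-file latency and disk stalls overlap.
import shutil
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def transfer_one(src: Path, dst_dir: Path) -> int:
    """Copy a single file and return the number of bytes moved."""
    dst = dst_dir / src.name
    shutil.copyfile(src, dst)
    return src.stat().st_size

def transfer_concurrently(files, dst_dir: Path, workers: int = 4) -> float:
    """Run several single-file transfers at once; return bytes per second."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        moved = sum(pool.map(lambda f: transfer_one(f, dst_dir), files))
    elapsed = time.monotonic() - start
    return moved / elapsed if elapsed > 0 else float("inf")

# Example usage (hypothetical paths):
# rate = transfer_concurrently(sorted(Path("/data/run42").glob("*.dat")),
#                              Path("/scratch/replica"), workers=8)
# print(f"aggregate throughput: {rate / 1e6:.1f} MB/s")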
“…However, Globus Online only performs file transfers between GridFTP instances, remains unaware of the environment and therefore its transfer optimizations are mostly done statically. Several extensions brought to GridFTP allow users to enhance transfer performance by tuning some key parameters: threading in [32] or overlays in [29]. Still, these works only focus on optimizing some specific constraints and ignore others (e.g., TCP buffer size, number of outbound requests).…”
Section: Related Work
Citation type: mentioning
confidence: 99%
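
To make one of the parameters named in the quotation concrete, the sketch below adjusts the TCP buffer size at the raw-socket level with Python's standard library. This is an assumed illustration only: GridFTP deployments set the equivalent knob through server and client configuration, and the host, port, and buffer value here are placeholders.

# Minimal sketch: enlarge the TCP send/receive buffers before connecting.
# Host, port, and buffer size are illustrative placeholders.
import socket

def open_tuned_connection(host: str, port: int,
                          buf_bytes: int = 4 * 1024 * 1024) -> socket.socket:
    """Open a TCP connection with enlarged send/receive buffers."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set buffer sizes before connect() so they can influence window scaling.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buf_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buf_bytes)
    sock.connect((host, port))
    return sock

# Example usage (hypothetical endpoint):
# conn = open_tuned_connection("data.example.org", 2811)
# print(conn.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
# conn.close()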
“…Another transfer parameter used for throughput optimization was pipelining, which helped in improving the performance of transferring large number of small files [11,18,19]. Liu et al [32] optimized network throughput by concurrently opening multiple transfer sessions and transferring multiple files concurrently. They proposed increasing the number of concurrent data transfer channels until the network performance degrades.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
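
As a rough illustration of the tuning strategy described in the last quotation, the sketch below raises the number of concurrent channels until measured throughput stops improving. It is an assumed reconstruction of the general idea rather than the actual algorithm of Liu et al. [32], and measure_throughput is a hypothetical hook that would run a trial transfer at a given concurrency level.

# Minimal sketch: grow concurrency until the measured rate degrades.
from typing import Callable

def find_concurrency(measure_throughput: Callable[[int], float],
                     max_channels: int = 32) -> int:
    """Increase concurrent channels until throughput stops improving."""
    best_channels, best_rate = 1, measure_throughput(1)
    for channels in range(2, max_channels + 1):
        rate = measure_throughput(channels)
        if rate <= best_rate:   # performance flattened or degraded: stop growing
            break
        best_channels, best_rate = channels, rate
    return best_channels

# Example usage with a stand-in measurement function (hypothetical):
# best = find_concurrency(lambda c: run_trial_transfer(channels=c))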