2005
DOI: 10.1016/s0927-5452(05)80008-5
Data placement in widely distributed environments

Cited by 12 publications (12 citation statements)
References 17 publications
“…Parallelism sends different chunks of the same file over parallel data streams (typically TCP connections), and can achieve high throughput by aggregating multiple streams and utilizing a larger share of the available network bandwidth [9], [11], [21], [22]. Concurrency refers to sending multiple files simultaneously through the network using different data channels, and is especially useful for increasing I/O concurrency in parallel disk systems [23], [24], [25]. Figure 1 shows the impact of protocol parameters concurrency and pipelining on transfer throughput.…”
Section: Motivation
confidence: 99%
“…Parallelism sends different chunks of the same file over parallel data streams (typically TCP connections), and can achieve high throughput by aggregating multiple streams and utilizing a larger share of the available network bandwidth [9], [11], [21], [22]. Concurrency refers to sending multiple files simultaneously through the network using different data channels, and is especially useful for increasing I/O concurrency in parallel disk systems [23], [24], [25]. Figure 1 shows the impact of protocol parameters concurrency and pipelining on transfer throughput.…”
Section: Motivationmentioning
confidence: 99%
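The distinction the passage draws between parallelism (chunks of one file over multiple streams) and concurrency (multiple files over separate channels) can be sketched as follows. This is an illustrative in-memory simulation, not the cited systems' actual transfer code; `send_chunk`, `parallel_transfer`, and `concurrent_transfer` are all hypothetical names, and a real implementation would push bytes over TCP connections (e.g. via GridFTP).

```python
from concurrent.futures import ThreadPoolExecutor

def send_chunk(chunk: bytes) -> int:
    """Stand-in for pushing one chunk over its own data stream."""
    return len(chunk)

def parallel_transfer(data: bytes, streams: int) -> int:
    """Parallelism: split ONE file into chunks, one chunk per stream."""
    size = -(-len(data) // streams)  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=streams) as pool:
        return sum(pool.map(send_chunk, chunks))

def concurrent_transfer(files: list, channels: int) -> int:
    """Concurrency: send MULTIPLE files at once over separate channels."""
    with ThreadPoolExecutor(max_workers=channels) as pool:
        return sum(pool.map(lambda f: parallel_transfer(f, 1), files))

print(parallel_transfer(b"x" * 100, 4))         # one 100-byte file, 4 streams -> 100
print(concurrent_transfer([b"a" * 10] * 3, 2))  # three 10-byte files, 2 channels -> 30
```

The two parameters compose in practice: each concurrently transferred file can itself use several parallel streams, which is why tuning them jointly matters for throughput.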
“…This is even more crucial when a big data workflow is executed in a distributed environment that involves multiple heterogeneous data centers. Kosar et al. (2005a, 2005b) proposed an allocation framework for distributed computing systems which treated the data placement subsystem as an independent module alongside the computation subsystem. In their proposed model, both data placement and task computation jobs can be queued, scheduled, monitored, managed and even checkpointed.…”
Section: Related Work
confidence: 99%
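The core idea attributed to Kosar et al. above is that data placement jobs are first-class citizens: queued, scheduled, and tracked just like compute jobs rather than being a side effect of them. A minimal sketch of that idea, assuming an invented `PlacementJob`/`PlacementScheduler` API (these names are illustrative, not the authors' actual interface):

```python
from dataclasses import dataclass
from queue import Queue

@dataclass
class PlacementJob:
    """A data transfer treated as a schedulable job, not a side effect."""
    src: str
    dst: str
    state: str = "queued"

class PlacementScheduler:
    """Queues, runs, and tracks placement jobs like a batch scheduler."""
    def __init__(self) -> None:
        self.q: Queue = Queue()
        self.log: list = []

    def submit(self, job: PlacementJob) -> None:
        self.q.put(job)

    def run(self) -> None:
        while not self.q.empty():
            job = self.q.get()
            job.state = "running"
            # A real scheduler would perform the transfer here and
            # checkpoint/retry on failure, as the model describes.
            job.state = "done"
            self.log.append(f"{job.src} -> {job.dst}: {job.state}")

sched = PlacementScheduler()
sched.submit(PlacementJob("site-a:/data/f1", "site-b:/data/f1"))
sched.run()
print(sched.log)  # ['site-a:/data/f1 -> site-b:/data/f1: done']
```

Because placement jobs carry their own state, the scheduler can monitor, reorder, or checkpoint them independently of the computation that consumes the data.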
“…The Stork data scheduler [24,37,38] implements techniques specific to queuing, scheduling and optimization of data placement jobs, provides high reliability in data transfers and creates a level of abstraction between the user applications and the underlying data transfer and storage resources (including file transfer protocol (FTP), hypertext transfer protocol (HTTP), GridFTP, SRM, SRB and iRODS) via a modular, uniform interface. Stork is considered one of the very first examples of 'data-aware scheduling' and has been very actively used in many e-Science application areas, including: coastal hazard prediction, reservoir uncertainty analysis, digital sky imaging, educational video processing, numerical relativity and multi-scale computational fluid dynamics resulting in breakthrough research [5][6][7][8][9][10][38][39][40].…”
Section: Stork Data Scheduler
confidence: 99%
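The "level of abstraction between user applications and the underlying transfer protocols" described for Stork can be pictured as a registry that dispatches a uniform `transfer` call to protocol-specific backends. This is a hypothetical sketch of that design pattern, not Stork's real code; the registry, decorator, and backend functions are all invented for illustration.

```python
# Registry mapping a URL scheme (ftp, gridftp, ...) to a backend function.
TRANSFER_BACKENDS = {}

def register(scheme: str):
    """Decorator: plug a protocol backend into the uniform interface."""
    def deco(fn):
        TRANSFER_BACKENDS[scheme] = fn
        return fn
    return deco

@register("ftp")
def ftp_transfer(src: str, dst: str) -> str:
    return f"ftp: {src} -> {dst}"

@register("gridftp")
def gridftp_transfer(src: str, dst: str) -> str:
    return f"gridftp: {src} -> {dst}"

def transfer(src: str, dst: str) -> str:
    """Uniform entry point: pick the backend from the source URL scheme."""
    scheme = src.split("://", 1)[0]
    backend = TRANSFER_BACKENDS.get(scheme)
    if backend is None:
        raise ValueError(f"no backend registered for scheme {scheme!r}")
    return backend(src, dst)

print(transfer("gridftp://site-a/f1", "gridftp://site-b/f1"))
```

A modular registry like this is what lets new protocols (SRM, SRB, iRODS, HTTP) be added without touching user-facing job descriptions, which is the property the citation statement highlights.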