2019
DOI: 10.1109/tsc.2016.2625247

A Novel Workflow-Level Data Placement Strategy for Data-Sharing Scientific Cloud Workflows

Abstract: Cloud computing can provide a more cost-effective way to deploy scientific workflows than traditional distributed computing environments such as cluster and grid. Due to the large size of scientific datasets, data placement plays an important role in scientific cloud workflow systems for improving system performance and reducing data transfer cost. Traditional task-level data placement strategies consider only shared datasets within individual workflows to reduce data transfer cost. However, it is obvious that t…

Cited by 32 publications (21 citation statements)
References 36 publications
“…The data placement problem has been studied extensively in the literature spanning a wide-variety of research areas, both from the perspective of execution environments: ranging from distributed systems (Chervenak et al, 2007;Golab et al, 2014) to cloud computing environments (Yuan et al, 2010;Li et al, 2017;Ebrahimi et al, 2015;Liu and Datta, 2011); application areas: online social network services (Jiao et al, 2014;Han et al, 2017), location aware data placement for geo-distributed cloud services (Yu and Pan, 2017;Zhang et al, 2016;Yu and Pan, 2015;Yu and Pan, 2016;Agarwal et al, 2010), and many more. Here, we provide an overview the existing works that overlap with our problem.…”
Section: Related Work (mentioning)
confidence: 99%
“…Although advancements in enabling technologies such as big data and cloud computing have provided us with the necessary machinery and systems (e.g., Apache Hadoop [39] and Spark [46]) to perform data management at scale, effective strategies for data placement and partitioning remain crucial for ensuring the performance of such systems [15]. Having said that, the field of data placement has witnessed a humongous amount of research over the past two decades [3], [4], [17], [30], [36], [41], [43], [45], [48], [49].…”
Section: Motivation (mentioning)
confidence: 99%
“…In the past decade, the data placement problem has witnessed extensive research with a wide variety of techniques developed for different execution environments, namely distributed computing [9], [17], grid computing [13], [26], [27], and cloud computing [16], [19], [30], [44]. Initially, the focus of these works was on relational workloads such as database joins [17] and scientific workloads [14], [31], [45], however, recently the focus has shifted towards workloads emanating from specialized applications such as OSN services [21], [24] and data intensive services in geo-distributed clouds [1], [41]-[43], [47].…”
Section: Related Work (mentioning)
confidence: 99%
“…In this section, we present studies addressing the problem of data and computation placement in these workflows. The problem of data placement is a prominent problem to solve in scientific workflow systems [20]. It is a big challenge in biomedical analysis, making it necessary to minimize data transfer among distributed data centers.…”
Section: A Placement Of Computations and Data (mentioning)
confidence: 99%