2014 IEEE 7th International Conference on Cloud Computing
DOI: 10.1109/cloud.2014.65
Improving Hadoop Service Provisioning in a Geographically Distributed Cloud

Abstract: With

Cited by 25 publications (14 citation statements). References 11 publications.
“…We build datacenters geographically with the purpose of achieving low latencies for local users [7] . Nevertheless, as data volumes keep increasing at a tremendous rate, it is still time consuming to transfer such substantial amount of data across datacenters [8] . Many cloud services have very stringent requirements for latency, even a delay of one second can make a great difference [9] .…”
Section: Optimization Issuesmentioning
confidence: 99%
“…As multi-cloud environments become more and more accessible, deploying MapReduce applications over multi-cloud is emerging [5], [6], [7], [8].…”
Section: Introductionmentioning
confidence: 99%
“…First, existing work on multi-cloud mainly focuses on maximizing throughput by improving data locality [5], [8], but the perspective of cost optimization is missing. To consider the cost and running time of applications doing the optimization process, we need to focus not only on data locality, but also on data transfer costs defined by multiple different cloud providers' cost models.…”
Section: Introductionmentioning
confidence: 99%
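The trade-off described in the statement above — data locality versus per-provider transfer pricing — can be sketched with a toy placement model. Everything here is hypothetical: the provider names, the per-GB egress prices, and the data volumes are made up for illustration, and real multi-cloud cost models would also account for compute prices and runtime.

```python
# A minimal sketch (hypothetical providers and prices) of weighing data
# locality against inter-cloud transfer cost when placing a MapReduce job.

# Per-GB egress price charged by each (hypothetical) provider.
EGRESS_PRICE_PER_GB = {"cloud_a": 0.09, "cloud_b": 0.05, "cloud_c": 0.12}

def placement_cost(data_gb_by_cloud, run_on):
    """Cost of moving all input data to `run_on`: data already local is
    free; remote data pays the *source* provider's egress price."""
    return sum(
        gb * EGRESS_PRICE_PER_GB[src]
        for src, gb in data_gb_by_cloud.items()
        if src != run_on
    )

def cheapest_placement(data_gb_by_cloud):
    """Pick the cloud minimizing total transfer cost (ignoring compute price)."""
    return min(data_gb_by_cloud, key=lambda c: placement_cost(data_gb_by_cloud, c))

data = {"cloud_a": 500, "cloud_b": 120, "cloud_c": 40}
print(cheapest_placement(data))  # cloud_a holds most of the data, so run there
```

Under this model, pure locality (run where most data sits) and cost optimization happen to agree; they diverge once providers charge very different egress rates, which is exactly the gap the citing work points out.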
“…The following are few examples of applications that process geo-distributed datasets: climate science [8], [9], data generated by multinational companies [8], [10], [11], sensor networks [9], [12], stock exchanges [9], web crawling [13], [14], social networking applications [13], [14], biological data processing [8], [12], [15] such as DNA sequencing and human microbiome investigations, protein structure prediction, and molecular simulations, stream analysis [9], video feeds from distributed cameras, log files from distributed servers [12], geographical information systems (GIS) [4], and scientific applications [8], [9], [13], [16]. It should be noted down here that all the abovementioned applications generate a high volume of raw data across the globe; however, most analysis tasks require only a small amount of the original raw data for producing the final outputs or summaries [12].…”
mentioning
confidence: 99%
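The observation above — that most analyses need only a small fraction of the globally generated raw data — is the motivation for aggregating locally at each site and shipping only compact summaries. A minimal sketch, with entirely hypothetical log records and site names:

```python
# A minimal sketch (hypothetical data) of local aggregation: each site
# reduces its raw records to small per-key counts, and only those tiny
# summaries cross the wide-area links, never the raw data.
from collections import Counter

def local_summary(raw_records):
    """Runs inside each datacenter: reduce raw records to per-key counts."""
    return Counter(rec["key"] for rec in raw_records)

def global_merge(summaries):
    """Runs at the coordinating site: merge the small summaries."""
    total = Counter()
    for s in summaries:
        total.update(s)
    return total

site_a = [{"key": "error"}, {"key": "ok"}, {"key": "error"}]
site_b = [{"key": "ok"}, {"key": "ok"}]
merged = global_merge([local_summary(site_a), local_summary(site_b)])
print(merged["ok"], merged["error"])  # 3 2
```

The summaries are a few counters regardless of how many raw records each site holds, which is why this pattern scales across slow inter-datacenter links.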
“…The main advantages of geo-distributed big-data processing are given in [33] and listed below: • A geo-distributed Hadoop/Spark-based system can perform data processing across nodes of multiple clusters while the standard Hadoop/Spark and their variants cannot process data at multiple clusters [33]. • More flexible services, e.g., resource sharing, load balancing, fault-tolerance, performance isolation, data isolation, and version isolation, can be achieved when a cluster is a part of a geo-distributed cluster [11], [16]. • A cluster can be scaled dynamically during the execution of a geo-distributed computation [33].…”
mentioning
confidence: 99%