2009
DOI: 10.1098/rsta.2009.0053

Sector and Sphere: the design and implementation of a high-performance data cloud

Abstract: Cloud computing has demonstrated that processing very large datasets over commodity clusters can be done simply, given the right programming model and infrastructure. In this paper, we describe the design and implementation of the Sector storage cloud and the Sphere compute cloud. By contrast with the existing storage and compute clouds, Sector can manage data not only within a data centre, but also across geographically distributed data centres. Similarly, the Sphere compute cloud supports user-defined functi…

Cited by 116 publications (52 citation statements)
References 18 publications
“…An alternative approach is to fuse the distributed file system and processing engine into a single, tightly coupled component. This philosophy is characteristic of parallel databases, and is also embraced by others, for example in the twin systems Sector and Sphere [46]. These closely integrate the mechanisms for data processing with the storage layer, by offering the capability of evaluating user-defined functions locally on storage nodes.…”
Section: Alternative and Hybrid Architectures (citation type: mentioning)
confidence: 99%
“…Recently, there have been a number of implementations of MapReduce and similar data processing tools [21,25,36,45,59,73,87]. Apache Hadoop was the most popular implementation of MapReduce at the start of the Magellan project and it continues to gain traction in various communities.…”
Section: MapReduce Programming Model (citation type: mentioning)
confidence: 99%
“…However, this post-processing phase can be very expensive since the output prior to filtering can become much larger than the final output; for instance, on the wiki-talk-3 graph the first enumeration phase takes 7 min (on 20 processors), and the second post-processing phase takes 228 min (on 80 processors). The algorithm is implemented for the Sector/Sphere [16] framework.…”
Section: Related Work (citation type: mentioning)
confidence: 99%