2006
DOI: 10.1145/1132952.1132955

A taxonomy of Data Grids for distributed data sharing, management, and processing

Abstract: Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We …

Cited by 273 publications (189 citation statements)
References 118 publications
“…Thus, a different approach is preferred, where computation is moved to where the data is. The same approach of exploring data locality was explored previously in scientific workflows [39] and in Data Grids [40]. In the context of Big Data analytics, MapReduce presents an interesting model where data locality is explored to improve the performance of applications. Hadoop, an open source MapReduce implementation, allows for the creation of clusters that use the Hadoop Distributed File System (HDFS) to partition and replicate datasets to nodes where they are more likely to be consumed by mappers.…”
Section: Data Storage
Citation type: mentioning
confidence: 96%
“…In distributed environments, besides other computations, the access to distributed data resources and their management are also treated as a vital functionality [21,31]. The main functions of data management system in a distributed environment include: Besides these functions, the system is also required to have: the ability to search through abundant available data sets, the ability to discover suitable resources hosting data sets and the ability to allow resource owners to grant permission to access their data resources [46].…”
Section: Data Management
Citation type: mentioning
confidence: 99%
“…DGs support data intensive applications. Among several types of Data Grids elements four components seem to be fundamental, namely Grid Organization module, Data Replication mechanism, Data Transfer policy and infrastructure and Scheduling module (see also [50]) as shown in Fig. 1.…”
Section: A Short Taxonomy Of Data-aware Scheduling Problems In Data G…
Citation type: mentioning
confidence: 99%
“…The complex hierarchy of the DG can be then organized as a collection of four sub-hierarchies, each of them dedicated to one of the DG's elements. Such complex DG characteristics are presented in [50]. In fact, each of the areas of data transport, replica management and resource management pose challenging research issues and can be analyzed as independent research areas.…”
Section: A Short Taxonomy Of Data-aware Scheduling Problems In Data G…
Citation type: mentioning
confidence: 99%