2010 IEEE Fifth International Conference on Networking, Architecture, and Storage
DOI: 10.1109/nas.2010.44
Multi-dimensional Index on Hadoop Distributed File System

Abstract: In this paper, we present an approach to constructing built-in block-based hierarchical index structures, such as the R-tree, to organize data sets in one-, two-, or higher-dimensional space and to improve query performance for common query types (e.g., point queries, range queries) on the Hadoop distributed file system (HDFS). The query response time for data sets stored in HDFS can be significantly reduced by avoiding exhaustive scans of the corresponding data sets in the presence of index structures. The ba…
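The key idea is block-level pruning: a small spatial index over per-block minimum bounding rectangles (MBRs) lets a range query read only the blocks it can intersect. Below is a minimal sketch of that idea in Java, using a flat list of block MBRs rather than the paper's hierarchical R-tree; the class, field, and block-identifier names are illustrative assumptions, not the paper's implementation.

import java.util.ArrayList;
import java.util.List;

public class BlockIndexSketch {
    // 2-D minimum bounding rectangle for one HDFS block (hypothetical IDs).
    static class BlockMbr {
        final long blockId;
        final double minX, minY, maxX, maxY;
        BlockMbr(long blockId, double minX, double minY, double maxX, double maxY) {
            this.blockId = blockId;
            this.minX = minX; this.minY = minY;
            this.maxX = maxX; this.maxY = maxY;
        }
        boolean intersects(double qMinX, double qMinY, double qMaxX, double qMaxY) {
            return minX <= qMaxX && qMinX <= maxX && minY <= qMaxY && qMinY <= maxY;
        }
    }

    private final List<BlockMbr> index = new ArrayList<>();

    void add(BlockMbr mbr) { index.add(mbr); }

    // Range query: return only blocks whose MBR overlaps the query window,
    // so non-overlapping blocks are never read (no exhaustive scan).
    List<Long> rangeQuery(double minX, double minY, double maxX, double maxY) {
        List<Long> hits = new ArrayList<>();
        for (BlockMbr b : index) {
            if (b.intersects(minX, minY, maxX, maxY)) hits.add(b.blockId);
        }
        return hits;
    }

    public static void main(String[] args) {
        BlockIndexSketch idx = new BlockIndexSketch();
        idx.add(new BlockMbr(1, 0, 0, 10, 10));
        idx.add(new BlockMbr(2, 50, 50, 60, 60));
        System.out.println(idx.rangeQuery(5, 5, 12, 12)); // prints [1]; block 2 is pruned
    }
}

An R-tree adds a hierarchy of nested MBRs over this flat list, so pruning takes logarithmic rather than linear time in the number of blocks.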

Cited by 62 publications (24 citation statements). References 28 publications.
“…To run any batch processing job distributed across the cluster, the appropriate program should be written in MapReduce. The MapReduce paradigm takes care of scheduling jobs, monitoring jobs, allocating resources, mining jobs, and managing failures [6]. The job tracker is the core service, a thread that runs at all times, used to schedule jobs on the data nodes and to monitor these tasks.…”
Section: Framework of Hadoop
confidence: 99%
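For context on the quoted description, here is a minimal sketch of submitting a job through the classic (JobTracker-era) Hadoop MapReduce API; after waitForCompletion is called, the framework schedules the map/reduce tasks on the data nodes and monitors them. The identity mapper/reducer and the path arguments are placeholders, not part of the cited work.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SubmitJobSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "batch-job-sketch"); // classic, pre-YARN API
        job.setJarByClass(SubmitJobSketch.class);
        job.setMapperClass(Mapper.class);            // identity mapper
        job.setReducerClass(Reducer.class);          // identity reducer
        job.setOutputKeyClass(LongWritable.class);   // key type of the default TextInputFormat
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Blocks until the framework has scheduled, run, and monitored all tasks.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}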
“…If the resources are available in the task tracker, then the jobs are allotted; otherwise the jobs either wait until resources are freed up, or are fragmented according to the required size and then allotted to each slot. Resource-aware scheduling in Hadoop has become one of the research challenges [5][6] in cloud computing. Scheduling in Hadoop is centralized.…”
Section: Resource Aware Scheduler
confidence: 99%
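As a toy illustration of the allot-or-wait behaviour described above (not Hadoop's actual scheduler API; all names here are invented), the following sketch models a fixed pool of task slots with a waiting queue:

import java.util.ArrayDeque;
import java.util.Queue;

public class SlotSchedulerSketch {
    private int freeSlots;
    private final Queue<String> waiting = new ArrayDeque<>();

    SlotSchedulerSketch(int slots) { this.freeSlots = slots; }

    // Allot the job immediately if a slot is free; otherwise it waits.
    void submit(String jobId) {
        if (freeSlots > 0) {
            freeSlots--;
            System.out.println(jobId + " allotted; free slots left: " + freeSlots);
        } else {
            waiting.add(jobId);
            System.out.println(jobId + " waiting for a free slot");
        }
    }

    // Free the slot and promote the next waiting job, if any.
    void complete(String jobId) {
        freeSlots++;
        String next = waiting.poll();
        if (next != null) submit(next);
    }

    public static void main(String[] args) {
        SlotSchedulerSketch s = new SlotSchedulerSketch(1);
        s.submit("job-1");   // allotted
        s.submit("job-2");   // waits: no free slot
        s.complete("job-1"); // job-2 is now allotted
    }
}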
“…The first category handles high-selectivity queries, such as selection queries and kNN queries, in which only a small portion of the spatial objects is returned as the result of spatial query processing. A few techniques have been proposed to process high-selectivity queries in HDFS [5,6]. They utilize popular spatial indices such as the R-tree and its variants.…”
Section: Related Work
confidence: 99%
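The pruning that makes such high-selectivity queries cheap can be sketched as follows: visit blocks in order of minimum possible distance (MINDIST) to the query point, and stop once the k-th best distance found so far cannot be beaten. The block/point layout below is invented for illustration; a real system would traverse an R-tree rather than a sorted array.

import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

public class KnnPruneSketch {
    // Smallest possible distance from (qx, qy) to any point in the rectangle.
    static double minDist(double qx, double qy,
                          double minX, double minY, double maxX, double maxY) {
        double dx = Math.max(0, Math.max(minX - qx, qx - maxX));
        double dy = Math.max(0, Math.max(minY - qy, qy - maxY));
        return Math.hypot(dx, dy);
    }

    public static void main(String[] args) {
        double qx = 0, qy = 0;
        int k = 2;
        double[][] blocks = {{0, 0, 1, 1}, {5, 5, 6, 6}, {100, 100, 101, 101}};
        double[][][] points = {
            {{0.2, 0.3}, {0.9, 0.9}}, {{5.5, 5.5}}, {{100.5, 100.5}}
        };
        // Visit blocks in MINDIST order.
        Integer[] order = {0, 1, 2};
        Arrays.sort(order, Comparator.comparingDouble(
            i -> minDist(qx, qy, blocks[i][0], blocks[i][1], blocks[i][2], blocks[i][3])));

        PriorityQueue<Double> best = new PriorityQueue<>(Comparator.reverseOrder()); // k best so far
        for (int i : order) {
            double md = minDist(qx, qy, blocks[i][0], blocks[i][1], blocks[i][2], blocks[i][3]);
            if (best.size() == k && md >= best.peek()) break; // prune all remaining blocks
            for (double[] p : points[i]) {
                double d = Math.hypot(p[0] - qx, p[1] - qy);
                if (best.size() < k) best.add(d);
                else if (d < best.peek()) { best.poll(); best.add(d); }
            }
        }
        System.out.println("k-th nearest distance: " + best.peek()); // only block 0 was read
    }
}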
“…Several techniques have been proposed to support spatial queries on Hadoop MapReduce [7,11,4,12] or HDFS [5,6]. However, most of them require internal modification of the underlying systems or frameworks to implement their indexing techniques based on, for example, R-trees.…”
Section: Introduction
confidence: 99%
“…The partitioning function puts objects in the same partition to preserve spatial proximity, using the sorted minimum bounding rectangle (MBR) values of object nodes from the Hilbert curve, and transforms them into a standard and proven multi-dimensional index structure, the R-tree, through parallelization in MapReduce. Hilbert packing reduces the data transfer overhead through the network and therefore the query response time [30]. As with the Z-curve, boundary objects that overlap more than one partition are assigned to the maximal-overlap partition.…”
Section: Introduction
confidence: 99%
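To make the Hilbert packing step concrete, the sketch below maps each object's MBR centroid to its Hilbert value on a small grid, sorts, and cuts the sorted order into fixed-size leaf partitions; consecutive runs in Hilbert order keep spatially close objects together. The grid size, leaf capacity, and sample data are illustrative assumptions, and the MapReduce parallelization described in the quotation is omitted.

import java.util.Arrays;
import java.util.Comparator;

public class HilbertPackSketch {
    // Hilbert distance of (x, y) on an n x n grid, n a power of two
    // (the standard iterative conversion).
    static long xy2d(int n, int x, int y) {
        long d = 0;
        for (int s = n / 2; s > 0; s /= 2) {
            int rx = (x & s) > 0 ? 1 : 0;
            int ry = (y & s) > 0 ? 1 : 0;
            d += (long) s * s * ((3 * rx) ^ ry);
            if (ry == 0) {               // rotate the quadrant so the curve stays continuous
                if (rx == 1) { x = s - 1 - x; y = s - 1 - y; }
                int t = x; x = y; y = t;
            }
        }
        return d;
    }

    public static void main(String[] args) {
        int grid = 16, leafCapacity = 2;
        // MBR centroids of the objects to pack (illustrative data).
        int[][] centroids = {{1, 1}, {14, 2}, {2, 13}, {15, 15}, {8, 8}};
        Integer[] order = {0, 1, 2, 3, 4};
        Arrays.sort(order, Comparator.comparingLong(
            i -> xy2d(grid, centroids[i][0], centroids[i][1])));
        // Consecutive runs in Hilbert order become R-tree leaf partitions.
        for (int i = 0; i < order.length; i += leafCapacity) {
            int end = Math.min(i + leafCapacity, order.length);
            System.out.println("leaf " + (i / leafCapacity) + ": "
                + Arrays.toString(Arrays.copyOfRange(order, i, end)));
        }
    }
}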