2010 IEEE Fifth International Conference on Networking, Architecture, and Storage
DOI: 10.1109/nas.2010.44
Multi-dimensional Index on Hadoop Distributed File System

Abstract: In this paper, we present an approach to constructing built-in block-based hierarchical index structures, such as the R-tree, to organize data sets in one-, two-, or higher-dimensional space and to improve query performance for common query types (e.g., point queries, range queries) on the Hadoop distributed file system (HDFS). The query response time for data sets stored in HDFS can be significantly reduced by avoiding exhaustive scans of the corresponding data sets in the presence of index structures. The ba…
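The key idea is block-level pruning: a small spatial index over per-block minimum bounding rectangles (MBRs) lets a range query read only the blocks it can intersect. Below is a minimal sketch of that idea in Java, using a flat list of block MBRs rather than the paper's hierarchical R-tree; the class, field, and block-identifier names are illustrative assumptions, not the paper's implementation.

import java.util.ArrayList;
import java.util.List;

public class BlockIndexSketch {
    // 2-D minimum bounding rectangle for one HDFS block (hypothetical IDs).
    static class BlockMbr {
        final long blockId;
        final double minX, minY, maxX, maxY;
        BlockMbr(long blockId, double minX, double minY, double maxX, double maxY) {
            this.blockId = blockId;
            this.minX = minX; this.minY = minY;
            this.maxX = maxX; this.maxY = maxY;
        }
        boolean intersects(double qMinX, double qMinY, double qMaxX, double qMaxY) {
            return minX <= qMaxX && qMinX <= maxX && minY <= qMaxY && qMinY <= maxY;
        }
    }

    private final List<BlockMbr> index = new ArrayList<>();

    void add(BlockMbr mbr) { index.add(mbr); }

    // Range query: return only blocks whose MBR overlaps the query window,
    // so non-overlapping blocks are never read (no exhaustive scan).
    List<Long> rangeQuery(double minX, double minY, double maxX, double maxY) {
        List<Long> hits = new ArrayList<>();
        for (BlockMbr b : index) {
            if (b.intersects(minX, minY, maxX, maxY)) hits.add(b.blockId);
        }
        return hits;
    }

    public static void main(String[] args) {
        BlockIndexSketch idx = new BlockIndexSketch();
        idx.add(new BlockMbr(1, 0, 0, 10, 10));
        idx.add(new BlockMbr(2, 50, 50, 60, 60));
        System.out.println(idx.rangeQuery(5, 5, 12, 12)); // prints [1]; block 2 is pruned
    }
}

An R-tree adds a hierarchy of nested MBRs over this flat list, so pruning takes logarithmic rather than linear time in the number of blocks.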

Cited by 62 publications (24 citation statements). References 28 publications.
“…To run any batch processing job distributed across the cluster, the appropriate program should be written in MapReduce. The MapReduce paradigm takes care of scheduling jobs, monitoring jobs, allocating resources, mining jobs, and managing failures [6]. The job tracker is the core service, a thread that runs at all times, used to schedule jobs on the data nodes and to monitor these tasks.…”
Section: Framework of Hadoop
confidence: 99%
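For context on the quoted description, here is a minimal sketch of submitting a job through the classic (JobTracker-era) Hadoop MapReduce API; after waitForCompletion is called, the framework schedules the map/reduce tasks on the data nodes and monitors them. The identity mapper/reducer and the path arguments are placeholders, not part of the cited work.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SubmitJobSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "batch-job-sketch"); // classic, pre-YARN API
        job.setJarByClass(SubmitJobSketch.class);
        job.setMapperClass(Mapper.class);            // identity mapper
        job.setReducerClass(Reducer.class);          // identity reducer
        job.setOutputKeyClass(LongWritable.class);   // key type of the default TextInputFormat
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Blocks until the framework has scheduled, run, and monitored all tasks.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}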
“…If the resources are available in the task tracker, then the jobs are allotted; otherwise the jobs either wait until resources are freed up, or are fragmented according to the required size and then allotted to each slot. Resource-aware scheduling in Hadoop has become one of the research challenges [5][6] in cloud computing. Scheduling in Hadoop is centralized.…”
Section: Resource Aware Scheduler
confidence: 99%
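As a toy illustration of the allot-or-wait behaviour described above (not Hadoop's actual scheduler API; all names here are invented), the following sketch models a fixed pool of task slots with a waiting queue:

import java.util.ArrayDeque;
import java.util.Queue;

public class SlotSchedulerSketch {
    private int freeSlots;
    private final Queue<String> waiting = new ArrayDeque<>();

    SlotSchedulerSketch(int slots) { this.freeSlots = slots; }

    // Allot the job immediately if a slot is free; otherwise it waits.
    void submit(String jobId) {
        if (freeSlots > 0) {
            freeSlots--;
            System.out.println(jobId + " allotted; free slots left: " + freeSlots);
        } else {
            waiting.add(jobId);
            System.out.println(jobId + " waiting for a free slot");
        }
    }

    // Free the slot and promote the next waiting job, if any.
    void complete(String jobId) {
        freeSlots++;
        String next = waiting.poll();
        if (next != null) submit(next);
    }

    public static void main(String[] args) {
        SlotSchedulerSketch s = new SlotSchedulerSketch(1);
        s.submit("job-1");   // allotted
        s.submit("job-2");   // waits: no free slot
        s.complete("job-1"); // job-2 is now allotted
    }
}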
“…The first category handles high-selectivity queries, such as selection queries and kNN queries, in which only a small portion of the spatial objects is returned as the result of spatial query processing. A few techniques have been proposed to process high-selectivity queries in HDFS [5,6]. They utilize popular spatial indices such as the R-tree and its variants.…”
Section: Related Work
confidence: 99%
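The pruning that makes such high-selectivity queries cheap can be sketched as follows: visit blocks in order of minimum possible distance (MINDIST) to the query point, and stop once the k-th best distance found so far cannot be beaten. The block/point layout below is invented for illustration; a real system would traverse an R-tree rather than a sorted array.

import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

public class KnnPruneSketch {
    // Smallest possible distance from (qx, qy) to any point in the rectangle.
    static double minDist(double qx, double qy,
                          double minX, double minY, double maxX, double maxY) {
        double dx = Math.max(0, Math.max(minX - qx, qx - maxX));
        double dy = Math.max(0, Math.max(minY - qy, qy - maxY));
        return Math.hypot(dx, dy);
    }

    public static void main(String[] args) {
        double qx = 0, qy = 0;
        int k = 2;
        double[][] blocks = {{0, 0, 1, 1}, {5, 5, 6, 6}, {100, 100, 101, 101}};
        double[][][] points = {
            {{0.2, 0.3}, {0.9, 0.9}}, {{5.5, 5.5}}, {{100.5, 100.5}}
        };
        // Visit blocks in MINDIST order.
        Integer[] order = {0, 1, 2};
        Arrays.sort(order, Comparator.comparingDouble(
            i -> minDist(qx, qy, blocks[i][0], blocks[i][1], blocks[i][2], blocks[i][3])));

        PriorityQueue<Double> best = new PriorityQueue<>(Comparator.reverseOrder()); // k best so far
        for (int i : order) {
            double md = minDist(qx, qy, blocks[i][0], blocks[i][1], blocks[i][2], blocks[i][3]);
            if (best.size() == k && md >= best.peek()) break; // prune all remaining blocks
            for (double[] p : points[i]) {
                double d = Math.hypot(p[0] - qx, p[1] - qy);
                if (best.size() < k) best.add(d);
                else if (d < best.peek()) { best.poll(); best.add(d); }
            }
        }
        System.out.println("k-th nearest distance: " + best.peek()); // only block 0 was read
    }
}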
“…Several techniques have been proposed to support spatial queries on Hadoop MapReduce [7,11,4,12] or HDFS [5,6]. However, most of them require internal modification of the underlying systems or frameworks to implement their indexing techniques based on, for example, R-trees.…”
Section: Introduction
confidence: 99%
“…The partitioning function puts objects in the same partition to preserve spatial proximity, using the sorted minimum bounding rectangle (MBR) values of object nodes from the Hilbert curve, and transforms them into a standard and proven multi-dimensional index structure, the R-tree, through parallelization in MapReduce. Hilbert packing reduces the data transfer overhead through the network and therefore the query response time [30]. As with the Z-curve, boundary objects that overlap more than one partition are assigned to the maximal-overlap partition.…”
Section: Introduction
confidence: 99%
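To make the Hilbert packing step concrete, the sketch below maps each object's MBR centroid to its Hilbert value on a small grid, sorts, and cuts the sorted order into fixed-size leaf partitions; consecutive runs in Hilbert order keep spatially close objects together. The grid size, leaf capacity, and sample data are illustrative assumptions, and the MapReduce parallelization described in the quotation is omitted.

import java.util.Arrays;
import java.util.Comparator;

public class HilbertPackSketch {
    // Hilbert distance of (x, y) on an n x n grid, n a power of two
    // (the standard iterative conversion).
    static long xy2d(int n, int x, int y) {
        long d = 0;
        for (int s = n / 2; s > 0; s /= 2) {
            int rx = (x & s) > 0 ? 1 : 0;
            int ry = (y & s) > 0 ? 1 : 0;
            d += (long) s * s * ((3 * rx) ^ ry);
            if (ry == 0) {               // rotate the quadrant so the curve stays continuous
                if (rx == 1) { x = s - 1 - x; y = s - 1 - y; }
                int t = x; x = y; y = t;
            }
        }
        return d;
    }

    public static void main(String[] args) {
        int grid = 16, leafCapacity = 2;
        // MBR centroids of the objects to pack (illustrative data).
        int[][] centroids = {{1, 1}, {14, 2}, {2, 13}, {15, 15}, {8, 8}};
        Integer[] order = {0, 1, 2, 3, 4};
        Arrays.sort(order, Comparator.comparingLong(
            i -> xy2d(grid, centroids[i][0], centroids[i][1])));
        // Consecutive runs in Hilbert order become R-tree leaf partitions.
        for (int i = 0; i < order.length; i += leafCapacity) {
            int end = Math.min(i + leafCapacity, order.length);
            System.out.println("leaf " + (i / leafCapacity) + ": "
                + Arrays.toString(Arrays.copyOfRange(order, i, end)));
        }
    }
}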