Efficient B-tree based indexing for cloud data processing

Wu, Sai; Jiang, Dawei; Ooi, Beng Chin; Wu, Kun-Lung

doi:10.14778/1920841.1920991

Cited by 132 publications

(63 citation statements)

References 18 publications


“…Several studies [15][16][17][18][19][20][21][22][23] focusing on efficient indexes in cloud storage systems have been conducted. The study in [15] proposed a Trojan index to improve runtime performance.…”

Section: Related Work (mentioning)

confidence: 99%


Dynamic multidimensional index for large-scale cloud data

Dong et al. 2016. J Cloud Comp


Although several cloud storage systems have been proposed, most of them can provide highly efficient point queries only because of the key-value pairs storing mechanism. For these systems, satisfying complex multi-dimensional queries means scanning the whole dataset, which is inefficient. In this paper, we propose a multidimensional index framework, based on the Skip-list and Octree, which we refer to as Skip-Octree. Using a randomized skip list makes the hierarchical Octree structure easier to implement in a cloud storage system. To support the Skip-Octree, we also propose a series of index operation algorithms including range query algorithm, index maintenance algorithms, and dynamic index scaling algorithms. Through experimental evaluation, we show that the Skip-Octree index is feasible and efficient.



“…However, it consumes considerable memory space to cache index information in the client, and it is unsuitable for processing multidimensional queries. The studies in [18,19] proposed an improved B+ tree index. This solution adopts a double-layer index framework.…”

Section: Related Work (mentioning)

confidence: 99%

Dynamic multidimensional index for large-scale cloud data

Dong et al. 2016. J Cloud Comp


“…However, none of these methods provides real-time OLAP functionality. There are various publications on distributed B-trees for cloud platforms such as [29]. However, these methods only support 1-dimensional indices, which are insufficient for OLAP queries.…”

Section: Related Work (mentioning)

confidence: 99%

A distributed tree data structure for real-time OLAP on cloud architectures

Dehne, Kong, Rau-Chaplin et al. 2013. 2013 IEEE International Conference on Big Data


Abstract: In contrast to queries for on-line transaction processing (OLTP) systems that typically access only a small portion of a database, OLAP queries may need to aggregate large portions of a database, which often leads to performance issues. In this paper we introduce CR-OLAP, a Cloud based Real-time OLAP system based on a new distributed index structure for OLAP, the distributed PDCR tree, that utilizes a cloud infrastructure consisting of (m + 1) multi-core processors. With increasing database size, CR-OLAP dynamically increases m to maintain performance. Our distributed PDCR tree data structure supports multiple dimension hierarchies and efficient query processing on the elaborate dimension hierarchies which are so central to OLAP systems. It is particularly efficient for complex OLAP queries that need to aggregate large portions of the data warehouse, such as "report the total sales in all stores located in California and New York during the months February-May of all years". We evaluated CR-OLAP on the Amazon EC2 cloud, using the TPC-DS benchmark data set. The tests demonstrate that CR-OLAP scales well with increasing number of processors, even for complex queries. For example, on an Amazon EC2 cloud instance with eight processors, for a TPC-DS OLAP query stream on a data warehouse with 80 million tuples where every OLAP query aggregates more than 50% of the database, CR-OLAP achieved a query latency of 0.3 seconds, which can be considered a real-time response.


“…Therefore, the hash function becomes the main index of data, and the required data can be quickly accessed according to the hash value of keys [5,6,7]. However, in addition to the data query via keys, users also turn to other properties for point search or range search [8]. For example, in an online video system (such as YouTube [9]), each video contains a variety of information, including video ID, program name, upload time, and number of plays.…”

Section: Introduction (mentioning)

confidence: 99%

“…At present, in the cloud computing environment, inverted index, the commonly used secondary index, can scan all storage nodes by multiple MapReduce [10] processes and generate inverted files. Inverted index is an off-line batch process, and it cannot realize timely query of newly inserted data [8]. For example, the record inserted into Google Base cannot be accessed by users until it is re-indexed next time (maybe one day later).…”

Section: Introduction (mentioning)

confidence: 99%

An Efficient Distributed B-tree Index Method in Cloud Computing

Huang, Peng. 2015. TOCSJ


Abstract: To support online indexing and range queries, the distributed B-tree is adopted to index the massive and rapidly increasing data in cloud computing. But the current distributed B-tree has three defects: a low degree of concurrency, frequent node splitting, and a high cost of updates in clients. To address these defects, this paper presents an efficient distributed B-tree index for the cloud computing environment, which effectively enhances the performance of the distributed B-tree index. First, it improves concurrent access through a high-concurrency access method for the distributed B-tree based on node split history. Second, it reduces the splitting frequency by dynamically changing the node size. Finally, it reduces the node update cost in all client buffers through a regional delayed update method. Experimental results show that this method has high performance in cloud computing environments.
