Lessons Learned with Laser Scanning Point Cloud Management in Hadoop HBase

Vo, Anh Vu; Konda, Nikita; Chauhan, Neel; Aljumaily, Harith; Laefer, Debra F.

doi:10.1007/978-3-319-91635-4_13

Cited by 10 publications

(10 citation statements)

References 24 publications

“…The data encoding is a component within a complete data storage system that integrates the encoding with other components including data indices, search algorithms, and cache strategies. Readers may consult the authors' previous works for information explicitly on those other topics [6], [7].…”

Section: Introduction (mentioning)

confidence: 99%

Efficient LiDAR point cloud data encoding for scalable data management within the Hadoop eco-system

Hewage, Russo, et al. 2019

2019 IEEE International Conference on Big Data (Big Data)

Self Cite

This paper introduces a novel LiDAR point cloud data encoding solution that is compact, flexible, and fully supports distributed data storage within the Hadoop distributed computing environment. The proposed data encoding solution is developed based on Sequence File and Google Protocol Buffers. Sequence File is a generic splittable binary file format built in the Hadoop framework for storage of arbitrary binary data. The key challenge in adopting the Sequence File format for LiDAR data is in the strategy for effectively encoding the LiDAR data as binary sequences in a way that the data can be represented compactly, while allowing necessary mutation. For that purpose, a data encoding solution, based on Google Protocol Buffers (a language-neutral, cross-platform, extensible data serialisation framework) was developed and evaluated. Since neither of the underlying technologies is sufficient to completely and efficiently represent all necessary point formats for distributed computing, an innovative fusion of them was required to provide a viable data storage solution. This paper presents the details of such a data encoding implementation and rigorously evaluates the efficiency of the proposed data encoding solution. Benchmarking was done against a straightforward, naive text encoding implementation using a high-density aerial LiDAR scan of a portion of Dublin, Ireland. The results demonstrated a 6-times reduction in data volume, a 4-times reduction in database ingestion time, and up to a 5-times reduction in querying time.
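To illustrate why a binary encoding can be much more compact than a naive text encoding, the following is a minimal sketch only. It uses Python's standard `struct` module as a stand-in and is not the Sequence File and Protocol Buffers scheme described in the abstract. The point layout (`POINT_FORMAT`), the millimetre `SCALE`, the function names, and the sample coordinates are all assumptions made for this illustration.

```python
import struct

# Assumed layout: x, y, z as little-endian 32-bit integers plus a 16-bit intensity.
POINT_FORMAT = "<iiiH"
# Assumed precision: coordinates are stored as integer millimetres.
SCALE = 0.001


def encode_text(points):
    """Naive text encoding: one comma-separated line per point."""
    lines = (f"{x:.3f},{y:.3f},{z:.3f},{i}" for x, y, z, i in points)
    return "\n".join(lines).encode("ascii")


def encode_binary(points):
    """Fixed-width binary encoding: 14 bytes per point."""
    out = bytearray()
    for x, y, z, i in points:
        out += struct.pack(
            POINT_FORMAT, round(x / SCALE), round(y / SCALE), round(z / SCALE), i
        )
    return bytes(out)


def decode_binary(blob):
    """Inverse of encode_binary, up to the assumed millimetre precision."""
    return [
        (x * SCALE, y * SCALE, z * SCALE, i)
        for x, y, z, i in struct.iter_unpack(POINT_FORMAT, blob)
    ]


# Made-up sample points (x, y, z, intensity); not taken from the Dublin scan.
points = [
    (316000.123, 234000.456, 12.789, 1500),
    (316000.223, 234000.556, 12.801, 1498),
]
text_blob = encode_text(points)
binary_blob = encode_binary(points)
print(len(text_blob), len(binary_blob))
```

The size ratio in this toy case depends entirely on the assumed layout and precision, so it should not be read as reproducing the reductions reported in the abstract.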

“…Nevertheless, SQL-on-Hadoop-based systems have received much attention [48,49] because they provide the benefits of SQL in querying the data. However, Vo et al. [17] note how Hadoop-based approaches work best when the task clearly lends itself to being run in parallel, such as treating a point cloud as a group of independent tiles. Moreover, as described in a benchmark [50], though offering better performance and scalability in querying a point cloud than PostgreSQL, the authors found that configuring an SQL-on-Hadoop-based system (Spark SQL) for optimal performance can be daunting and indeed requires lots of memory.…”

Section: The Big Data Approach (mentioning)

confidence: 99%

“…Since we cannot rely on being able to keep the whole point cloud in memory, the main issue with point clouds remains that of indexing: how can we keep the index small enough, so that at least parts of it fit into memory? With RDBMS, we can reduce the index granularity [17] through the previously mentioned grouping of points into blocks or patches, but as pointed out in References [15,56], there is a cost to pay. Regarding queries, the implication is that individual points in a block are invisible to the query processor until the block is exploded into the original individual points.…”

Section: On Indexing Multidimensional Data: the B+ Tree Index (mentioning)

confidence: 99%
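The trade-off described in the statement above (a smaller index in exchange for having to explode blocks at query time) can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any of the cited works: the square-tile grouping, `BLOCK_SIZE`, the function names, and the sample points are hypothetical.

```python
from collections import defaultdict
from math import floor

# Assumed tile edge length, in the same units as the coordinates.
BLOCK_SIZE = 10.0


def build_blocks(points, size=BLOCK_SIZE):
    """Group points into square tiles; the index holds one key per tile, not per point."""
    blocks = defaultdict(list)
    for p in points:
        blocks[(floor(p[0] / size), floor(p[1] / size))].append(p)
    return dict(blocks)


def range_query(blocks, xmin, ymin, xmax, ymax, size=BLOCK_SIZE):
    """Select candidate tiles via the coarse index, then explode each tile to test its points."""
    hits = []
    for bx in range(floor(xmin / size), floor(xmax / size) + 1):
        for by in range(floor(ymin / size), floor(ymax / size) + 1):
            # Individual points only become visible once the block is unpacked here.
            for x, y, z in blocks.get((bx, by), []):
                if xmin <= x <= xmax and ymin <= y <= ymax:
                    hits.append((x, y, z))
    return hits


# Made-up sample points (x, y, z).
points = [(1.0, 1.0, 5.0), (2.0, 8.0, 6.0), (12.0, 3.0, 7.0), (25.0, 25.0, 8.0)]
blocks = build_blocks(points)
result = range_query(blocks, 0.0, 0.0, 5.0, 5.0)
print(len(blocks), result)
```

In this sketch the index has three entries for four points, and the query must still inspect every point in the one candidate tile, which is the cost the statement refers to.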

“…Since the SMALL dataset contained building IDs, two additional point queries (Q1A and Q1B) that referred to two specific building IDs were used for this dataset, as shown in Table 2. We note in passing that while the authors in Reference [17] use the term point query to denote a query that retrieves a single point and its properties from the point cloud, we use it in the broader spatial query sense to simply denote a query that requires an exact match (as opposed to a given range) for a certain attribute.…”

Section: The Queries That Were Used (mentioning)

confidence: 99%

“…We note that, as pointed out in Reference [16], since point cloud data is not associated with a regular grid, it can be classified as unstructured data. Furthermore, point clouds can be characterized as being mostly static data sets [17]; once processed and cleaned, there is little need for modifying or updating them. This absence of a dynamic dataset means that the data processing needs of point clouds are very different from intensive transaction-based processing involving lots of updates [18].…”

Section: Introduction (mentioning)

confidence: 99%

A Simple Semantic-Based Data Storage Layout for Querying Point Clouds

El-Mahgary, Virtanen, Hyyppä 2020

IJGI

The importance of being able to separate the semantics from the actual (X,Y,Z) coordinates in a point cloud has been actively brought up in recent research. However, there is still no widely used or accepted data layout paradigm on how to efficiently store and manage such semantic point cloud data. In this paper, we present a simple data layout that makes use of the semantics and that allows for quick queries. The underlying idea is especially suited for a programming approach (e.g., queries programmed via Python), but we also present an even simpler implementation of the underlying technique on a well-known relational database management system (RDBMS), namely, PostgreSQL. The obtained query results suggest that the presented approach can be successfully used to handle point and range queries on large point clouds.

Harnessing Remote Sensing for Civil Engineering: Then, Now, and Tomorrow

Laefer 2019

Lecture Notes in Civil Engineering
