Open-source, BigTable-like distributed databases provide a scalable storage solution for data-intensive applications. The simple key-value storage schema provides fast record ingest and retrieval, nearly independent of the quantity of data stored. However, real applications must support non-trivial queries that require careful key design and value indexing. We study an Apache Accumulo-based big data system designed for a network situational awareness application. The application's storage schema and data retrieval requirements are analyzed. We then characterize the corresponding Accumulo performance bottlenecks. Queries are shown to be communication-bound and server-bound in different situations. Inefficiencies in the opensource communication stack and filesystem limit network and I/O performance, respectively. Additionally, in some situations, parallel clients can contend for server-side resources. Maximizing data retrieval rates for practical queries requires effective key design, indexing, and client parallelization.
3D point cloud registration is traditionally done by aligning to known information. This information can be extracted from semantically labeled and geo-registered 2D images, e.g. maps, satellite images, and labeled aerial photos. We propose an automated method to geo-register 3D point clouds to 2D maps by defining a normalized Hough similarity function and aligning planes (i.e., walls) in 3D point clouds to lines in 2D maps. The collective set of algorithms solves for seven degrees of freedom: three rotation parameters (including the up vector), a scale value, and three translation parameters. After transforming the 3D point cloud into a manageable 2D representation, we apply existing and novel scan-matching techniques to align both query and reference representations.
Abstract-Large-scale 3D scene reconstruction using Structure from Motion (SfM) continues to be very computationally challenging despite much active research in the area. We propose an efficient, scalable processing chain designed for cluster computing and suitable for use on aerial video. The sparse bundle adjustment step, which is iterative and difficult to parallelize, is accomplished by partitioning the input image set, generating independent point clouds in parallel, and then fusing the clouds and combining duplicate points. We compare this processing chain to a leading parallel SfM implementation, which exploits fine-grained parallelism in various matrix operations and is not designed to scale beyond a multi-core workstation with GPU. We show our cluster-based approach offers significant improvement in scalability and runtime while producing comparable point cloud density and more accurate point location estimates.
Streaming, big data applications face challenges in creating scalable data flow pipelines, in which multiple data streams must be collected, stored, queried, and analyzed. These data sources are characterized by their volume (in terms of dataset size), velocity (in terms of data rates), and variety (in terms of fields and types). For many applications, distributed NoSQL databases are effective alternatives to traditional re lational database management systems. This paper considers a cyber situational awareness system that uses the Apache Accumulo database to provide scalable data warehousing, real time data ingest, and responsive querying for human users and analytic algorithms. We evaluate Accumulo's ingestion scalability as a function of number of client processes and servers. We also describe a flexible data model with effective techniques for query planning and query batching to deliver responsive results.Query performance is evaluated in terms of latency of the client receiving initial result sets. Accumulo performance is measured on a database of up to 8 nodes using real cyber data.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.