Scott M. Sawyer

O'Gwynn

Tran

et al. 2013

Open-source, BigTable-like distributed databases provide a scalable storage solution for data-intensive applications. The simple key-value storage schema provides fast record ingest and retrieval, nearly independent of the quantity of data stored. However, real applications must support non-trivial queries that require careful key design and value indexing. We study an Apache Accumulo-based big data system designed for a network situational awareness application. The application's storage schema and data retrieval requirements are analyzed. We then characterize the corresponding Accumulo performance bottlenecks. Queries are shown to be communication-bound and server-bound in different situations. Inefficiencies in the open-source communication stack and filesystem limit network and I/O performance, respectively. Additionally, in some situations, parallel clients can contend for server-side resources. Maximizing data retrieval rates for practical queries requires effective key design, indexing, and client parallelization.
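The key-design and indexing point above can be illustrated with a toy model. The sketch below assumes a hypothetical schema: a shard prefix to spread ingest, a reversed timestamp so rows sort newest-first, and a separate inverted index on a source-IP field. An in-memory sorted list stands in for a real table. None of the names (`event_row`, `SortedTable`, `query_by_src`, `NUM_SHARDS`) come from the paper or from the Accumulo client API.

```python
import bisect
import hashlib

NUM_SHARDS = 4        # hypothetical shard count (e.g., one per tablet server)
MAX_TS = 10**10 - 1   # hypothetical bound so reversed timestamps stay 10 digits

def shard_of(record_id: str) -> int:
    """Deterministically map a record id to a shard."""
    return int(hashlib.md5(record_id.encode()).hexdigest(), 16) % NUM_SHARDS

def event_row(record_id: str, ts: int) -> str:
    """Row key: shard prefix, reversed zero-padded timestamp, then record id.
    Within one shard, lexicographic order is therefore newest-first."""
    return f"{shard_of(record_id):02d}_{MAX_TS - ts:010d}_{record_id}"

class SortedTable:
    """Toy in-memory stand-in for a sorted key-value table (not the Accumulo API)."""
    def __init__(self):
        self._keys = []   # kept in sorted order, like rows within a tablet
        self._vals = {}

    def put(self, key, value):
        if key not in self._vals:
            bisect.insort(self._keys, key)
            self._vals[key] = []
        self._vals[key].append(value)

    def get(self, key):
        return list(self._vals.get(key, []))

    def scan_prefix(self, prefix):
        """Range scan: yield (key, values) for every key starting with prefix."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._vals[self._keys[i]]
            i += 1

events = SortedTable()  # primary table keyed by event_row
index = SortedTable()   # inverted index: "field=value" -> event rows

def ingest(record_id: str, ts: int, src_ip: str) -> None:
    row = event_row(record_id, ts)
    events.put(row, {"id": record_id, "ts": ts, "src": src_ip})
    index.put(f"src={src_ip}", row)  # index the source-IP field

def query_by_src(src_ip: str) -> list:
    """Look up matching rows in the index, then fetch each event row."""
    hits = [rec for row in index.get(f"src={src_ip}") for rec in events.get(row)]
    # Rows from different shards interleave, so merge by timestamp client-side.
    return sorted(hits, key=lambda r: r["ts"], reverse=True)
```

The shard prefix trades a client-side merge for ingest spread across servers; without it, time-ordered keys would funnel all new writes into one tablet.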

Geo-registering 3D point clouds to 2D maps with scan matching and the Hough Transform

Armstrong-Crews

2013

3D point cloud registration is traditionally done by aligning to known information. This information can be extracted from semantically labeled and geo-registered 2D images, e.g., maps, satellite images, and labeled aerial photos. We propose an automated method to geo-register 3D point clouds to 2D maps by defining a normalized Hough similarity function and aligning planes (i.e., walls) in 3D point clouds to lines in 2D maps. The collective set of algorithms solves for seven degrees of freedom: three rotation parameters (including the up vector), a scale value, and three translation parameters. After transforming the 3D point cloud into a manageable 2D representation, we apply existing and novel scan-matching techniques to align both query and reference representations.
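To show why a Hough representation helps with alignment, the sketch below votes 2D points (for example, wall points projected to the ground plane) into a standard (theta, rho) line accumulator and brute-forces the in-plane rotation that best matches a reference. It assumes scale and translation are already removed (both point sets centered) and uses plain cosine similarity between accumulators as a stand-in; the paper's normalized Hough similarity function is not specified in this abstract, and the function names here are hypothetical.

```python
import numpy as np

def hough_accumulator(points, rho_max, n_theta=180, n_rho=64):
    """Vote each 2D point into a (theta, rho) grid for lines
    rho = x*cos(theta) + y*sin(theta), with theta in [0, pi)."""
    pts = np.asarray(points, dtype=float)
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    rhos = pts[:, :1] * np.cos(thetas) + pts[:, 1:2] * np.sin(thetas)  # (N, n_theta)
    bins = np.floor((rhos + rho_max) / (2.0 * rho_max) * n_rho).astype(int)
    bins = np.clip(bins, 0, n_rho - 1)
    theta_idx = np.broadcast_to(np.arange(n_theta), bins.shape)
    acc = np.zeros((n_theta, n_rho))
    np.add.at(acc, (theta_idx, bins), 1.0)  # one vote per (point, theta) pair
    return acc

def similarity(a, b):
    """Cosine similarity of two accumulators (an assumed stand-in, not the
    paper's normalized Hough similarity)."""
    a, b = a.ravel(), b.ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def rotate(points, angle):
    """Rotate 2D points counter-clockwise about the origin."""
    c, s = np.cos(angle), np.sin(angle)
    return np.asarray(points, dtype=float) @ np.array([[c, s], [-s, c]])

def estimate_rotation(query, reference, n_steps=360):
    """Brute-force search for the rotation that best maps query onto reference.
    Assumes both point sets are already centered (translation removed)."""
    rho_max = max(np.linalg.norm(query, axis=1).max(),
                  np.linalg.norm(reference, axis=1).max())
    ref_acc = hough_accumulator(reference, rho_max)
    best_angle, best_score = 0.0, -1.0
    for angle in np.linspace(0.0, 2 * np.pi, n_steps, endpoint=False):
        score = similarity(hough_accumulator(rotate(query, -angle), rho_max), ref_acc)
        if score > best_score:
            best_angle, best_score = float(angle), score
    return best_angle, best_score
```

Rotating the points by an angle shifts the accumulator along the theta axis, so walls that are long straight runs of points show up as strong peaks that line up only near the correct rotation.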

Cluster-based 3D reconstruction of aerial video

Bliss

2012

Large-scale 3D scene reconstruction using Structure from Motion (SfM) continues to be very computationally challenging despite much active research in the area. We propose an efficient, scalable processing chain designed for cluster computing and suitable for use on aerial video. The sparse bundle adjustment step, which is iterative and difficult to parallelize, is accomplished by partitioning the input image set, generating independent point clouds in parallel, and then fusing the clouds and combining duplicate points. We compare this processing chain to a leading parallel SfM implementation, which exploits fine-grained parallelism in various matrix operations and is not designed to scale beyond a multi-core workstation with GPU. We show our cluster-based approach offers significant improvement in scalability and runtime while producing comparable point cloud density and more accurate point location estimates.
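The partition-then-fuse idea above can be sketched in two small pieces. The sketch assumes frames are split into consecutive chunks that overlap by a few frames (so neighboring reconstructions share points), and that the resulting clouds are already expressed in a common coordinate frame. Duplicate points are then merged with a simple voxel grid, averaging all points that land in the same cell. The chunking scheme, the voxel-based merge, and the names `partition_frames` and `fuse_clouds` are illustrative assumptions, not the paper's procedure; the per-chunk SfM step itself is omitted.

```python
import numpy as np

def partition_frames(n_frames, chunk, overlap):
    """Split frame indices 0..n_frames-1 into consecutive chunks that share
    `overlap` frames with the next chunk, so each chunk can be reconstructed
    independently (e.g., on a separate cluster node)."""
    if not 0 <= overlap < chunk:
        raise ValueError("need 0 <= overlap < chunk")
    step = chunk - overlap
    starts = range(0, max(n_frames - overlap, 1), step)
    return [list(range(s, min(s + chunk, n_frames))) for s in starts]

def fuse_clouds(clouds, voxel=0.1):
    """Concatenate point clouds (assumed to share one coordinate frame) and
    merge points falling in the same voxel by averaging them.
    Note: near-duplicates straddling a voxel boundary are not merged."""
    pts = np.vstack(clouds)
    keys = np.floor(pts / voxel).astype(np.int64)
    uniq, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # guard against version-dependent shape
    sums = np.zeros((len(uniq), pts.shape[1]))
    np.add.at(sums, inverse, pts)
    counts = np.bincount(inverse, minlength=len(uniq))
    return sums / counts[:, None]
```

A production merge would more likely use a nearest-neighbor search with a distance tolerance; the voxel grid keeps the sketch short and fully vectorized.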

Evaluating Accumulo performance for a scalable cyber data processing pipeline

O'Gwynn

2014

Streaming big data applications face challenges in creating scalable data flow pipelines, in which multiple data streams must be collected, stored, queried, and analyzed. These data sources are characterized by their volume (in terms of dataset size), velocity (in terms of data rates), and variety (in terms of fields and types). For many applications, distributed NoSQL databases are effective alternatives to traditional relational database management systems. This paper considers a cyber situational awareness system that uses the Apache Accumulo database to provide scalable data warehousing, real-time data ingest, and responsive querying for human users and analytic algorithms. We evaluate Accumulo's ingestion scalability as a function of the number of client processes and servers. We also describe a flexible data model with effective techniques for query planning and query batching to deliver responsive results. Query performance is evaluated in terms of the latency until the client receives initial result sets. Accumulo performance is measured on a database of up to 8 nodes using real cyber data.
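Query planning and batching, as described above, can be sketched generically. The example below assumes a conjunctive query over an inverted index (`postings`, a hypothetical mapping from a term such as `"port=22"` to matching row ids): terms are evaluated from the smallest to the largest posting list, so the candidate set shrinks early. Matching rows are then fetched in fixed-size batches through a generator, so the caller receives the first results after one batch rather than after the whole query. `fetch_batch` is a hypothetical callback; a real client might issue each batch as a set of ranges, for example through Accumulo's `BatchScanner`. This is not the paper's implementation.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Sequence

def plan_and_intersect(terms: Sequence[str],
                       postings: Dict[str, Sequence[str]]) -> List[str]:
    """Evaluate a conjunctive query, most selective term first."""
    if not terms:
        return []
    ordered = sorted(terms, key=lambda t: len(postings.get(t, ())))
    candidates = set(postings.get(ordered[0], ()))
    for term in ordered[1:]:
        if not candidates:  # nothing left to intersect; stop early
            break
        candidates &= set(postings.get(term, ()))
    return sorted(candidates)

def run_query(row_ids: Sequence[str],
              fetch_batch: Callable[[List[str]], Iterable[dict]],
              batch_size: int = 100) -> Iterator[dict]:
    """Stream results batch by batch; each batch is fetched only when the
    caller asks for more, which lowers time-to-first-result."""
    for i in range(0, len(row_ids), batch_size):
        yield from fetch_batch(list(row_ids[i:i + batch_size]))
```

Ordering by posting-list length is the simplest planning heuristic; a real planner would rely on stored cardinality estimates rather than materializing every list.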

P-sync: A Photonically Enabled Architecture for Efficient Non-local Data Access

Whelihan

Hughes

et al. 2013