Understanding query performance in Accumulo

Sawyer, Scott M.; O'Gwynn, B. David; Tran, An Vu; Yu, Tamara

doi:10.1109/hpec.2013.6670330

Cited by 11 publications

(5 citation statements)

References 7 publications


“…The strategy in [14] also used tablet location information to determine where clients could write locally. Knowing tablet-to-tablet-server assignment could likewise aid Graphulo, not only to minimize network traffic but also to partly eliminate Apache Thrift RPC serialization, which prior work has shown is a bottleneck for scans when iterator processing is light [18]. Such an enhancement would access a local tablet server by method call in place of Scanners and BatchWriters.…”

Section: A Related Work

mentioning

confidence: 99%

Graphulo implementation of server-side sparse matrix multiply in the Accumulo database

Hutchison, Kepner, Gadepally, et al. 2015

2015 IEEE High Performance Extreme Computing Conference (HPEC)


Abstract: The Apache Accumulo database excels at distributed storage and indexing and is ideally suited for storing graph data. Many big data analytics compute on graph data and persist their results back to the database. These graph calculations are often best performed inside the database server. The GraphBLAS standard provides a compact and efficient basis for a wide range of graph applications through a small number of sparse matrix operations. In this article, we discuss a server-side implementation of GraphBLAS sparse matrix multiplication that leverages Accumulo's native, high-performance iterators. We compare the mathematics and performance of inner and outer product implementations, and show how an outer product implementation achieves optimal performance near Accumulo's peak write rate. We offer our work as a core component to the Graphulo library that will deliver matrix math primitives for graph analytics within Accumulo.


“…However, some research seems to suggest that this problem can be easily bypassed using a big data structure. The problems of storage size and query performance could also be explored further, even though some research has already been conducted [23].…”

Section: Performance Analysis

mentioning

confidence: 99%

Implementing Suffix Array Algorithm Using Apache Big Table Data Implementation

Giacomelli 2020

Preprint


In this paper we describe a new approach to the well-known suffix-array algorithm using Big Table Data Technology. We demonstrate how a well-known algorithm can be refactored to take advantage of a high-performance distributed datastore, illustrating the advantages of cloud datastore technology for storing and retrieving large text sequences. A case study using DNA strings, considered one of the most difficult pattern matching problems, is described and evaluated to demonstrate the potential of this implementation. Performance and other big-data-related issues are also discussed, along with possible new lines of research in big data technology for precise medical applications.


“…The peak insert rate for a single thread is typically ~100,000 entries per second. A typical single node server can reach ~500,000 entries per second using several insert threads [Sawyer 2013]. For the hypothetical Hadoop cluster described in the previous section, the peak performance would be ~100,000,000 entries per second.…”

Section: Accumulo

mentioning

confidence: 99%

“…IV. ACCUMULO Relational or SQL (Structured Query Language) databases [Codd 1970, Stonebraker et al 1976] have been the de facto interface to databases since the 1980s and are the bedrock of electronic transactions around the world. More recently, key-value stores (NoSQL databases) [Chang et al 2008] have been developed for representing large sparse tables to aid in the analysis of data for Internet search.…”

mentioning

confidence: 99%

Lustre, hadoop, accumulo

Kepner, Arcand, Bestor, et al. 2015

2015 IEEE High Performance Extreme Computing Conference (HPEC)


Data processing systems impose multiple views on data as it is processed by the system. These views include spreadsheets, databases, matrices, and graphs. There are a wide variety of technologies that can be used to store and process data through these different steps. The Lustre parallel file system, the Hadoop distributed file system, and the Accumulo database are all designed to address the largest and the most challenging data storage problems. There have been many ad-hoc comparisons of these technologies. This paper describes the foundational principles of each technology, provides simple models for assessing their capabilities, and compares the various technologies on a hypothetical common cluster. These comparisons indicate that Lustre provides 2x more storage capacity, is less likely to lose data during 3 simultaneous drive failures, and provides higher bandwidth on general purpose workloads. Hadoop can provide 4x greater read bandwidth on special purpose workloads. Accumulo provides 10^5x lower latency on random lookups than either Lustre or Hadoop but Accumulo's bulk bandwidth is 10x less. Significant recent work has been done to enable mix-and-match solutions that allow Lustre, Hadoop, and Accumulo to be combined in different ways.
