Micron's new Automata Processor (AP) architecture exploits the very high and natural level of parallelism found in DRAM technologies to achieve native-hardware implementation of nondeterministic finite automata (NFAs). The use of DRAM technology to implement the NFA states provides high capacity and therefore extraordinary parallelism for pattern recognition. In this paper, we give an overview of the AP's architecture, programming, and applications.
We study compression-aware algorithms, i.e. algorithms that can exploit regularity in their input data by directly operating on compressed data. While this idea is popular for string algorithms, we consider it for algorithms operating on numeric sequences and graphs that have been compressed using a variety of schemes including LZ77, grammar-based compression, a graph interpretation of Re-Pair, and a method presented by Boldi and Vigna in The WebGraph Framework. In all cases, we discover algorithms that outperform the trivial approach of decompressing the input and running a standard algorithm. We aim to develop an algorithmic toolkit for basic tasks that operates on a variety of compressed inputs.

Algorithms for Compressed Sequences

We consider sorting algorithms that operate on data produced by the following three compression schemes: LZ77, context-free grammar representation (called CFG), and LZ78. Note that in CFG an array is represented as the singleton language parsed from a grammar.

For sorting an LZ77-compressed sequence of numbers, we present a sorting algorithm which operates in time O(C + |Σ| log |Σ| + n) where C is the compression size, n is the length of the sequence, and Σ is the set of unique numbers in the input list. In most instances C ≪ n, thus our algorithm in practice achieves linear sorting as compared to the classical algorithm's O(n log n) worst-case performance. We also present a way of indexing into the sequence in O(C) time.

For sorting a list compressed by a context-free grammar (with LZ78 as a special case) we present an algorithm which finds the sorted sequence in O(C · |Σ|) time. Here, C represents the total number of symbols in all of the grammar's substitution rules. This result has the advantage of being independent of the size of the uncompressed list. From here, we can produce a grammar for the sorted list which has size O(|Σ| log n), where n is the length of the decompressed list.
The classical approach would require O(n log n) time to decompress and then sort.

Algorithms for Compressed Graphs

Next we consider topological sort and bipartite assignment on graphs under these two compression schemes: a graph interpretation of the Re-Pair compression scheme, and the scheme presented by Boldi and Vigna in the WebGraph Framework (called BV).

For graphs compressed using the Re-Pair algorithm, we perform both algorithms in O(C) time. Re-Pair is a form of grammar compression, so C is the number of terms on the right side of the grammar's parse rules. These improvements compare favorably with the O(|V| + |E|) trivial approach.

We present an algorithm which performs bipartite checking on a BV-compressed graph. This algorithm runs in O(|V| + s) time where |V| is the number of vertices in the graph. In a graph's adjacency list, after vertex labels are sorted, there are often blocks of sequential labels. The BV compression scheme represents these sequences en masse, and s is the total number of such sequences. This improves the running time of the classical O(|V| + |E|) approach.
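The grammar-based sorting bound in the abstract above can be illustrated with a small sketch. This is not the authors' algorithm, only one plausible way to reach a comparable bound: compute, for each nonterminal, how many times each distinct number occurs in its expansion, then read the totals off in sorted order. The grammar encoding (a dict from nonterminal names to right-hand sides) and the function names `terminal_counts` and `sorted_runs` are assumptions made for this example. It also assumes a straight-line grammar, i.e. one rule per nonterminal and no cycles.

```python
from collections import Counter


def terminal_counts(grammar, start):
    """Count occurrences of each terminal in the sequence derived from `start`.

    `grammar` maps each nonterminal (a str) to the list of symbols on the
    right-hand side of its single rule. Any symbol that is not a key of
    `grammar` is treated as a terminal (a number). Hypothetical encoding.
    """
    memo = {}

    def visit(symbol):
        if symbol not in grammar:          # terminal: contributes one occurrence
            return Counter({symbol: 1})
        if symbol not in memo:             # each rule is expanded only once
            total = Counter()
            for child in grammar[symbol]:  # one merge of size <= |Sigma| per RHS symbol
                total.update(visit(child))
            memo[symbol] = total
        return memo[symbol]

    return visit(start)


def sorted_runs(grammar, start):
    """Return the sorted sequence as run-length pairs (value, count)."""
    counts = terminal_counts(grammar, start)
    return [(value, counts[value]) for value in sorted(counts)]


# Example: S expands to 2 1 1 2 1 2 1 1 2 1 3 (11 numbers, 8 grammar symbols).
example = {
    "S": ["A", "A", 3],
    "A": ["B", 1, "B"],
    "B": [2, 1],
}
print(sorted_runs(example, "S"))  # [(1, 6), (2, 4), (3, 1)]
```

Each right-hand-side symbol triggers one merge of at most |Σ| counters, so the counting phase takes O(C · |Σ|) counter updates without ever materializing the uncompressed list. The final `sorted` call adds O(|Σ| log |Σ|) comparisons. The run-length output here is a simpler stand-in for the O(|Σ| log n) grammar of the sorted list that the abstract mentions.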
While massive datasets are often stored in compressed format, most algorithms are designed to operate on uncompressed data. We address this growing disconnect by developing a framework for compression-aware algorithms that operate directly on compressed datasets. Synergistically, we also propose new algorithmically-aware compression schemes that enable algorithms to efficiently process compressed data. In particular, we apply this general methodology to geometric / CAD datasets that are ubiquitous in areas such as graphics, VLSI, and geographic information systems. We develop algorithms and corresponding compression schemes that address different types of datasets, including pointsets and graphs. Our methods are more efficient than their classical counterparts, and they extend to both lossless and lossy compression scenarios. This motivates further investigation of how this approach can enable algorithms to process ever-increasing big data volumes.