2020
DOI: 10.1109/tai.2021.3055167

A Neural State Pushdown Automata

Abstract: In order to learn complex grammars, recurrent neural networks (RNNs) require sufficient computational resources to ensure correct grammar recognition. A widely used approach to expanding model capacity is to couple an RNN to an external memory stack. Here, we introduce a "neural state" pushdown automaton (NSPDA), which consists of a digital stack, instead of an analog one, coupled to a neural network state machine. We empirically show its effectiveness in recognizing various context-free grammars (C…
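The abstract describes pairing a recurrent state machine with a discrete (digital) stack rather than a continuous one. Below is a minimal sketch of that idea only; the class name, dimensions, three-way action set, and hard argmax action selection are illustrative assumptions, not the paper's exact formulation, and training through discrete stack decisions requires machinery not shown here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralStatePushdownSketch(nn.Module):
    """Hypothetical sketch: a recurrent state machine driving a digital stack."""
    def __init__(self, n_symbols, hidden_size=16):
        super().__init__()
        self.n_symbols = n_symbols
        # state machine reads (current input symbol, top-of-stack symbol)
        self.cell = nn.RNNCell(2 * n_symbols, hidden_size)
        self.action_head = nn.Linear(hidden_size, 3)        # 0: push, 1: pop, 2: no-op
        self.push_head = nn.Linear(hidden_size, n_symbols)  # which symbol to push
        self.accept_head = nn.Linear(hidden_size, 1)        # accept / reject the string

    def forward(self, symbols):
        """symbols: list of integer token ids in [0, n_symbols)."""
        h = torch.zeros(1, self.cell.hidden_size)
        stack = []                                   # digital stack of discrete symbols
        for s in symbols:
            top = stack[-1] if stack else 0          # 0 doubles as an empty-stack marker
            x = torch.cat([F.one_hot(torch.tensor([s]), self.n_symbols),
                           F.one_hot(torch.tensor([top]), self.n_symbols)],
                          dim=-1).float()
            h = self.cell(x, h)
            action = self.action_head(h).argmax(dim=-1).item()   # hard, discrete choice
            if action == 0:                                      # PUSH
                stack.append(self.push_head(h).argmax(dim=-1).item())
            elif action == 1 and stack:                          # POP
                stack.pop()
            # action == 2: NO-OP, leave the stack untouched
        return torch.sigmoid(self.accept_head(h))    # probability of acceptance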

Cited by 8 publications (6 citation statements)
References 41 publications
“…We implemented several variations of the recurrent state-function s_k for the estimator described above. In preliminary experiments, we found that the LSTM state function and the RNN-SNE (a more expressive, but expensive, extension of our estimator [27]) yielded the most consistent performance. Therefore, we report the performance using an LSTM as a state cell for all algorithms and RNN-SNE using BPTT and SAB (since we found that SAB worked best when using LSTM state functions).…”
Section: Methods
confidence: 88%
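For readers unfamiliar with the setup this statement refers to, the sketch below shows what "using an LSTM as a state cell" for a recurrent state function s_k could look like; the estimator it would plug into, the input names, and the dimensions are assumptions for illustration, not the cited paper's implementation.

```python
import torch
import torch.nn as nn

class LSTMStateFunction(nn.Module):
    """Hypothetical recurrent state function s_k: carries an (h, c) pair across steps."""
    def __init__(self, input_size=64, hidden_size=128):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)

    def forward(self, z_k, state=None):
        if state is None:
            zeros = z_k.new_zeros(z_k.size(0), self.cell.hidden_size)
            state = (zeros, zeros)
        h, c = self.cell(z_k, state)
        return h, (h, c)

# usage: unroll the state function over several steps (the quote trains it with BPTT)
state_fn = LSTMStateFunction()
state = None
for z_k in torch.randn(5, 8, 64):        # 5 steps, batch of 8, 64-dim step inputs
    s_k, state = state_fn(z_k, state)
```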
“…Iterative Refinement: This procedure can be seen as a local data-decoding process aimed at improving the memory retention ability of recurrent neural networks (RNNs) [1,27,2]. The neural decoder used with this process essentially reconstructs images from a compressed representation, and iterative refinement formulates compression as a multi-step reconstruction problem over a finite number of passes, K. Consider a 2D image I and decompose it into a set of P image patches, I = {p_1, …, p_j, …, p_P} (i.e., non-overlapping for JPEG, overlapping for JP2).…”
Section: Hybrid Nonlinear Estimator For Iterative Decoding
confidence: 99%
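The passage above frames decoding as iterative refinement over image patches. A hedged sketch of that loop follows; the patching helper, decoder architecture, and names (to_patches, RefinementDecoder, K) are hypothetical stand-ins rather than the cited paper's models.

```python
import torch
import torch.nn as nn

def to_patches(image, patch=8):
    """Decompose a (C, H, W) image into non-overlapping (C, patch, patch) patches."""
    c, h, w = image.shape
    patches = image.unfold(1, patch, patch).unfold(2, patch, patch)
    return patches.reshape(c, -1, patch, patch).permute(1, 0, 2, 3)   # (P, C, p, p)

class RefinementDecoder(nn.Module):
    """One refinement pass: map (code, current guess) -> additive correction."""
    def __init__(self, code_dim=32, patch=8, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(code_dim + channels * patch * patch, 256),
            nn.ReLU(),
            nn.Linear(256, channels * patch * patch),
        )

    def forward(self, code, guess):
        flat = torch.cat([code, guess.flatten(1)], dim=-1)
        return guess + self.net(flat).view_as(guess)

def decode(codes, decoder, K=4, patch=8, channels=3):
    """Reconstruct P patches from their codes over K refinement passes."""
    guess = torch.zeros(codes.size(0), channels, patch, patch)
    for _ in range(K):
        guess = decoder(codes, guess)
    return guess

# usage with random stand-ins for an encoder's output
img = torch.rand(3, 32, 32)
patches = to_patches(img)                  # (16, 3, 8, 8)
codes = torch.randn(patches.size(0), 32)   # stand-in for learned per-patch codes
recon = decode(codes, RefinementDecoder())
```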
“…Prior work [10,52,41] has shown that initializing a network with prior knowledge can yield improved generalization during training. In such cases, the weights of the network that are not programmed (i.e.…”
Section: Discussion
confidence: 99%
“…Many neural network models take the form of a first-order (in weights) recurrent neural network (RNN) and have been taught to learn context-free and context-sensitive counter languages [17,9,5,64,70,56,48,66,8,36,8,67]. However, from a theoretical perspective, RNNs augmented with an external memory have historically been shown to be more capable of recognizing context-free languages (CFLs), such as with a discrete stack [10,55,61], or, more recently, with various differentiable memory structures [33,26,24,39,73,28,72,25,40,41,3,42]. Despite positive results, prior work on CFLs was unable to achieve perfect generalization on data beyond the training dataset, highlighting a troubling difficulty in preserving long-term memory.…”
Section: Related Work
confidence: 99%
“…Hierarchical RNNs, such as the Clockwork RNN [14], Phased LSTM [16], and Hierarchical Multiscale RNN [3], address this limitation by modifying the architecture to more easily encode long-term dependencies in the hidden state. Most of the effort in the literature focuses on architectural modifications [3,15,16]. Another line of research explores the use of online algorithms to train RNNs [18,?,23].…”
Section: Related Work
confidence: 99%