Recurrent networks are often trained to memorize their input as well as possible, in the hope that such training will increase the network's ability to predict. We show that networks designed to memorize their input can be arbitrarily bad at prediction. We also find, for several types of input, that one-node networks optimized for prediction come close to the upper bounds on predictive capacity given by Wiener filters, and are roughly equivalent in performance to randomly generated five-node networks. Our results suggest that maximizing memory capacity leads to very different networks than maximizing predictive capacity, and that optimizing recurrent weights can decrease reservoir size by half an order of magnitude.

Often, we remember for the sake of prediction. Such is the case, it seems, in the field of echo state networks (ESNs) [1,2]. ESNs are large input-driven recurrent networks in which only a "readout layer" is trained to produce a desired output signal from the present network state. Sometimes, the desired output signal is the past or future of the input to the network. If the recurrent network is large enough, it should carry enough information about the past of the input signal to reproduce a past input or predict a future input well, so that only the readout layer need be trained. Still, the weights and structure of the recurrent network can greatly affect its predictive capabilities, and so many researchers are now interested in optimizing the network itself to maximize task performance [3].

Much of the theory surrounding echo state networks centers on memorizing white noise, an input for which memory is essentially useless for prediction [4]. This raises a rather practical question: how much of the theory of optimal reservoirs, based on maximizing memory capacity [5-9], is misleading if the ultimate goal is to maximize predictive power?

We study the difference between optimizing for memory and optimizing for prediction in linear recurrent networks subject to scalar, temporally correlated input generated by countable hidden Markov models. Ref. [10] gave closed-form expressions for the memory function of continuous-time linear recurrent networks in terms of the autocorrelation function of the input, and closely studied the case of an exponential autocorrelation function. Ref. [11] gave similar expressions for discrete-time linear recurrent networks. Ref. [12] gave closed-form expressions for the Fisher memory curve of discrete-time linear recurrent networks, which measures how much changes in the input signal perturb the network state; for linear recurrent networks, this curve is independent of the particular input signal.

We differ from these previous efforts mostly in that we study both memory capacity and a newly defined "predictive capacity." We derive an upper bound on predictive capacity via Wiener filters in terms of the autocorrelation function of the input. Two surprising findings result. First, predictive capacity is not typically maximized at the "edge of critical...
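To make the memory-prediction contrast concrete, the display below sketches one plausible formalization, assuming the standard echo-state memory function and its natural forward-looking analogue; the notation (state $\mathbf{x}(t)$, scalar input $u(t)$, readout weights $\mathbf{w}$, connectivity $A$, input weights $\mathbf{v}$) is introduced here for illustration only and may differ from the definitions adopted later in the paper.

% Hedged sketch of memory and predictive capacity; notation assumed, not taken from the source.
\begin{align}
  \mathbf{x}(t+1) &= A\,\mathbf{x}(t) + \mathbf{v}\,u(t)
  && \text{(discrete-time case, for concreteness)} \\
  m(\tau) &= \max_{\mathbf{w}}
    \frac{\mathrm{Cov}\!\big[\mathbf{w}^{\top}\mathbf{x}(t),\,u(t-\tau)\big]^{2}}
         {\mathrm{Var}\!\big[\mathbf{w}^{\top}\mathbf{x}(t)\big]\,
          \mathrm{Var}\!\big[u(t-\tau)\big]}, \\
  \mathrm{MC} &= \sum_{\tau \ge 0} m(\tau),
  \qquad
  \mathrm{PC} = \sum_{\tau \ge 1} m(-\tau).
\end{align}

Under this reading, the Wiener filter, i.e. the optimal linear predictor of $u(t+\tau)$ from the entire input past (computable from the input autocorrelation function), upper-bounds each $m(-\tau)$ and hence $\mathrm{PC}$, since the network state is itself a linear functional of that same past.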