Recurrent networks are often trained to memorize their input as well as possible, in the hope that such training will increase the network's ability to predict. We show that networks designed to memorize their input can be arbitrarily bad at prediction. We also find, for several types of input, that one-node networks optimized for prediction come close to the upper bounds on predictive capacity given by Wiener filters, and are roughly equivalent in performance to randomly generated five-node networks. Our results suggest that maximizing memory capacity leads to very different networks than maximizing predictive capacity, and that optimizing recurrent weights can decrease reservoir size by half an order of magnitude.

Often, we remember for the sake of prediction. Such is the case, it seems, in the field of echo state networks (ESNs) [1,2]. ESNs are large input-driven recurrent networks in which only a "readout layer" is trained to produce a desired output signal from the present network state. Sometimes, the desired output signal is the past or future of the input to the network. If the recurrent network is large enough, it should carry enough information about the past of the input signal to reproduce a past input or predict a future input well, so that only the readout layer need be trained. Still, the weights and structure of the recurrent network can greatly affect its predictive capabilities, and so many researchers are now interested in optimizing the network itself to maximize task performance [3].

Much of the theory surrounding echo state networks centers on memorizing white noise, an input for which memory is essentially useless for prediction [4]. This raises a rather practical question: how much of the theory of optimal reservoirs, based on maximizing memory capacity [5-9], is misleading if the ultimate goal is to maximize predictive power?

We study the difference between optimizing for memory and optimizing for prediction in linear recurrent networks subject to scalar, temporally correlated input generated by countable hidden Markov models. Ref. [10] gave closed-form expressions for the memory function of continuous-time linear recurrent networks in terms of the autocorrelation function of the input, and closely studied the case of an exponential autocorrelation function. Ref. [11] gave similar expressions for discrete-time linear recurrent networks. Ref. [12] gave closed-form expressions for the Fisher memory curve of discrete-time linear recurrent networks, which measures how much changes in the input signal perturb the network state; for linear recurrent networks, this curve is independent of the particular input signal.

We differ from these previous efforts mostly in that we study both memory capacity and a newly defined "predictive capacity." We derive an upper bound on predictive capacity via Wiener filters in terms of the autocorrelation function of the input. Two surprising findings result. First, predictive capacity is not typically maximized at the "edge of critical...
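To make the memory-prediction contrast concrete, the display below sketches one plausible formalization, assuming the standard echo-state memory function and its natural forward-looking analogue; the notation (state $\mathbf{x}(t)$, scalar input $u(t)$, readout weights $\mathbf{w}$, connectivity $A$, input weights $\mathbf{v}$) is introduced here for illustration only and may differ from the definitions adopted later in the paper.

% Hedged sketch of memory and predictive capacity; notation assumed, not taken from the source.
\begin{align}
  \mathbf{x}(t+1) &= A\,\mathbf{x}(t) + \mathbf{v}\,u(t)
  && \text{(discrete-time case, for concreteness)} \\
  m(\tau) &= \max_{\mathbf{w}}
    \frac{\mathrm{Cov}\!\big[\mathbf{w}^{\top}\mathbf{x}(t),\,u(t-\tau)\big]^{2}}
         {\mathrm{Var}\!\big[\mathbf{w}^{\top}\mathbf{x}(t)\big]\,
          \mathrm{Var}\!\big[u(t-\tau)\big]}, \\
  \mathrm{MC} &= \sum_{\tau \ge 0} m(\tau),
  \qquad
  \mathrm{PC} = \sum_{\tau \ge 1} m(-\tau).
\end{align}

Under this reading, the Wiener filter, i.e. the optimal linear predictor of $u(t+\tau)$ from the entire input past (computable from the input autocorrelation function), upper-bounds each $m(-\tau)$ and hence $\mathrm{PC}$, since the network state is itself a linear functional of that same past.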