The ventral visual stream enables humans and nonhuman primates to effortlessly recognize objects across a multitude of viewing conditions, yet the computational role of its abundant feedback connections remains unclear. Prior studies have augmented feedforward convolutional neural networks (CNNs) with recurrent connections to study their role in visual processing; however, these recurrent networks are often optimized directly on neural data, or the comparative metrics used are undefined for standard feedforward networks that lack such connections. In this work, we develop task-optimized convolutional recurrent (ConvRNN) network models that more closely mimic the timing and gross neuroanatomy of the ventral pathway. Properly chosen intermediate-depth ConvRNN circuit architectures, which incorporate mechanisms of feedforward bypassing and recurrent gating, can achieve high performance on a core recognition task, comparable to that of much deeper feedforward networks. We then develop methods that allow us to compare both CNNs and ConvRNNs to fine-grained measurements of primate categorization behavior and neural response trajectories across thousands of stimuli. We find that high-performing ConvRNNs provide a better match to these data than feedforward networks of any depth, predicting the precise timings at which each stimulus is behaviorally decoded from neural activation patterns. Moreover, these ConvRNN circuits consistently produce quantitatively accurate predictions of neural dynamics from V4 and IT across the entire stimulus presentation. In fact, we find that the highest-performing ConvRNNs, which best match neural and behavioral data, also achieve a strong Pareto trade-off between task performance and overall network size. Taken together, our results suggest that the functional purpose of recurrence in the ventral pathway is to fit a high-performing network in cortex, attaining computational power through temporal rather than spatial complexity.
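A minimal sketch of the kind of circuit described, assuming PyTorch: two recurrent "areas" with a feedforward bypass around the first, and a learned gate on each state update. The module names, gating form, and hyperparameters are illustrative assumptions, not the authors' exact ConvRNN cell.

```python
# Illustrative gated ConvRNN with a feedforward bypass (assumed design,
# in the spirit of the abstract above; not the paper's exact circuit).
import torch
import torch.nn as nn

class GatedConvRNNCell(nn.Module):
    """One model 'area': convolutional recurrence with a learned gate."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # Candidate update driven by feedforward + recurrent input.
        self.update = nn.Conv2d(2 * channels, channels, kernel_size, padding=pad)
        # Gate deciding how much of the previous state to keep.
        self.gate = nn.Conv2d(2 * channels, channels, kernel_size, padding=pad)

    def forward(self, bottom_up, state):
        x = torch.cat([bottom_up, state], dim=1)
        g = torch.sigmoid(self.gate(x))            # recurrent gating
        candidate = torch.relu(self.update(x))
        return g * state + (1.0 - g) * candidate   # gated state update

class TinyConvRNN(nn.Module):
    """Two recurrent areas; area 2 also receives a bypass of the input."""
    def __init__(self, in_channels=3, channels=32, n_classes=10, timesteps=4):
        super().__init__()
        self.timesteps = timesteps
        self.embed = nn.Conv2d(in_channels, channels, 3, padding=1)
        self.area1 = GatedConvRNNCell(channels)
        self.area2 = GatedConvRNNCell(channels)
        # Feedforward bypass: a 1x1 conv mixes area-1 output with the
        # raw input embedding, which skips past area 1 entirely.
        self.mix = nn.Conv2d(2 * channels, channels, 1)
        self.readout = nn.Linear(channels, n_classes)

    def forward(self, img):
        feat = torch.relu(self.embed(img))
        s1 = torch.zeros_like(feat)
        s2 = torch.zeros_like(feat)
        logits_per_step = []
        for _ in range(self.timesteps):        # unroll recurrence in time
            s1 = self.area1(feat, s1)
            bypass = self.mix(torch.cat([s1, feat], dim=1))
            s2 = self.area2(bypass, s2)
            pooled = s2.mean(dim=(2, 3))       # global average pool
            logits_per_step.append(self.readout(pooled))
        return logits_per_step                 # a prediction at every timestep

logits = TinyConvRNN()(torch.randn(2, 3, 32, 32))
print(len(logits), logits[-1].shape)  # 4 timesteps, each (2, 10) logits
```

Returning a prediction per timestep is what makes the model comparable to time-varying neural and behavioral data, rather than only to a single end-of-trial output.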
The ventral visual stream (VVS) is a hierarchically connected series of cortical areas known to underlie core object recognition behaviors, enabling humans and non-human primates to effortlessly recognize objects across a multitude of viewing conditions. While recent feedforward convolutional neural networks (CNNs) provide quantitatively accurate predictions of temporally-averaged neural responses throughout the ventral pathway, they lack two ubiquitous neuroanatomical features: local recurrence within cortical areas and long-range feedback from downstream to upstream areas. As a result, such models can neither account for the temporally varying dynamical patterns thought to arise from recurrent visual circuits nor provide insight into the behavioral goals that these recurrent circuits might help support. In this work, we augment CNNs with local recurrence and long-range feedback, developing convolutional RNN (ConvRNN) network models that more closely mimic the gross neuroanatomy of the ventral pathway. Moreover, when the form of the recurrent circuit is chosen properly, ConvRNNs with comparatively few layers can achieve high performance on a core recognition task, comparable to that of much deeper feedforward networks. We then compare these models to temporally fine-grained neural and behavioral recordings from primates responding to thousands of images. We find that ConvRNNs match these data better than alternative models, including the deepest feedforward networks, on two metrics: 1) neural dynamics in V4 and inferotemporal (IT) cortex at late timepoints after stimulus onset, and 2) the varying times at which object identity can be decoded from IT, including for more challenging images that take longer to decode. Moreover, these results differentiate within the class of ConvRNNs, suggesting that there are strong functional constraints on the recurrent connectivity needed to match these phenomena. Finally, we find that recurrent circuits that attain high task performance while keeping the network small, as measured by the number of units rather than, say, the number of parameters, are overall most consistent with these data. Taken together, our results suggest that recurrence and feedback allow the ventral pathway to reliably perform core object recognition while subject to a strong total network size constraint.
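The decode-time metric can be made concrete with a small sketch: fit a linear decoder on population responses at each time bin and take each image's "decode time" to be the first bin of sustained correct classification. The synthetic data, array shapes, and the sustained-correctness rule below are all assumptions for illustration, not the paper's exact analysis pipeline.

```python
# Per-timepoint decoding of object identity from (synthetic) IT-like
# responses; everything here is an illustrative stand-in.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n_images, n_units, n_timebins, n_classes = 200, 100, 12, 8
labels = rng.integers(n_classes, size=n_images)
# Synthetic responses: class signal that ramps up over the trial.
signal = rng.normal(size=(n_classes, n_units))[labels]
ramp = np.linspace(0.0, 1.5, n_timebins)
responses = ramp[None, None, :] * signal[:, :, None] \
    + rng.normal(size=(n_images, n_units, n_timebins))

correct_over_time = np.zeros((n_images, n_timebins), dtype=bool)
for t in range(n_timebins):
    clf = LogisticRegression(max_iter=1000)
    pred = cross_val_predict(clf, responses[:, :, t], labels, cv=5)
    correct_over_time[:, t] = pred == labels

# Decode time per image: first bin at which it is classified correctly
# and stays correct through the end of the presentation (one simple rule).
stays_correct = np.flip(np.flip(correct_over_time, 1).cumprod(1), 1).astype(bool)
decode_time = np.where(stays_correct.any(1), stays_correct.argmax(1), -1)
print("median decode bin:", np.median(decode_time[decode_time >= 0]))
```

Images whose decode time lands late (or never, coded -1 here) play the role of the "challenging images that take longer to decode" described above.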
In this work we explore the limiting dynamics of deep neural networks trained with stochastic gradient descent (SGD). We find empirically that long after performance has converged, networks continue to move through parameter space by a process of anomalous diffusion in which distance travelled grows as a power law in the number of gradient updates with a nontrivial exponent. We reveal an intricate interaction between the hyperparameters of optimization, the structure in the gradient noise, and the Hessian matrix at the end of training that explains this anomalous diffusion. To build this understanding, we first derive a continuous-time model for SGD with finite learning rates and batch sizes as an underdamped Langevin equation. We study this equation in the setting of linear regression, where we can derive exact, analytic expressions for the phase space dynamics of the parameters and their instantaneous velocities from initialization to stationarity. Using the Fokker-Planck equation, we show that the key ingredient driving these dynamics is not the original training loss, but rather the combination of a modified loss, which implicitly regularizes the velocity, and probability currents, which cause oscillations in phase space. We identify qualitative and quantitative predictions of this theory in the dynamics of a ResNet-18 model trained on ImageNet. Through the lens of statistical physics, we uncover a mechanistic origin for the anomalous limiting dynamics of deep neural networks trained with SGD.
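For reference, an underdamped Langevin equation of the kind invoked above has the generic form below. The paper's specific mapping from learning rate, momentum, and batch size to the damping and diffusion coefficients is its own derivation; the symbols here (damping $\gamma$, noise scale $\Sigma$) are only schematic placeholders.

```latex
% Generic underdamped Langevin system for parameters \theta_t with
% velocity v_t; the mapping (\eta, \beta, S) -> (\gamma, \Sigma) is
% derived in the paper and only indicated schematically here.
\begin{aligned}
  d\theta_t &= v_t \, dt, \\
  dv_t      &= -\gamma v_t \, dt \;-\; \nabla L(\theta_t)\, dt
               \;+\; \sqrt{2\Sigma}\, dW_t .
\end{aligned}
```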