This paper considers prediction and perceptual categorization as an inference problem that is solved by the brain. We assume that the brain models the world as a hierarchy or cascade of dynamical systems that encode causal structure in the sensorium. Perception is equated with the optimization or inversion of these internal models, to explain sensory data. Given a model of how sensory data are generated, we can invoke a generic approach to model inversion, based on a free energy bound on the model's evidence. The ensuing free-energy formulation furnishes equations that prescribe the process of recognition, i.e. the dynamics of neuronal activity that represent the causes of sensory input. Here, we focus on a very general model, whose hierarchical and dynamical structure enables simulated brains to recognize and predict trajectories or sequences of sensory states. We first review hierarchical dynamical models and their inversion. We then show that the brain has the necessary infrastructure to implement this inversion and illustrate this point using synthetic birds that can recognize and categorize birdsongs.

Keywords: generative models; predictive coding; hierarchical; birdsong
INTRODUCTION

This paper reviews generic models of our sensorium and a Bayesian scheme for their inversion. We then show that the brain has the necessary anatomical and physiological equipment to invert these models, given sensory data. Critically, the scheme lends itself to a relatively simple neural network implementation that shares many features with real cortical hierarchies in the brain. The basic idea that the brain tries to infer the causes of sensations dates back to Helmholtz (e.g. Helmholtz 1860/1962; Barlow 1961; Neisser 1967; Ballard et al. 1983; Mumford 1992; Kawato et al. 1993; Dayan et al. 1995; Rao & Ballard 1998), with a recent emphasis on hierarchical inference and empirical Bayes (Friston 2003, 2005; Friston et al. 2006). Here, we generalize this idea to cover dynamics in the world and consider how neural networks could be configured to invert hierarchical dynamical models and deconvolve sensory causes from sensory input.

This paper comprises four sections. In §1, we introduce hierarchical dynamical models and their inversion. These models cover most of the models encountered in the statistical literature. An important aspect of these models is their formulation in generalized coordinates of motion, which lends them a hierarchical form in both structure and dynamics. These hierarchies induce empirical priors that provide structural and dynamical constraints, which can be exploited during inversion. In §2, we show how inversion can be formulated as a simple gradient ascent using neuronal networks; in §3, we consider how evoked brain responses might be understood in terms of inference under hierarchical dynamical models of sensory input.
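
To give a concrete feel for what inverting a generative model by gradient ascent can mean, the following toy sketch may help. It is not the scheme developed in this paper: it assumes a single-level, static, linear-Gaussian model (one hidden cause v, one sensory sample y = theta * v + noise), and all names and parameter values (`theta`, `eta`, `pi_y`, `pi_v`, `lr`, `n_steps`) are hypothetical choices made for illustration. Under these assumptions the most likely cause can be found by ascending the log joint density of data and cause, and the result can be checked against the closed-form posterior mode.

```python
# Toy illustration only: a static, single-level linear-Gaussian model,
# NOT the hierarchical dynamical scheme described in the paper.
# All parameter names and default values are hypothetical.

def invert_by_gradient_ascent(y, theta=2.0, eta=0.0, pi_y=4.0, pi_v=1.0,
                              lr=0.02, n_steps=2000):
    """Estimate the hidden cause v of a sensory sample y.

    Assumed generative model:
        v ~ N(eta, 1 / pi_v)          prior on the cause
        y ~ N(theta * v, 1 / pi_y)    likelihood of the sensory sample
    The estimate ascends the log joint density ln p(y, v) with respect to v.
    """
    v = eta  # start at the prior expectation
    for _ in range(n_steps):
        eps_y = y - theta * v   # sensory prediction error
        eps_v = v - eta         # prediction error on the cause (prior)
        # Gradient of ln p(y, v) with respect to v:
        grad = pi_y * theta * eps_y - pi_v * eps_v
        v += lr * grad
    return v


def analytic_mode(y, theta=2.0, eta=0.0, pi_y=4.0, pi_v=1.0):
    """Closed-form posterior mode for the same linear-Gaussian model."""
    return (pi_y * theta * y + pi_v * eta) / (pi_y * theta ** 2 + pi_v)


if __name__ == "__main__":
    y_obs = 3.0
    print("gradient ascent:", invert_by_gradient_ascent(y_obs))
    print("analytic mode:  ", analytic_mode(y_obs))
```

The update is driven by two precision-weighted prediction errors, one comparing the sensory sample with its prediction and one comparing the cause with its prior expectation. The step size `lr` must be small relative to the curvature `pi_y * theta**2 + pi_v` for the iteration to converge; with the default values chosen here it does.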