Highlights
• Cortex-wide task-epoch-specific causal neural activity across sensorimotor learning
• Rapid inhibitory response of orofacial cortex contributes to delay licking
• Secondary whisker motor cortex is a key node converting whisker input to lick plan
• Sensory information converges to a focal frontal area critical for delayed response
Surprise-based learning allows agents to rapidly adapt to nonstationary stochastic environments characterized by sudden changes. We show that exact Bayesian inference in a hierarchical model gives rise to a surprise-modulated trade-off between forgetting old observations and integrating them with the new ones. The modulation depends on a probability ratio, which we call the Bayes Factor Surprise, that tests the prior belief against the current belief. We demonstrate that in several existing approximate algorithms, the Bayes Factor Surprise modulates the rate of adaptation to new observations. We derive three novel surprise-based algorithms, one in the family of particle filters, one in the family of variational learning, and one in the family of message passing, that have constant scaling in observation sequence length and particularly simple update dynamics for any distribution in the exponential family. Empirical results show that these surprise-based algorithms estimate parameters better than alternative approximate approaches and reach levels of performance comparable to computationally more expensive algorithms. The Bayes Factor Surprise is related to but different from the Shannon Surprise. In two hypothetical experiments, we make testable predictions for physiological indicators that dissociate the Bayes Factor Surprise from the Shannon Surprise. The theoretical insight of casting various approaches as surprise-based learning, as well as the proposed online algorithms, may be applied to the analysis of animal and human behavior and to reinforcement learning in nonstationary environments.
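As a rough illustration of the probability ratio described above, the following sketch computes a Bayes-Factor-style surprise for a toy task: estimating the mean of Gaussian observations with known noise variance. Everything specific here is an assumption made for illustration and is not taken from the abstract: the Gaussian setting, the function names, the `(mean, variance)` belief representation, and in particular the mapping from surprise to an adaptation rate, `gamma = m*S / (1 + m*S)` with `m = p_change / (1 - p_change)`, which is one plausible saturating form of the surprise-modulated trade-off.

```python
import math


def gaussian_pdf(y, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def predictive(y, belief, noise_var):
    """Probability of observation y under a Gaussian belief over the unknown mean.

    belief is a (mean, variance) pair; marginalizing over the mean adds
    the belief variance to the observation noise variance.
    """
    belief_mean, belief_var = belief
    return gaussian_pdf(y, belief_mean, belief_var + noise_var)


def bayes_factor_surprise(y, prior_belief, current_belief, noise_var):
    """Ratio of P(y | prior belief) to P(y | current belief).

    Values above 1 mean the observation is better explained by the prior,
    i.e. by the hypothesis that the environment has just changed.
    """
    return predictive(y, prior_belief, noise_var) / predictive(y, current_belief, noise_var)


def adaptation_rate(surprise, p_change):
    """Assumed saturating map from surprise to a rate in (0, 1).

    p_change is a hypothetical per-step change probability, 0 < p_change < 1.
    """
    m = p_change / (1.0 - p_change)
    return m * surprise / (1.0 + m * surprise)


prior = (0.0, 10.0)     # broad belief before any data
current = (5.0, 0.1)    # sharp belief after many observations near 5
noise_var = 1.0

s_expected = bayes_factor_surprise(5.0, prior, current, noise_var)
s_unexpected = bayes_factor_surprise(-3.0, prior, current, noise_var)
gamma_expected = adaptation_rate(s_expected, p_change=0.05)
gamma_unexpected = adaptation_rate(s_unexpected, p_change=0.05)
```

In this sketch an observation near the current estimate yields a ratio below 1 and a small rate (mostly integrate), while an observation far from it yields a large ratio and a rate close to 1 (mostly forget). When the prior and current beliefs coincide, the ratio is exactly 1 and the assumed rate reduces to `p_change`.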
Which signals drive reinforcement learning (RL) beyond reward remains controversial, and novelty and surprise are often used interchangeably in this debate. Here, using a deep sequential decision-making paradigm, we show that reward, novelty, and surprise play different roles in human RL. Surprise controls the rate of learning, whereas novelty and the novelty prediction error (NPE) drive exploration. Exploitation is dominated by model-free (habitual) action choices. A theory that takes these separate effects into account predicts on average 73 percent of the action choices of human participants after the first encounter of a reward and allows us to dissociate surprise and novelty in the EEG signal. While the event-related potential (ERP) at around 300 ms is positively correlated with surprise, novelty, NPE, reward, and the reward prediction error, the ERP response to novelty and NPE starts earlier than that to surprise.
Classic reinforcement learning (RL) theories cannot explain human behavior in the absence of external reward or when the environment changes. Here, we employ a deep sequential decision-making paradigm with sparse reward and abrupt environmental changes. To explain the behavior of human participants in these environments, we show that RL theories need to include surprise and novelty, each with a distinct role. While novelty drives exploration before the first encounter of a reward, surprise increases the rate of learning of a world-model as well as of model-free action-values. Even though the world-model is available for model-based RL, we find that human decisions are dominated by model-free action choices. The world-model is only marginally used for planning, but it is important to detect surprising events. Our theory predicts human action choices with high probability and allows us to dissociate surprise, novelty, and reward in EEG signals.
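To make the idea of surprise increasing the learning rate of model-free action-values concrete, here is a minimal sketch of a tabular Q-learning step whose learning rate grows with a surprise signal. It is not the authors' model: the function name, the parameters `base_lr`, `max_lr`, and `discount`, and the saturating gain `surprise / (1 + surprise)` are all hypothetical choices, and the surprise value is assumed to be a non-negative number supplied by some separate world-model.

```python
from collections import defaultdict


def surprise_modulated_q_update(q, state, action, reward, next_state, actions,
                                surprise, base_lr=0.1, max_lr=0.9, discount=0.9):
    """One Q-learning step with a surprise-dependent learning rate.

    q maps (state, action) pairs to values. The learning rate moves from
    base_lr (no surprise) toward max_lr (very large surprise). Returns the
    learning rate that was used.
    """
    gain = surprise / (1.0 + surprise)           # assumed saturating gain in [0, 1)
    lr = base_lr + (max_lr - base_lr) * gain
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + discount * best_next - q[(state, action)]
    q[(state, action)] += lr * td_error
    return lr


actions = ["left", "right"]

# Same transition, once with no surprise and once with high surprise.
q_calm = defaultdict(float)
lr_calm = surprise_modulated_q_update(q_calm, "s0", "left", 1.0, "s1", actions, surprise=0.0)

q_surprised = defaultdict(float)
lr_surprised = surprise_modulated_q_update(q_surprised, "s0", "left", 1.0, "s1", actions, surprise=9.0)
```

With identical experience, the surprised agent moves its action-value much further toward the new target than the calm one, which is the qualitative effect the paragraph describes.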