SUMMARY We investigated how different sub-regions of the rodent prefrontal cortex contribute to value-based decision making by comparing neural signals related to the animal's choice, its outcome, and action value in the orbitofrontal cortex (OFC) and medial prefrontal cortex (mPFC) of rats performing a dynamic two-armed bandit task. Neural signals for upcoming action selection arose in the mPFC, including the anterior cingulate cortex, only immediately before the behavioral manifestation of the animal's choice, suggesting that the rodent prefrontal cortex is not involved in advance action planning. Both the OFC and mPFC conveyed signals related to the animal's past choices and their outcomes over multiple trials, but neural signals for chosen value and reward prediction error were more prevalent in the OFC. Our results suggest that the rodent OFC and mPFC serve distinct roles in value-based decision making, and that the OFC plays a prominent role in updating the values of outcomes expected from chosen actions.
The striatum is thought to play a crucial role in value-based decision making. Although a large body of evidence suggests its involvement in action selection as well as action evaluation, the underlying neural processes for these functions of the striatum are largely unknown. To gain insight into this matter, we simultaneously recorded neuronal activity in the dorsal and ventral striatum of rats performing a dynamic two-armed bandit task, and examined temporal profiles of neural signals related to the animal's choice, its outcome, and action value. Whereas significant neural signals for action value were found in both structures before the animal's choice of action, signals related to the upcoming choice were relatively weak and began to emerge only in the dorsal striatum, approximately 200 ms before the behavioral manifestation of the animal's choice. In contrast, once the animal revealed its choice, signals related to the choice and its value increased steeply and persisted until the outcome of the animal's choice was revealed, so that some neurons in both structures concurrently conveyed signals related to the animal's choice, its outcome, and the value of the chosen action. Thus, all the components necessary for updating the values of chosen actions were available in the striatum. These results suggest that the striatum not only represents values associated with potential choices before the animal's choice of action, but might also update the value of the chosen action once its outcome is revealed. In contrast, action selection might take place elsewhere, or in the dorsal striatum only immediately before its behavioral manifestation.
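The chosen-action value update described in these abstracts follows the standard model-free scheme: once the outcome is revealed, a reward prediction error is computed and only the value of the chosen action is adjusted. Below is a minimal, illustrative Python sketch of that computation; the function and parameter names (update_chosen_value, alpha) are ours, not taken from the original studies.

```python
def update_chosen_value(q_values, chosen_action, reward, alpha=0.1):
    """Standard model-free (Rescorla-Wagner / Q-learning style) update.

    q_values      : list of current action values, one per choice target
    chosen_action : index of the action the animal selected
    reward        : observed outcome (e.g., 1 if rewarded, 0 otherwise)
    alpha         : learning rate
    """
    # Reward prediction error: actual minus expected outcome
    rpe = reward - q_values[chosen_action]
    # Only the value of the chosen action is updated
    q_values[chosen_action] += alpha * rpe
    return q_values, rpe

# Example: two-armed bandit with initial values of 0.5 for both targets
q = [0.5, 0.5]
q, rpe = update_chosen_value(q, chosen_action=0, reward=1)  # left choice, rewarded
```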
Despite widespread neural activity related to reward values, signals related to upcoming choice have not been clearly identified in the rodent brain. Here, we examined neuronal activity in the lateral (AGl) and medial (AGm) agranular cortex, corresponding to the primary and secondary motor cortex, respectively, in rats performing a dynamic foraging task. Choice signals arose in the AGm before the behavioral manifestation of the animal's choice, and earlier than in any other area of the rat brain previously studied under free-choice conditions. The AGm also conveyed significant neural signals for decision value and chosen value. In contrast, upcoming-choice signals arose later and value signals were weaker in the AGl. We also found that AGm lesions made the animal's choices less dependent on dynamically updated values. These results suggest that the rodent secondary motor cortex might be uniquely involved in both representing and reading out value signals for flexible action selection.
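The "reading out" of value signals for action selection is commonly formalized as a softmax (logistic) rule, in which choice probability depends on the difference between action values (the decision value). The sketch below illustrates that generic rule rather than the specific model fitted in the study; the inverse-temperature parameter beta is an assumed value.

```python
import math

def softmax_choice_prob(q_left, q_right, beta=3.0):
    """Probability of choosing the left target under a softmax rule.

    The decision variable is the difference in action values; beta
    (inverse temperature) controls how deterministically the
    higher-valued target is chosen.
    """
    return 1.0 / (1.0 + math.exp(-beta * (q_left - q_right)))

p_left = softmax_choice_prob(0.7, 0.4)  # ~0.71 with beta = 3
```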
Reinforcement learning theories postulate that actions are chosen to maximize a long-term sum of positive outcomes based on value functions, which are subjective estimates of future rewards. In simple reinforcement learning algorithms, value functions are updated only by trial and error, whereas in model-based reinforcement learning algorithms they are updated according to the decision-maker's knowledge or model of the environment. To investigate how animals update value functions, we trained rats on two different free-choice tasks. The reward probability of the unchosen target remained unchanged in one task, whereas in the other it increased over time since the target was last chosen. The results show that goal-choice probability increased as a function of the number of consecutive alternative choices in the latter, but not the former, task, indicating that the animals were aware of time-dependent increases in arming probability and used this information in choosing goals. In addition, the choice behavior in the latter task was better accounted for by a model-based reinforcement learning algorithm. Our results show that rats adopt a decision-making process that cannot be accounted for by simple reinforcement learning models even in a relatively simple binary choice task, suggesting that rats can readily improve their decision-making strategy through knowledge of their environments.

Animals must continually update their behavioral strategies according to changes in the environment in order to optimize their choices. Reinforcement learning (RL) models (Sutton and Barto 1998) provide a powerful theoretical framework for understanding choice behavior in humans and animals in a dynamic environment. In theories of RL, future actions are chosen so as to maximize a long-term sum of positive outcomes, and this can be accomplished by a set of value functions that represent the amount of expected reward associated with particular states or actions. The value functions are continually updated based on the reward prediction error, which is the difference between the expected and actual rewards. This way, even without prior knowledge about an uncertain and dynamically changing environment, an animal can discover, by trial and error, the structure of the environment that can be exploited for optimal choice. Not surprisingly, human and monkey choice behaviors in various tasks are well described by reinforcement learning algorithms (e.g., O'Doherty et al. 2003; Barraclough et al. 2004; Lee et al. 2004; Samejima et al. 2005; Daw et al. 2006; Pessiglione et al. 2006).

The updating of value functions can be achieved in two fundamentally different ways. In simple or direct RL algorithms, value functions are updated only by trial and error. In other words, only the value function associated with the chosen action is updated, and those associated with uncommitted actions remain unchanged. On the other hand, in indirect or model-based RL algorithms, the value functions might also change according to the decis...
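To make the contrast between direct (model-free) and model-based updating concrete, the following sketch compares the two schemes for a baited two-armed bandit of the kind described above: the model-free learner updates only the chosen value from the reward prediction error, whereas the model-based learner exploits knowledge that an unchosen target's arming probability accumulates over trials. The arming rule and parameter values here are illustrative assumptions, not the task's actual reward schedule.

```python
def model_free_update(q, chosen, reward, alpha=0.1):
    """Direct RL: only the chosen action's value changes."""
    q[chosen] += alpha * (reward - q[chosen])
    return q

def model_based_values(base_prob, trials_since_chosen):
    """Indirect/model-based RL: target values are derived from a learned
    model of the task, here the knowledge that an unchosen target's
    arming probability accumulates over time.

    base_prob           : dict of per-trial arming probabilities
    trials_since_chosen : dict of how many trials ago each target was chosen
    """
    values = {}
    for target, p in base_prob.items():
        n = trials_since_chosen[target] + 1
        # Probability that the target has been armed at least once
        values[target] = 1.0 - (1.0 - p) ** n
    return values

# Example: the 'right' target has not been chosen for 3 trials, so a
# model-based learner values it more highly than the just-chosen 'left'
print(model_based_values({'left': 0.4, 'right': 0.4},
                         {'left': 0, 'right': 3}))
```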
Studies in rats, monkeys, and humans have found action-value signals in multiple regions of the brain. These findings suggest that action-value signals encoded in these brain structures bias choices toward higher expected rewards. However, previous estimates of action-value signals might have been inflated by serial correlations in neural activity and also by activity related to other decision variables. Here, we applied several statistical tests based on permutation and surrogate data to analyze neural activity recorded from the striatum, frontal cortex, and hippocampus. The results show that previously identified action-value signals in these brain areas cannot be entirely accounted for by concurrent serial correlations in neural activity and action value. We also found that neural activity related to action value is intermixed with signals related to other decision variables. Our findings provide strong evidence for broadly distributed neural signals related to action value throughout the brain.
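As a rough illustration of the permutation/surrogate logic described above, one can regress trial-by-trial spike counts against an action-value series and compare the observed statistic with a null distribution built from surrogate value series that preserve serial correlation. The sketch below uses circularly shifted value series and a Pearson correlation as the test statistic; these specific choices, and the numpy/scipy implementation, are assumptions rather than the paper's exact procedure.

```python
import numpy as np
from scipy import stats

def action_value_test(spike_counts, action_values, n_surrogates=1000, seed=0):
    """Compare the observed spike-count/action-value correlation with a
    null distribution from circularly shifted (serially correlated)
    surrogate value series."""
    rng = np.random.default_rng(seed)
    n = len(action_values)
    observed = abs(stats.pearsonr(spike_counts, action_values)[0])

    null = np.empty(n_surrogates)
    for i in range(n_surrogates):
        shift = rng.integers(1, n)                 # random circular shift
        surrogate = np.roll(action_values, shift)  # preserves autocorrelation
        null[i] = abs(stats.pearsonr(spike_counts, surrogate)[0])

    # Fraction of surrogates at least as extreme as the observed statistic
    p_value = (np.sum(null >= observed) + 1) / (n_surrogates + 1)
    return observed, p_value
```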