The asymmetric learning rates of murine exploratory behavior in sparse reward environments

Ohta, Hiroyuki; Satori, Kuniaki; Takarada, Yu; Arake, Masashi; Ishizuka, Toshiaki; Morimoto, Yuji; Takahashi, Tatsuji

doi:10.1016/j.neunet.2021.05.030

Cited by 19 publications

(26 citation statements)

References 39 publications


“…(1) The Simple Q-learning model has single state value (hereafter, “SimpleQ”). (2) The Asymmetry model has independent learning rates for positive and negative reward prediction errors (Katahira et al, 2017b; Lefebvre et al, 2017; Ohta et al, 2021). (3) The Perseverance model has a choice autocorrelation to incorporate perseverance in action selection (Katahira, 2018; Lau and Glimcher, 2005).…”

Section: Methods (mentioning)

confidence: 99%
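
As a rough illustration of what "independent learning rates for positive and negative reward prediction errors" means in the statement above, the sketch below updates a single action value with one of two rates, chosen by the sign of the prediction error. The function name, parameter names, and numeric rates are illustrative assumptions and are not taken from Ohta et al. (2021) or from the citing paper.

```python
def asymmetric_q_update(q, reward, alpha_pos, alpha_neg):
    """Return the action value after one outcome.

    alpha_pos is applied when the reward prediction error is non-negative,
    alpha_neg when it is negative (both names are hypothetical).
    """
    rpe = reward - q  # reward prediction error
    alpha = alpha_pos if rpe >= 0 else alpha_neg
    return q + alpha * rpe


# Example: a learner that updates quickly on rewards and slowly on omissions.
q = 0.0
for r in [1.0, 0.0, 0.0, 1.0]:
    q = asymmetric_q_update(q, r, alpha_pos=0.5, alpha_neg=0.1)
```

Setting `alpha_pos` equal to `alpha_neg` recovers an ordinary single-rate update, which is one way to read the relationship between the Asymmetry model and SimpleQ.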


A Reinforcement Learning Model with Choice Traces for a Progressive Ratio Schedule

Ihara, Shikano, Kato et al. 2023. Preprint


The progressive ratio (PR) lever-press task serves as a benchmark for assessing goal-oriented motivation. However, a well-recognized limitation of the PR task is that only a single data point, known as the breakpoint, is obtained from an entire session as a barometer of motivation. Because the breakpoint is defined as the final ratio of responses achieved in a PR session, variations in choice behavior during the PR task cannot be captured. We addressed this limitation by constructing four reinforcement learning models: a Simple Q-learning model, an Asymmetric model with two learning rates, a Perseverance model with choice traces, and a Perseverance model without learning. These models incorporated three behavioral choices: reinforced and non-reinforced lever presses and void magazine nosepokes (MNPs), because we noticed that mice performed frequent MNPs during PR tasks. The best model was the Perseverance model, which predicted a gradual reduction in amplitudes of reward prediction errors (RPEs) upon void MNPs. We confirmed the prediction experimentally with fiber photometry of extracellular dopamine (DA) dynamics in the ventral striatum of mice using a fluorescent protein (genetically encoded GPCR activation-based DA sensor: GRABDA2m). We verified application of the model by acute intraperitoneal injection of low-dose methamphetamine (METH) before a PR task, which increased the frequency of MNPs during the PR session without changing the breakpoint. The Perseverance model captured behavioral modulation as a result of increased initial action values, which are customarily set to zero and disregarded in reinforcement learning analysis. Our findings suggest that the Perseverance model reveals effects of psychoactive drugs on choice behaviors during PR tasks.
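
The abstract mentions a Perseverance model with choice traces over three behavioral choices. A minimal sketch of one common way to write such a model follows: each action keeps a trace of how recently it was chosen, and the trace is added to the action value inside a softmax. The parameter names (`beta`, `phi`, `tau`), the update form, and the action indexing are assumptions for illustration and may differ from the model fitted in the preprint.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def choice_probs(q, c, beta, phi):
    """Combine action values q and choice traces c into choice probabilities.

    beta scales the values, phi scales the traces (hypothetical names).
    """
    return softmax([beta * qi + phi * ci for qi, ci in zip(q, c)])


def update_traces(c, chosen, tau):
    """Move the chosen action's trace toward 1 and all others toward 0."""
    return [ci + tau * ((1.0 if i == chosen else 0.0) - ci)
            for i, ci in enumerate(c)]


# Three actions, indexed 0..2 (the mapping to behaviors is hypothetical).
values = [0.0, 0.0, 0.0]
traces = update_traces([0.0, 0.0, 0.0], chosen=0, tau=0.5)
probs = choice_probs(values, traces, beta=1.0, phi=2.0)
```

With equal action values and a positive `phi`, the action chosen on the previous step becomes the most probable next choice, which is the perseverance effect the model is meant to capture.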


Section: Methods (mentioning)

confidence: 99%

“…(1) The Simple Q-learning model has single state value (hereafter, "SimpleQ"). (2) The Asymmetry model has independent learning rates for positive and negative reward prediction errors (Katahira et al, 2017b; Lefebvre et al, 2017; Ohta et al, 2021). (3)…”

Section: Computational Models (mentioning)

confidence: 99%

A Reinforcement Learning Model with Choice Traces for a Progressive Ratio Schedule

Ihara, Shikano, Kato et al. 2023. Preprint


“…In Appendix 5, as an extension of the actor-critic model, we consider the asymmetries in learning that have been considered mainly in action value-based models in the previous literature (Frank et al 2007; Niv et al 2012; Gershman 2015; Lefebvre et al 2017; Ohta et al 2021).…”

Section: Mapping Actor-critic Learning To Q-learning (mentioning)

confidence: 99%

“…In this Appendix, we consider an extension of the actor-critic model and discuss its statistical properties. Specifically, we consider the asymmetries in learning that have been considered mainly in action value-based models in the previous literature (Frank et al 2007; Niv et al 2012; Gershman 2015; Lefebvre et al 2017; Ohta et al 2021). In actor-critic learning, two types of asymmetry, namely, asymmetries in the critic (state value update) and the actor (policy update), can be considered, although such models have not yet been used for model fitting to behavioral data.…”

Section: Appendix 5 Asymmetric Learning In Actor-critic Learning (mentioning)

confidence: 99%
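
To make the two kinds of asymmetry named in the statement above concrete, here is a minimal single-state sketch in which the critic (state value) and the actor (action preferences) each have their own pair of learning rates, selected by the sign of the prediction error. The update form, the function name, and all parameter names are assumptions for illustration; the exact equations in Katahira and Kimura (2022) may differ.

```python
def actor_critic_step(v, prefs, action, reward, critic_rates, actor_rates):
    """One asymmetric actor-critic update in a single-state task.

    critic_rates and actor_rates are (rate_for_positive, rate_for_negative)
    tuples; both are hypothetical parameterizations.
    Returns the new state value, the new preferences, and the error.
    """
    delta = reward - v            # prediction error computed by the critic
    idx = 0 if delta >= 0 else 1  # pick the rate by the sign of the error
    new_v = v + critic_rates[idx] * delta
    new_prefs = list(prefs)
    new_prefs[action] += actor_rates[idx] * delta
    return new_v, new_prefs, delta


# A rewarded choice followed by an unrewarded one, both of action 0.
v, prefs = 0.0, [0.0, 0.0]
v, prefs, d1 = actor_critic_step(v, prefs, 0, 1.0, (0.2, 0.1), (0.4, 0.05))
v, prefs, d2 = actor_critic_step(v, prefs, 0, 0.0, (0.2, 0.1), (0.4, 0.05))
```

Keeping the two rate pairs separate lets the critic and the actor be asymmetric independently, which matches the distinction the statement draws between state value updates and policy updates.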

Influences of Reinforcement and Choice Histories on Choice Behavior in Actor-Critic Learning

Katahira, Kimura 2022. Comput Brain Behav


Reinforcement learning models have been used in many studies in the fields of neuroscience and psychology to model choice behavior and underlying computational processes. Models based on action values, which represent the expected reward from actions (e.g., Q-learning model), have been commonly used for this purpose. Meanwhile, the actor-critic learning model, in which the policy update and evaluation of an expected reward for a given state are performed in separate systems (actor and critic, respectively), has attracted attention due to its ability to explain the characteristics of various behaviors of living systems. However, the statistical property of the model behavior (i.e., how the choice depends on past rewards and choices) remains elusive. In this study, we examine the history dependence of the actor-critic model based on theoretical considerations and numerical simulations while considering the similarities with and differences from Q-learning models. We show that in actor-critic learning, a specific interaction between past reward and choice, which differs from Q-learning, influences the current choice. We also show that actor-critic learning predicts qualitatively different behavior from Q-learning, as the higher the expectation is, the less likely the behavior will be chosen afterwards. This study provides useful information for inferring computational and psychological principles from behavior by clarifying how actor-critic learning manifests in choice behavior.


“…In animal literature, the Anterior Cingulate Cortex (ACC) has been identified as a major modulator of explore-exploit decisions. Versions of the n-armed bandits have been fitted for rats and mice with the use of n-armed radial mazes (Ohta et al 2021). Anterior Cingulate Cortex (ACC) activation has been linked to foraging in rats in an adapted patch foraging task (Kane et al 2022) and a two-armed bandit monkey lesion study (Kennerley et al 2006).…”

Section: Introduction (mentioning)

confidence: 99%

Meta-Analysis Reveals That Explore-Exploit Decisions are Dissociable by Activation in the Dorsal Lateral Prefrontal Cortex, Anterior Insula, and the Anterior Cingulate Cortex

Sazhin, Dachs, Smith 2023. Preprint


Explore-exploit research has challenges in generalizability due to a limited theoretical basis of exploration and exploitation. Neuroimaging can help identify whether explore-exploit decisions use an opponent processing system to address this issue. Thus, we conducted a coordinate-based meta-analysis (N=23 studies) where we found activation in the dorsal lateral prefrontal cortex and anterior cingulate cortex during exploration versus exploitation, providing some evidence for opponent processing. However, the conjunction of explore-exploit decisions was associated with activation in the dorsal anterior cingulate cortex, dorsal medial prefrontal cortex, and anterior insula, suggesting that these brain regions do not engage in opponent processing. Further, exploratory analyses revealed heterogeneity in brain responses between task types during exploration and exploitation respectively. Coupled with results suggesting that activation in exploration and exploitation decisions is generally more similar than it is different suggests there remain significant challenges toward characterizing explore-exploit decision making. Nonetheless, dlPFC and ACC activation differentiate explore and exploit decisions and identifying these responses can help in targeted interventions aimed at manipulating these decisions.

