Learning to Express Reward Prediction Error-like Dopaminergic Activity Requires Plastic Representations of Time

Cone, Ian; Clopath, Claudia; Shouval, Harel Z

doi:10.1101/2022.04.06.487298

Cited by 3 publications

(4 citation statements)

References 90 publications

(155 reference statements)


“…These mechanisms may work in tandem with Hebbian plasticity to construct cognitive maps and/or they may be more involved in refining behavior policy and other task-specific functions by selectively routing information from the established cognitive maps to other brain regions mediating behavioral policies. Our data indicate that task representations and behavioral policies based upon them are formed in lockstep, as suggested by previous theory 75. A likely candidate mechanism for the contribution of synaptic plasticity during feedback is behavioral time scale synaptic plasticity (BTSP) 76.…”

Section: Discussion

supporting

confidence: 85%

Learning produces a hippocampal cognitive map in the form of an orthogonalized state machine

Sun, Winnubst, Natrajan et al. 2023. Preprint


Cognitive maps confer animals with flexible intelligence by representing spatial, temporal, and abstract relationships that can be used to shape thought, planning, and behavior. Cognitive maps have been observed in the hippocampus, but their algorithmic form and the processes by which they are learned remain obscure. Here, we employed large-scale, longitudinal two-photon calcium imaging to record activity from thousands of neurons in the CA1 region of the hippocampus while mice learned to efficiently collect rewards from two subtly different versions of linear tracks in virtual reality. The results provide a detailed view of the formation of a cognitive map in the hippocampus. Throughout learning, both the animal behavior and hippocampal neural activity progressed through multiple intermediate stages, gradually revealing improved task understanding and behavioral efficiency. The learning process led to progressive decorrelations in initially similar hippocampal neural activity within and across tracks, ultimately resulting in orthogonalized representations resembling a state machine capturing the inherent structure of the task. We show that a Hidden Markov Model (HMM) and a biologically plausible recurrent neural network trained using Hebbian learning can both capture core aspects of the learning dynamics and the orthogonalized representational structure in neural activity. In contrast, we show that gradient-based learning of sequence models such as Long Short-Term Memory networks (LSTMs) and Transformers do not naturally produce such representations. We further demonstrate that mice exhibited adaptive behavior in novel task settings, with neural activity reflecting flexible deployment of the state machine. These findings shed light on the mathematical form of cognitive maps, the learning rules that sculpt them, and the algorithms that promote adaptive behavior in animals. 
The work thus charts a course toward a deeper understanding of biological intelligence and offers insights toward developing more robust learning algorithms in artificial intelligence.



“…It was only for outcome signals recorded in lateral sites that we could detect systematic changes related to changing probabilities of reward. Despite these uncertainties, our observations introduce evidence for dopamine plateau responses as learning-related features to add to transient and ramping responses formerly reported, and raise new questions about RPE encoding by the striatum during learning 39.…”

Section: Discussion

supporting

confidence: 58%

“…For example, some cholinergic inputs, some likely from these interneurons, generate action potentials in intrastriatal dopamine fibers far from their cell bodies 16. Further, oscillatory local field potentials can accompany and even modulate activity 39,51–53. We did not monitor this activity.…”

Section: Discussion

mentioning

confidence: 99%

“…The dopamine release signals can be principally related to negative as well as positive reinforcement 5,27–29 or to non-reward parameters of movement 21,30,31, can occur as prolonged ramping signals 32, and can be compartmentally selective for striosome and matrix compartments of the striatum 18,33–36. Especially for the nucleus accumbens, but also for the dorsal striatum, the relation of the release patterns to RPE-TD learning algorithms has been strongly questioned 37–39 and strongly defended 10,23,30,37,39–47.…”

mentioning

confidence: 99%


Dopamine Release Plateau and Outcome Signals in Dorsal Striatum Contrast with Classic Reinforcement Learning Formulations

Kim, Gibson, et al. 2023. Preprint


We recorded dopamine release signals in medial and lateral sectors of the striatum as mice learned consecutive visual cue-outcome conditioning tasks including cue association, cue discrimination, reversal, and probabilistic discrimination task versions. Dopamine release responses in medial and lateral sites exhibited learning-related changes within and across phases of acquisition. These were different for the medial and lateral sites. In neither sector could these be accounted for by classic reinforcement learning as applied to dopamine-containing neuron activity. Cue responses ranged from initial sharp peaks to modulated plateau responses. In the medial sector, outcome (reward) responses during cue conditioning were minimal or, initially, negative. By contrast, in lateral sites, strong, transient dopamine release responses occurred at both cue and outcome. Prolonged, plateau release responses to cues emerged in both regions when discriminative behavioral responses became required. In most sites, we found no evidence for a transition from outcome to cue signaling, a hallmark of temporal difference reinforcement learning as applied to midbrain dopamine activity. These findings delineate reshaping of dopamine release activity during learning and suggest that current views of reward prediction error encoding need review to accommodate distinct learning-related spatial and temporal patterns of striatal dopamine release in the dorsal striatum.


Dopamine release plateau and outcome signals in dorsal striatum contrast with classic reinforcement learning formulations

Kim, Gibson, et al. 2024. Nat Commun


We recorded dopamine release signals in centromedial and centrolateral sectors of the striatum as mice learned consecutive versions of visual cue-outcome conditioning tasks. Dopamine release responses differed for the centromedial and centrolateral sites. In neither sector could these be accounted for by classic reinforcement learning alone as classically applied to the activity of nigral dopamine-containing neurons. Medially, cue responses ranged from initial sharp peaks to modulated plateau responses; outcome (reward) responses during cue conditioning were minimal or, initially, negative. At centrolateral sites, by contrast, strong, transient dopamine release responses occurred at both cue and outcome. Prolonged, plateau release responses to cues emerged in both regions when discriminative behavioral responses became required. At most sites, we found no evidence for a transition from outcome signaling to cue signaling, a hallmark of temporal difference reinforcement learning as applied to midbrain dopaminergic neuronal activity. These findings delineate a reshaping of striatal dopamine release activity during learning and suggest that current views of reward prediction error encoding need review to accommodate distinct learning-related spatial and temporal patterns of striatal dopamine release in the dorsal striatum.
