2022
DOI: 10.1101/2022.07.18.500429
Preprint

Distributional coding of associative learning within projection-defined populations of midbrain dopamine neurons

Abstract: Midbrain dopamine neurons are thought to play key roles in learning by conveying the difference between expected and actual outcomes. While this teaching signal is often considered to be uniform, recent evidence instead supports diversity in dopamine signaling. However, it remains poorly understood how heterogeneous signals might be organized to facilitate the role of downstream circuits mediating distinct aspects of behavior. Here we investigated the organizational logic of dopaminergic signaling by recording…

Cited by 4 publications (6 citation statements) · References 61 publications
“…Formally investigating this prediction (e.g., by systematically comparing cue and outcome responding across different projection-defined DA populations) is the key open empirical test of our framework. However, existing data are generally consistent with the hypothesis that the sorts of heterogeneity associated with outcome-specific differences tend to arise between distinct DA nuclei or target regions 7,90,91, in contrast to the hallmarks of feature-specific PEs we study here, which occur within VTA 4,20 and also within SNc 5. Thus, in SNc and DMS, compared to VTA and NAc, movement responses tend to emerge while reward responses decline (a combination consistent with outcome- rather than feature-specific variation; Fig.…”
Section: Discussion (supporting)
confidence: 90%
“…A vector-valued error would likely appear as tunings to various motor and task variables in experimental animals (Requirement 2), especially in the phase before the animals are so overtrained that their error is zero. Indeed, cells in the SNr (Fan et al, 2012; Barter et al, 2015; Tang et al, 2021) as well as the SNc (Howe and Dombeck, 2016; Dodson et al, 2016; Coddington and Dudman, 2018; Avvisati et al, 2022) respond to a plethora of behavioral and task variables. We deliberately excluded the details of how this error may be computed in the brain, but we speculate at least three possible algorithmic ways in which it could appear: Brain regions such as motor cortex or cerebellum could have a forward model of the world as well as the target, and thus can directly compute the error and send it to the midbrain. The brain could be wired as a set of hierarchical control loops, in which each loop provides the target for the level below (as proposed by Yin, 2014).…”
Section: Discussion (mentioning)
confidence: 99%
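Read as an algorithm, the hierarchical-control idea quoted above (each loop supplies the target for the level below) can be made concrete with a toy sketch. The Python snippet below is our hypothetical illustration, not code from the cited work: the function names, gains, and first-order plant dynamics are all assumptions. An outer loop turns a position error into a velocity setpoint for an inner loop, and the stacked per-level errors form a vector-valued teaching signal.

```python
# Hypothetical sketch of hierarchical control loops (after Yin, 2014):
# each loop provides the target for the level below, and each level's
# error is one component of a vector-valued error signal.
import numpy as np

def outer_loop(goal_pos, pos, gain=0.5):
    """High level: position error -> desired velocity (target for inner loop)."""
    return gain * (goal_pos - pos)

def inner_loop(target_vel, vel, gain=0.8):
    """Low level: velocity error -> motor command."""
    return gain * (target_vel - vel)

pos, vel, goal = 0.0, 0.0, 1.0
dt = 0.1
for step in range(100):
    target_vel = outer_loop(goal, pos)       # outer error sets the inner target
    command = inner_loop(target_vel, vel)    # inner error drives the plant
    error_vector = np.array([goal - pos, target_vel - vel])  # vector-valued error
    vel += command * dt                      # toy first-order plant
    pos += vel * dt

print(f"final position: {pos:.3f}")  # approaches the goal as both errors shrink
```

Under this reading, a recorder sampling cells that carry different components of error_vector would see exactly the mixed motor and task tunings the statement describes.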
“…Recently, TD algorithms have been elaborated to include a set of value learning channels that differ in their sensitivity to positive and negative RPEs, leading to value estimates that converge to distinct statistics of the expected cumulative reward distribution, an innovation termed distributional RL 6 (Figure 1B). Such innovations have been shown to improve the performance of deep RL agents on benchmark tasks due to improved statistical robustness, and evidence of distributional RL-like computations has been reported in midbrain DANs of both mice and primates 7–9; however, the direct functional relevance of such distributional mechanisms and representations to behavior is unknown. In the engineering setting, deep RL agents vary in whether and how they make use of knowledge about the distribution over future rewards when selecting actions 10–12, and decoding of reward distributions from DAN activity has only been demonstrated at the time of reward delivery 7.…”
Section: Introduction (mentioning)
confidence: 99%
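To make the mechanism in this statement concrete: below is a minimal sketch, assuming nothing beyond the description above, of a bank of value channels whose learning rates are asymmetric for positive versus negative RPEs. Each channel's asymmetry τ makes its estimate settle at a different expectile of the reward distribution, so the population jointly encodes that distribution. The bimodal reward distribution and all parameter values are illustrative choices, not data from the paper.

```python
# Minimal sketch (not the paper's method): value channels with asymmetric
# sensitivity to positive vs. negative RPEs, in the spirit of distributional RL.
import numpy as np

rng = np.random.default_rng(0)

n_channels = 7
# Asymmetry tau_i in (0, 1): relative weight on positive vs. negative RPEs.
taus = np.linspace(0.1, 0.9, n_channels)
values = np.zeros(n_channels)   # one value estimate per channel
base_lr = 0.02

for _ in range(20_000):
    # Bimodal reward distribution (illustrative choice).
    reward = rng.normal(1.0, 0.2) if rng.random() < 0.5 else rng.normal(4.0, 0.4)
    rpe = reward - values       # per-channel reward prediction errors
    # Positive RPEs are scaled by tau, negative RPEs by (1 - tau).
    lr = base_lr * np.where(rpe > 0, taus, 1.0 - taus)
    values += lr * rpe

# Low-tau channels settle below the mean reward, high-tau channels above it,
# so the sorted estimates trace out expectiles of the reward distribution.
print(np.round(values, 2))
```

This asymmetric-update scheme is one standard way such channels are formalized; whether it matches the specific variant any given study fits to DAN data is an empirical question, which is the point of the quoted passage.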