2022
DOI: 10.48550/arxiv.2207.00636
Preprint

Action-modulated midbrain dopamine activity arises from distributed control policies

Abstract: Animal behavior is driven by multiple brain regions working in parallel with distinct control policies. We present a biologically plausible model of off-policy reinforcement learning in the basal ganglia, which enables learning in such an architecture. The model accounts for action-related modulation of dopamine activity that is not captured by previous models that implement on-policy algorithms. In particular, the model predicts that dopamine activity signals a combination of reward prediction error (as in cla…
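The composite signal described in the abstract can be illustrated with a small numerical sketch. This is not the paper's implementation: the tabular setting, the log-probability "action surprise" term, and every name below are assumptions, used only to show how a reward prediction error and an action-related term might be combined when the executed action comes from a controller other than the learned policy (the off-policy case).

import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 3
V = np.zeros(n_states)                                # critic: state values
pi = np.full((n_states, n_actions), 1.0 / n_actions)  # learned (basal-ganglia-like) policy
alpha, gamma = 0.1, 0.9

def dopamine_signal(s, a, r, s_next):
    # Hypothetical composite: a classic reward prediction error plus a term
    # that grows when the executed action is unexpected under the learned policy.
    rpe = r + gamma * V[s_next] - V[s]
    action_term = -np.log(pi[s, a])
    return rpe, action_term

# One illustrative transition in which the action is imposed by some other
# controller (e.g. a cortical policy) rather than sampled from pi itself.
s, s_next, r = 0, 1, 1.0
a = int(rng.integers(n_actions))
rpe, action_term = dopamine_signal(s, a, r, s_next)
V[s] += alpha * rpe                                   # critic update from the RPE alone
pi[s, a] += alpha * (rpe + action_term)               # actor update nudged by both terms
pi[s] = np.clip(pi[s], 1e-6, None)
pi[s] /= pi[s].sum()                                  # renormalize the action probabilities
print(rpe, action_term)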

Cited by 3 publications (4 citation statements)
References: 29 publications
“…Second, DLS dopamine may represent the output of a circuit that evaluates the content of spontaneous behaviour. Although dopamine has classically been thought to report reward-prediction errors, which by definition require the provision of reward, it has recently been argued that dopamine may also encode action-prediction errors (APEs) 47,48. APEs are proposed to occur as animals either execute or plan to execute a behaviour that is unexpected in a given context; in the setting of spontaneous behaviour, an APE-like model would predict that DLS dopamine represents the comparison between the expressed (or soon-to-be-expressed) behavioural syllable and that which would have been expressed at a particular moment given an idealized transition matrix.…”
Section: Discussion (mentioning)
confidence: 99%
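The transition-matrix comparison described in the statement above can be made concrete with a toy calculation. The syllable names, the idealized transition matrix, and the use of a negative log-probability as the "error" are illustrative assumptions, not definitions taken from the cited papers.

import numpy as np

syllables = ["pause", "walk", "rear", "groom"]
# Hypothetical idealized transition matrix: P[i, j] = P(next syllable j | current syllable i)
P = np.array([
    [0.60, 0.30, 0.05, 0.05],
    [0.20, 0.60, 0.10, 0.10],
    [0.30, 0.30, 0.30, 0.10],
    [0.40, 0.20, 0.10, 0.30],
])

def action_prediction_error(prev_idx, expressed_idx):
    # Surprise of the expressed syllable relative to the idealized transitions.
    return -np.log(P[prev_idx, expressed_idx])

print(action_prediction_error(1, 1))  # expected transition (walk -> walk): small error, ~0.51
print(action_prediction_error(1, 2))  # unexpected transition (walk -> rear): larger error, ~2.30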
“…In theory, for APE to function as a teaching signal, it need only reflect scalar information about action and the degree to which actions are predicted (Bogacz, 2020; Lindsey and Litwin-Kumar, 2022; Miller et al., 2019). That is to say, the APE term need not be action-specific.…”
Section: Discussion (mentioning)
confidence: 99%
“…Recent theoretical work has also predicted that if dopamine neurons encoded an APE, then it would allow the basal ganglia to learn from any area of the brain that controls action (Lindsey and Litwin-Kumar, 2022). In our model the value-free system learns to mimic the actions that were initially driven by the value-based basal ganglia system.…”
Section: Discussion (mentioning)
confidence: 99%
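A minimal sketch of the mimicry described in this statement, under assumed definitions: a tabular "value-free" habit policy is nudged toward whichever action the value-based system actually emitted. The learning rule, rate, and names are illustrative, not the cited model.

import numpy as np

n_states, n_actions = 4, 3
habit = np.full((n_states, n_actions), 1.0 / n_actions)  # value-free policy, initially uniform
eta = 0.05

def consolidate(state, chosen_action):
    # Move the habit policy toward the action chosen by the value-based system.
    target = np.zeros(n_actions)
    target[chosen_action] = 1.0
    habit[state] += eta * (target - habit[state])

for _ in range(200):      # repeated pairing of state 0 with action 2
    consolidate(0, 2)
print(habit[0])           # probability mass has shifted toward action 2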
“…Importantly, the action being reinforced is the output of the whole system, which reflects the contributions of both the MC and DLS modules. This kind of learning rule, known in the machine-learning literature as an 'off-policy' reinforcement learning algorithm, incentivizes subcomponents of a larger system (here, the DLS module vis-à-vis the entire model) to assume autonomous control of behavior when possible 51. Such an objective is a plausible mechanism for encouraging subcortical consolidation in the motor system and is supported by experimental evidence 8,27,52-54.…”
Section: A Neural Network Model Explains the Mechanisms of Subcortica... (mentioning)
confidence: 99%
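The off-policy point in this statement can be sketched with a toy two-module policy, under the assumption (not from the cited work) that the MC-like and DLS-like modules contribute additively to the action logits. The DLS-like module is updated toward the action emitted by the combined system, so it is reinforced for behaviour it did not fully control and gradually takes it over.

import numpy as np

rng = np.random.default_rng(1)
n_actions = 3
mc_logits = np.array([2.0, 0.0, 0.0])   # cortical module initially drives the behaviour
dls_logits = np.zeros(n_actions)        # striatal module starts uncommitted
alpha = 0.2

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(500):
    combined = softmax(mc_logits + dls_logits)  # whole-system policy
    a = rng.choice(n_actions, p=combined)       # action of the full system, not of DLS alone
    reward = 1.0 if a == 0 else 0.0             # toy task that rewards action 0
    # Off-policy-style update: DLS is pushed toward the executed system-level
    # action in proportion to the outcome, even though MC largely chose it.
    grad = -softmax(dls_logits)
    grad[a] += 1.0
    dls_logits += alpha * reward * grad
print(softmax(dls_logits))                      # DLS now favours the behaviour on its own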