2019
DOI: 10.1038/s41598-019-43245-z

Dopamine blockade impairs the exploration-exploitation trade-off in rats

Abstract: In a volatile environment where rewards are uncertain, successful performance requires a delicate balance between exploitation of the best option and exploration of alternative choices. It has theoretically been proposed that dopamine contributes to the control of this exploration-exploitation trade-off, specifically that the higher the level of tonic dopamine, the more exploitation is favored. We demonstrate here that there is a formal relationship between the rescaling of dopamine positive reward prediction …

Cited by 77 publications (105 citation statements)
References: 65 publications
“…For example, several studies have shown that mice with chronically elevated tonic DA levels were more motivated to work for food rewards, without showing improvements in Pavlovian or operant learning compared to wild type mice [49-51]. These experimental results are consistent with computational modelling studies that found that genetic or simulated differences in tonic DA levels uniquely correlated with explore-exploit tendencies, but not with learning rates [51-54]. Also in humans, some effects of dopaminergic medication on reward and punishment learning in PD patients can be explained by motivational differences at the time of choice, rather than by differences in feedback learning [55-57].…”
Section: Discussion
Type: supporting
confidence: 87%
“…Morita and Kato (2014), on the other hand, posited that value updating involves a decay term. Assuming such a decay term results in a relationship qualitatively similar to that in Equation (10), and thus RPE ramping (see also implementations in Mikhael and Bogacz, 2016; Cinotti et al, 2019). Ramping can similarly be explained by assuming temporal or spatial bias that decreases with approach to the reward, by modulating the temporal discount term during task execution, or by other mechanisms (see Supplemental Information for derivations).…”
Section: Discussion
Type: mentioning
confidence: 92%
“…Indeed, previous work has suggested that DA controls the exploration-exploitation trade-off, whereby high DA encourages exploiting the option with the highest reward, and low DA encourages exploring other options [28][29][30] (but see [31] and Discussion). For instance, Cinotti et al [30] trained rats on a non-stationary multi-armed bandit task with varying levels of DA blockade. The authors observed that the degree of win-shift behavior, representing the drive to explore rather than to exploit, increased with higher doses of the DA antagonist flupenthixol (Fig.…”
Section: Relationship With Experimental Data
Type: mentioning
confidence: 99%
“…On the other hand, when this learned information must subsequently be used to select actions, DA seems to control the exploration-exploitation trade-off: Here, high DA promotes exploitation of actions with higher learned value and increases motivation. Low DA, on the other hand, promotes exploration [28][29][30] (but see [31,32] and Discussion) and decreases motivation [33][34][35]. Recently developed computational models allow DA to achieve both learning and performance roles by endowing it with separate computational machinery for each [27,36].…”
Section: Introduction
Type: mentioning
confidence: 99%
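
The citation statements above describe win-shift behavior on a non-stationary multi-armed bandit task increasing as dopamine is blocked. A minimal sketch of how such an effect can arise in simulation is given below. It is an illustration under stated assumptions, not the method of the cited paper: the softmax choice rule, the inverse temperature `beta` standing in for tonic dopamine level, the learning rate `alpha`, the reward probabilities, and the block length are all hypothetical choices made here.

```python
import math
import random


def softmax_probs(values, beta):
    """Softmax choice probabilities; beta is the inverse temperature."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def win_shift_rate(beta, n_trials=5000, alpha=0.1, seed=0):
    """Fraction of rewarded trials followed by a switch to another arm.

    Assumption: a low beta (more exploration) plays the role of dopamine
    blockade; this mapping is illustrative only.
    """
    rng = random.Random(seed)
    reward_probs = [0.8, 0.2, 0.2]  # hypothetical three-armed bandit
    q = [0.0, 0.0, 0.0]             # learned action values
    prev_choice, prev_rewarded = None, False
    wins = 0
    shifts = 0
    for t in range(n_trials):
        # Non-stationarity: rotate which arm is best every 500 trials.
        if t > 0 and t % 500 == 0:
            reward_probs = reward_probs[-1:] + reward_probs[:-1]
        probs = softmax_probs(q, beta)
        choice = rng.choices(range(3), weights=probs)[0]
        rewarded = rng.random() < reward_probs[choice]
        # Count a win-shift when the previous trial was rewarded
        # and the current choice differs from the previous one.
        if prev_rewarded:
            wins += 1
            if choice != prev_choice:
                shifts += 1
        # Simple delta-rule value update for the chosen arm.
        q[choice] += alpha * ((1.0 if rewarded else 0.0) - q[choice])
        prev_choice, prev_rewarded = choice, rewarded
    return shifts / wins if wins else 0.0


if __name__ == "__main__":
    for beta in (0.0, 1.0, 10.0):
        print(f"beta={beta}: win-shift rate = {win_shift_rate(beta):.3f}")
```

In this sketch the learning rule is identical across conditions and only the choice rule changes, which mirrors the distinction the statements draw between effects on learning and effects at the time of choice. With `beta = 0` the agent chooses uniformly at random, so about two thirds of rewarded trials are followed by a switch; larger `beta` values make the agent stay with the arm it currently values most.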