2021
DOI: 10.1016/j.celrep.2021.110185

Prefrontal solution to the bias-variance tradeoff during reinforcement learning

Abstract: Evidence that the brain combines different value learning strategies to minimize prediction error is accumulating. However, the tradeoff between bias and variance error, which imposes different constraints on each learning strategy's performance, poses a challenge for value learning. While this tradeoff specifies the requirements for optimal learning, little has been known about how the brain deals with this issue. Here, we hypothesize that the brain adaptively resolves the bias-variance tradeoff during reinfo…

Cited by 6 publications (14 citation statements)
References 57 publications (115 reference statements)

“…Recent evidence has shown that the human brain weights these strategies according to context changes ([23-29]), resonating with the long-term prediction view. This evidence raised a new possibility that the brain implements meta-learning [15] or a meta-control of multiple learning strategies [30].…”
Section: Introduction (mentioning)
confidence: 89%
“…The adaptability test simulates a highly volatile environment, for which we developed two scenarios: the one with sudden task changes randomly sampled from the set of 10 tasks (Figure 3D -task switching scenario) and the other with continuous, subtle task parameter changes in a random walk fashion within a single task (Figure 3E -context changing scenario). A choice optimality measure was used in the latter scenario because a simple performance metric, such as average or cumulative reward, is not sensitive enough to evaluate adaptive behavior to subtle context changes [25].…”
Section: Task Generalizability and Adaptability Of Human Prefrontal M... (mentioning)
confidence: 99%
“…[19, 24, 25] When the goal-directed system is deemed to be dominant over the other, the prefrontal arbitrator suppresses the posterior putamen, which is a habitual system controller involved in the estimation of the prediction error or action value of the system. [19, 26] Because OCD is characterized by enhanced intolerance of uncertainty regarding changes in goal-directed contingency, [27, 28] this uncertainty-based arbitration model would fit to test the imbalance theory in OCD. Given the crucial role of uncertainty differences in the arbitration process, the conditions of the Markov decision task provide a suitable environment for inducing transitions between the strategies by manipulating their prediction error or uncertainty.…”
Section: Introduction (mentioning)
confidence: 99%