2019
DOI: 10.1016/j.tics.2019.07.012
Where Does Value Come From?

Cited by 120 publications (112 citation statements)
References 51 publications
“…However, the key conceptual difference of the DopAct framework is that it assumes that animals aim to achieve a desired level of reserves (Buckley et al., 2017; Hull, 1952; Stephan et al., 2016), rather than always maximizing the acquisition of resources. It has been proposed that, when physiological state is taken into account, the reward an animal aims to maximize can be defined as the reduction in distance between the current and desired levels of reserves (Juechems and Summerfield, 2019; Keramati and Gutkin, 2014). Under this definition, a resource is equivalent to such a subjective reward only if consuming it would not take the animal beyond its optimal reserve level.…”
Section: Discussion (mentioning)
confidence: 99%
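As a concrete illustration of this reward definition, here is a minimal Python sketch of the drive-reduction idea from homeostatic reinforcement learning (Keramati and Gutkin, 2014): reward is the decrease in distance between the current and desired reserve levels, so a resource that pushes reserves past the setpoint is worth less than its nominal size. The function and variable names below are illustrative, not taken from the cited papers.

```python
# Minimal sketch of reward as drive reduction (names are illustrative).

def drive(reserves: float, setpoint: float) -> float:
    """Distance between the current reserve level and the desired one."""
    return abs(setpoint - reserves)

def subjective_reward(before: float, after: float, setpoint: float) -> float:
    """Reward = reduction in drive produced by consuming a resource."""
    return drive(before, setpoint) - drive(after, setpoint)

# Consuming 30 units when reserves are 30 below the setpoint is fully rewarding:
print(subjective_reward(70.0, 100.0, setpoint=100.0))   # 30.0
# The same resource consumed near the setpoint overshoots it, so the subjective
# reward is negative even though the resource itself is identical:
print(subjective_reward(90.0, 120.0, setpoint=100.0))   # -10.0
```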
“…These feature weights could be encoded as part of a generic latent state representation, such as the one thought to be encoded in orbitofrontal cortex, 21,22 or in brain regions specific to representing physiological needs, such as the hypothalamus 23 or the insula. 24 Such a perspective can help resolve the "reward paradox", 25 a key challenge for RL as a theory of human and animal learning, since RL typically assumes an external reward function that does not exist in natural environments. This view predicts that inducing different motivational states (for example, hunger, thirst, or sleepiness) would correspond to naturalistically varying the feature weights w.…”
Section: Discussion (mentioning)
confidence: 99%
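The weighting idea in this statement can be made concrete with a short sketch: if an outcome is described by a feature vector φ and the motivational state sets the weights w, then reward is simply the dot product w · φ, so changing the motivational state changes the value of the very same outcome. The feature names and weight values below are invented for illustration.

```python
# Sketch: reward as a motivational-state-dependent weighting of outcome features.
import numpy as np

# Hypothetical outcome features: (food content, water content, rest value)
phi = np.array([1.0, 0.2, 0.0])

# Hypothetical weight vectors w induced by different motivational states
w_hungry = np.array([1.0, 0.1, 0.1])
w_thirsty = np.array([0.1, 1.0, 0.1])

print(float(phi @ w_hungry))   # 1.02 -> a food-rich outcome is valuable when hungry
print(float(phi @ w_thirsty))  # 0.30 -> the same outcome is worth little when thirsty
```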
“…To tackle this, we incorporate an update mechanism that learns from both simulated and real experience to guide future search toward more promising regions of the hypothesis space (21). This is formally defined as a Gaussian mixture model policy over the three tools and their positions, π(s), which represents the model's belief about high-value actions for each tool.…”
Section: SSUP Model (mentioning)
confidence: 99%
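To make the policy described above more tangible, the sketch below implements a toy Gaussian-mixture policy over three tools and their 2-D placement positions, updated from a batch of (tool, position, value) experience. The update rule used here (shift each tool's Gaussian toward its best-scoring sample and reweight tools by mean observed value) is an illustrative stand-in, not the SSUP model's actual update equations; all names are hypothetical.

```python
# Toy Gaussian-mixture policy over tools and positions (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
TOOLS = ["tool_a", "tool_b", "tool_c"]  # placeholder tool names

# Per-tool mixture weight plus a Gaussian over 2-D placement positions.
policy = {t: {"weight": 1 / 3, "mean": np.zeros(2), "cov": np.eye(2)}
          for t in TOOLS}

def sample_action(policy):
    """Pick a tool by mixture weight, then a position from its Gaussian."""
    w = np.array([policy[t]["weight"] for t in TOOLS])
    tool = rng.choice(TOOLS, p=w / w.sum())
    pos = rng.multivariate_normal(policy[tool]["mean"], policy[tool]["cov"])
    return tool, pos

def update(policy, experience, lr=0.5):
    """experience holds (tool, position, value) triples from simulated and
    real attempts; move each tool's Gaussian toward its best sample and
    reweight tools by their mean observed value."""
    for t in TOOLS:
        hits = [(p, v) for tool, p, v in experience if tool == t]
        if not hits:
            continue
        positions = np.array([p for p, _ in hits])
        values = np.array([v for _, v in hits])
        best = positions[values.argmax()]
        policy[t]["mean"] = (1 - lr) * policy[t]["mean"] + lr * best
        policy[t]["weight"] = values.mean()

# One search round on a toy reward landscape peaked at position (1, 2):
experience = []
for _ in range(10):
    tool, pos = sample_action(policy)
    value = float(np.exp(-np.linalg.norm(pos - np.array([1.0, 2.0]))))
    experience.append((tool, pos, value))
update(policy, experience)
```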