2022
DOI: 10.1007/s42113-022-00145-2

Influences of Reinforcement and Choice Histories on Choice Behavior in Actor-Critic Learning

Abstract: Reinforcement learning models have been used in many studies in the fields of neuroscience and psychology to model choice behavior and underlying computational processes. Models based on action values, which represent the expected reward from actions (e.g., Q-learning model), have been commonly used for this purpose. Meanwhile, the actor-critic learning model, in which the policy update and evaluation of an expected reward for a given state are performed in separate systems (actor and critic, respectively), ha…
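For readers less familiar with the distinction the abstract draws, the following is a minimal sketch (illustrative names, a simplified single-state two-choice task, not the paper's exact model specification) contrasting a Q-learning update, where one table of action values is both learned and acted upon, with an actor-critic update, where the critic tracks the expected reward of the state and its prediction error trains the actor's policy preferences.

import numpy as np

rng = np.random.default_rng(0)

def softmax(x, beta=3.0):
    z = beta * (x - x.max())
    p = np.exp(z)
    return p / p.sum()

def q_learning_trial(Q, reward_probs, alpha=0.2):
    # A single table of action values is both learned from and acted on.
    a = rng.choice(len(Q), p=softmax(Q))
    r = float(rng.random() < reward_probs[a])
    Q[a] += alpha * (r - Q[a])              # reward-prediction error updates Q directly
    return a, r

def actor_critic_trial(pref, V, reward_probs, alpha_actor=0.2, alpha_critic=0.2):
    # Policy (actor) and expected-reward evaluation (critic) live in separate systems.
    a = rng.choice(len(pref), p=softmax(pref))
    r = float(rng.random() < reward_probs[a])
    delta = r - V[0]                        # critic's prediction error for the single state
    V[0] += alpha_critic * delta            # critic update: expected reward of the state
    pref[a] += alpha_actor * delta          # actor update: policy preference, not a value
    return a, r

Q, pref, V = np.zeros(2), np.zeros(2), np.zeros(1)
for _ in range(500):
    q_learning_trial(Q, reward_probs=[0.7, 0.3])
    actor_critic_trial(pref, V, reward_probs=[0.7, 0.3])
print("Q:", Q.round(2), "prefs:", pref.round(2), "V:", V.round(2))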

Cited by 4 publications (4 citation statements) · References: 54 publications
“…Like H_t(a), its counterpart H_t(s_t, a) can also be modeled with the accumulating hysteresis trace [21]. Along with the alternative of a replacing trace (see Methods), another more constrained implementation of hysteretic accumulation could be based on an action-prediction error (or choice-prediction error), by analogy with the reward-prediction error [40,42–47,96,143,144,178,181]. The action-prediction error has been framed as "value-free", but this label and that of H_t(s_t, a) as "habit strength" (cf.…”
Section: PLOS Computational Biology
Citation type: mentioning (confidence: 99%)
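As a rough illustration of the idea in the quoted passage, the sketch below implements a hysteresis (choice-kernel) trace updated by an action-prediction error, in analogy with a reward-prediction error. The function and parameter names are illustrative, and this is one common parameterization rather than necessarily the one described in the cited paper's Methods.

import numpy as np

def update_hysteresis(H, chosen, alpha_h=0.3):
    """Choice-kernel update driven by an action-prediction error.

    The indicator I(a == chosen) plays the role of the outcome, so
    (indicator - H) is an action- (choice-) prediction error, by
    analogy with the reward-prediction error.
    """
    indicator = np.zeros_like(H)
    indicator[chosen] = 1.0
    H += alpha_h * (indicator - H)
    return H

H = np.zeros(2)
for a in [0, 0, 1, 0]:          # a short, made-up choice sequence
    update_hysteresis(H, a)
print(H.round(3))               # trace is largest for the frequently and recently chosen action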
“…That the effects illuminated herein are so parsimonious and demonstrably extractable means that comparable studies of RL and other sequential tasks generally stand to benefit from considering bias and hysteresis as part of due diligence, even if the main focus of inquiry is directed elsewhere. Being more representative of actual behavior, the expanded 5-parameter base model 0CE1 aims to enhance parameter identifiability with respect to actual RL as opposed to action-specific components of variance that may mimic or otherwise obscure signatures of learning with spurious correlations [17,18,27,28,39–47]. Before making additional assumptions, parsimoniously imposing action-specific parameters with first priority can be beneficial as a sort of regularization for learning parameters that in practice are nontrivial to extract and estimate.…”
Section: The Primacy of Bias and Hysteresis as Well as Individual Dif…
Citation type: mentioning (confidence: 99%)
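The point about action-specific parameters can be illustrated with a generic choice rule in which a side bias and a hysteresis trace enter the softmax alongside learned values, so that choice-history effects are not absorbed into the learning parameters. This is only a schematic parameterization with made-up names, not the cited 0CE1 model itself.

import numpy as np

def choice_probs(Q, H, bias, beta_q=3.0, beta_h=1.0):
    """Softmax over a weighted sum of learned values, a choice-hysteresis
    trace, and a fixed side bias (illustrative parameterization)."""
    logits = beta_q * Q + beta_h * H + bias
    logits -= logits.max()
    p = np.exp(logits)
    return p / p.sum()

# Example: identical action values, but bias and hysteresis still tilt the choice.
Q = np.array([0.5, 0.5])
H = np.array([0.8, 0.1])        # recent history favors action 0
bias = np.array([0.0, 0.3])     # a fixed preference for action 1
print(choice_probs(Q, H, bias).round(3))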
“…In the policy gradient algorithm, the policy gradient updates the Actor based on actual rewards. The Critic is responsible for learning the reward mechanism: by learning the relationship between the environment and rewards, it can discover the potential reward in the current state, which is used to guide the Actor's actions [27,34,1]. Furthermore, the Actor-Critic method adds an evaluation network to the policy gradient algorithm to fit the Q value, and the Actor is used to map states to actions.…”
Citation type: mentioning (confidence: 99%)
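A minimal, self-contained sketch of the scheme described in this passage is given below: the critic learns the relationship between states and rewards, and its prediction error scales a policy-gradient update of the actor that maps states to actions. The toy environment, the learning rates, and the use of a state-value critic (rather than an explicitly fitted Q network, as the quote mentions) are simplifying assumptions.

import numpy as np

rng = np.random.default_rng(1)
N_STATES, N_ACTIONS = 2, 2
GAMMA = 0.9

def step(s, a):
    # Toy MDP (placeholder dynamics): the chosen action determines the next state,
    # and only taking action 1 while already in state 1 is rewarded.
    s_next = a
    r = 1.0 if (s == 1 and a == 1) else 0.0
    return s_next, r

theta = np.zeros((N_STATES, N_ACTIONS))   # actor: softmax policy parameters
V = np.zeros(N_STATES)                    # critic: state values

def policy(s):
    z = theta[s] - theta[s].max()
    p = np.exp(z)
    return p / p.sum()

s = 0
for t in range(5000):
    p = policy(s)
    a = rng.choice(N_ACTIONS, p=p)
    s_next, r = step(s, a)
    # Critic evaluates: one-step TD error from the learned state-reward relationship.
    delta = r + GAMMA * V[s_next] - V[s]
    V[s] += 0.05 * delta
    # Actor improves: policy-gradient step for the softmax policy, scaled by the TD error.
    grad_log_pi = -p
    grad_log_pi[a] += 1.0
    theta[s] += 0.05 * delta * grad_log_pi
    s = s_next

print("V:", V.round(2))
print("theta:", theta.round(2))           # preferences shift toward the rewarded action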