2010
DOI: 10.1073/pnas.1001709107

Alterations in choice behavior by manipulations of world model

Abstract: How to compute initially unknown reward values makes up one of the key problems in reinforcement learning theory, with two basic approaches being used. Model-free algorithms rely on the accumulation of substantial amounts of experience to compute the value of actions, whereas in model-based learning, the agent seeks to learn the generative process for outcomes from which the value of actions can be predicted. Here we show that (i) "probability matching," a consistent example of suboptimal choice behavior seen i…

Cited by 84 publications (96 citation statements)
References 27 publications
“…This result suggests that n − 1 biases reflect subjects' estimates of the correlations in stimulus sequences but that subjects' estimates of stimulus correlations are positively biased. The positive-correlation bias observed in subjects' perceptual estimates of target speed is consistent with the literature on temporal dependencies in more cognitive binomial decision-making tasks (24)(25)(26). Kareev (24), for example, presented a sequence of binary items (Xs and Os) to subjects and asked them to predict the next item on each trial.…”
Section: Discussion (supporting)
confidence: 61%
“…One proposal is that it reflects a strong prior on the statistical structure of the world, in which strong positive temporal correlations between events may be ubiquitous (26). The current study has shown that the brain can adapt its internal model of temporal correlations to more closely match the correlations of stimulus sequences in the proximal environment, although only partially on the short time scale used here (1 h).…”
Section: Discussion (mentioning)
confidence: 80%
“…$\frac{P(x \text{ is good})}{P(x \text{ is good}) + P(y \text{ is good})}$. [2] This rule is known to be optimal when there is competition for resources (39,40) and when the estimated probabilities change in time (41)(42)(43)(44). Probability matching in Eq.…”
Section: Results (mentioning)
confidence: 99%
“…1. A common decision rule in animals, from insects to humans, is probability matching, according to which the probability of choosing a behavior is proportional to the estimated probability (35)(36)(37)(38)(39)(40)(41)(42)(43)(44),…”
Section: Results (mentioning)
confidence: 99%
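
The probability-matching rule quoted in the Results statements above can be illustrated with a short sketch. It assumes only the ratio form of the quoted Eq. 2, in which the chance of choosing option x is its estimated probability of being good divided by the sum of both estimates. The function names (`matching_choice_prob`, `choose`) and the example estimates are hypothetical and do not come from the cited papers.

```python
import random


def matching_choice_prob(p_x_good: float, p_y_good: float) -> float:
    """Probability of choosing x under probability matching.

    Implements P(x is good) / (P(x is good) + P(y is good)),
    the ratio form of the quoted Eq. 2.
    """
    total = p_x_good + p_y_good
    if total <= 0:
        raise ValueError("at least one estimate must be positive")
    return p_x_good / total


def choose(p_x_good: float, p_y_good: float, rng: random.Random) -> str:
    """Sample one choice ("x" or "y") according to the matching rule."""
    return "x" if rng.random() < matching_choice_prob(p_x_good, p_y_good) else "y"


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the illustration is repeatable
    # Hypothetical estimates: x is judged three times as likely to be good as y.
    picks = [choose(0.75, 0.25, rng) for _ in range(10_000)]
    print("fraction choosing x:", picks.count("x") / len(picks))
```

With these illustrative estimates a matching agent picks x on roughly three quarters of trials, whereas a strict maximizer would pick x on every trial.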