2018
DOI: 10.1016/j.cognition.2017.12.014

Deconstructing the human algorithms for exploration

Abstract: The dilemma between information gathering (exploration) and reward seeking (exploitation) is a fundamental problem for reinforcement learning agents. How humans resolve this dilemma is still an open question, because experiments have provided equivocal evidence about the underlying algorithms used by humans. We show that two families of algorithms can be distinguished in terms of how uncertainty affects exploration. Algorithms based on uncertainty bonuses predict a change in response bias as a function of unce…
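
The abstract's claim about uncertainty bonuses can be illustrated with a minimal sketch. It assumes (these are our assumptions, not the paper's stated model) a two-armed choice in which each arm is summarized by a posterior mean and standard deviation, a probit choice rule with slope `beta`, and a bonus weight `gamma`; the function names are hypothetical. Because the bonus enters as a constant added to the value difference, it moves the indifference point of the choice curve (a shift in response bias) while leaving its slope unchanged.

```python
from math import erf, sqrt


def normal_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def p_choose_first_ucb(mu1: float, mu2: float, sigma1: float, sigma2: float,
                       gamma: float = 1.0, beta: float = 1.0) -> float:
    """Probability of choosing arm 1 under a hypothetical probit rule with an
    uncertainty bonus (UCB-style directed exploration).

    Each arm's value is mean + gamma * std. The bonus difference
    gamma * (sigma1 - sigma2) is a constant offset added to the value
    difference, so it shifts the response bias, not the slope beta.
    """
    value_diff = mu1 - mu2
    bonus_diff = gamma * (sigma1 - sigma2)
    return normal_cdf(beta * (value_diff + bonus_diff))


# With equal means, the more uncertain arm is favoured.
p = p_choose_first_ucb(0.0, 0.0, sigma1=2.0, sigma2=1.0)
print(round(p, 3))  # → 0.841
```

Here `p_choose_first_ucb(-1.0, 0.0, 2.0, 1.0)` returns exactly 0.5: the one-unit uncertainty bonus offsets a one-unit deficit in expected value.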

Cited by 238 publications (404 citation statements)
References: 39 publications
“…The fact that random exploration does not correlate with temporal discounting is also consistent with theories of random exploration (Watkins, 1989; Sutton & Barto, 2018). Moreover, this apparent dissociation between directed and random exploration is consistent with other findings showing that directed and random exploration have different computational properties (Gershman, 2018), different age dependence (Somerville et al., 2017), and may rely on dissociable neural systems (Zajkowski et al., 2017; Gershman & Tzovaras, 2018; Warren et al., 2017). In this regard it is notable that directed exploration appears to rely on the same frontal systems thought to underlie temporal discounting (Frank et al., 2009; Gershman & Tzovaras, 2018; Zajkowski et al., 2017; Doya, 2002; McClure, Laibson, Loewenstein, & Cohen, 2004; McClure, Ericson, Laibson, Loewenstein, & Cohen, 2007), while random exploration does not.…”
Section: Discussion (supporting)
confidence: 89%
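
The snippet contrasts directed and random exploration. A common textbook instance of random exploration is Thompson sampling (our choice of example, not necessarily the model used in the citing paper): each arm's value is sampled from a Gaussian posterior and the larger sample is chosen. For two independent Gaussian posteriors the choice probability has the closed form Φ((μ1 − μ2)/√(σ1² + σ2²)), so greater total uncertainty flattens the choice curve without moving its midpoint. The sketch below checks the closed form against simulation; the function names are hypothetical.

```python
import random
from math import erf, sqrt


def normal_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def p_choose_first_thompson(mu1: float, mu2: float, sigma1: float, sigma2: float) -> float:
    """Closed-form P(sample from arm 1 > sample from arm 2) for independent
    Gaussian posteriors: the difference of samples is N(mu1 - mu2, sigma1^2 + sigma2^2)."""
    total_sd = sqrt(sigma1 ** 2 + sigma2 ** 2)
    return normal_cdf((mu1 - mu2) / total_sd)


def simulate_thompson(mu1: float, mu2: float, sigma1: float, sigma2: float,
                      n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability: one posterior draw per arm per trial."""
    rng = random.Random(seed)
    wins = sum(rng.gauss(mu1, sigma1) > rng.gauss(mu2, sigma2) for _ in range(n))
    return wins / n


# Same mean difference (1.0), low versus high total uncertainty.
low_u = p_choose_first_thompson(1.0, 0.0, 0.5, 0.5)
high_u = p_choose_first_thompson(1.0, 0.0, 2.0, 2.0)
print(round(low_u, 3), round(high_u, 3))
```

Raising both standard deviations from 0.5 to 2.0 lowers the choice probability from about 0.92 to about 0.64, while equal means always give 0.5: in this sketch uncertainty changes the slope of the choice curve, not its bias.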
“…How well does our full model perform compared to alternative modelling approaches? Because most models in the literature either focus on the exploration (e.g., [19,9,31]) or the exploitation mechanism (e.g., [10,11,20]), we tested these two main components separately. For this, we first kept the exploitation mechanism unchanged, and tested five different models for the exploration phase: 1) our take-the-best model, 2) a probabilistic variation of that model in which the individual chooses to rely on the payoff cue or the visibility cue based on the most rewarding neighbouring solution [32,33], 3) a typical hill-climbing model where exploration is always directed towards the most-rewarding adjacent position [34], 4) a "blind search" model in which the exploration is only guided by novelty, and 5) a random search model in which the next position is randomly chosen among the adjacent solutions.…”
Section: Results (mentioning)
confidence: 99%
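
Three of the five exploration rules listed in this snippet can be stated compactly. The sketch below assumes a hypothetical 2-D grid of known payoffs (`landscape`), a 4-cell neighbourhood, and a set of already visited cells; all names are illustrative. It implements hill-climbing (move to the most rewarding adjacent position), blind search (guided only by novelty; choosing uniformly among unvisited neighbours is our assumption), and random search (any adjacent cell, uniformly). The take-the-best model and its probabilistic variant depend on cue details not given in the snippet and are omitted.

```python
import random
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]


def neighbours(cell: Cell, size: int) -> List[Cell]:
    """4-connected neighbours of `cell` that lie inside a size x size grid."""
    r, c = cell
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in candidates if 0 <= i < size and 0 <= j < size]


def hill_climbing(cell: Cell, landscape: Dict[Cell, float], size: int,
                  visited: Set[Cell], rng: random.Random) -> Cell:
    """Always move to the most rewarding adjacent position (first one on ties)."""
    return max(neighbours(cell, size), key=lambda n: landscape[n])


def blind_search(cell, landscape, size, visited, rng):
    """Novelty only: pick an unvisited neighbour uniformly (assumption);
    fall back to any neighbour if all have been visited."""
    options = [n for n in neighbours(cell, size) if n not in visited]
    return rng.choice(options or neighbours(cell, size))


def random_search(cell, landscape, size, visited, rng):
    """Pick the next position uniformly among adjacent cells."""
    return rng.choice(neighbours(cell, size))


def run(policy, landscape, size, start, steps, seed=0):
    """Apply `policy` for `steps` moves and return the visited path."""
    rng = random.Random(seed)
    path, visited = [start], {start}
    for _ in range(steps):
        nxt = policy(path[-1], landscape, size, visited, rng)
        path.append(nxt)
        visited.add(nxt)
    return path


# Hypothetical 3x3 payoff landscape whose peak is at (2, 2).
size = 3
landscape = {(r, c): float(r + c) for r in range(size) for c in range(size)}
print(run(hill_climbing, landscape, size, start=(0, 0), steps=4))
# → [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
```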
“…While TTB constitutes a valid model to describe how people make decisions between two options [37], we have shown that it can also be used to describe search behaviours. Recent research on human search behaviours distinguishes between directed and undirected exploration [32,31,19]. In multi-armed bandit tasks undirected exploration refers to the stochasticity of the search process causing random exploration decisions.…”
Section: Discussion (mentioning)
confidence: 99%
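
Undirected exploration is often modelled by injecting stochasticity into the choice rule. A minimal sketch, assuming a softmax rule with a hypothetical inverse-temperature parameter `beta` (our choice of example; the snippet does not name the rule): lowering `beta` makes choices more random regardless of how uncertain each option is.

```python
import math
import random
from typing import List, Sequence


def softmax_probs(values: Sequence[float], beta: float) -> List[float]:
    """Softmax choice probabilities with inverse temperature beta.
    Subtracting the maximum value keeps exp() numerically stable."""
    m = max(values)
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_choice(values: Sequence[float], beta: float, rng: random.Random) -> int:
    """Sample an option index according to the softmax probabilities."""
    return rng.choices(range(len(values)), weights=softmax_probs(values, beta))[0]


values = [1.0, 0.5, 0.0]
print([round(p, 3) for p in softmax_probs(values, beta=5.0)])  # mostly exploits the best option
print([round(p, 3) for p in softmax_probs(values, beta=0.1)])  # near uniform: more random exploration
```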
“…where ω is a free parameter that controls the degree of uncertainty-guided exploration (see Gershman, 2018, for a thorough discussion of uncertainty-guided exploration). In our simulations, ω was sampled from the distribution $\operatorname{logit}^{-1}(\omega) \sim \mathcal{N}(-1, 1)$.…”
Section: Computational Modeling (mentioning)
confidence: 99%