2014
DOI: 10.1037/a0038199

Humans use directed and random exploration to solve the explore–exploit dilemma.

Abstract: All adaptive organisms face the fundamental tradeoff between pursuing a known reward (exploitation) and sampling lesser-known options in search of something better (exploration). Theory suggests at least two strategies for solving this dilemma: a directed strategy in which choices are explicitly biased toward information seeking, and a random strategy in which decision noise leads to exploration by chance. In this work we investigated the extent to which humans use these two strategies. In our “Horizon task,” …

Cited by 522 publications (896 citation statements)
References 30 publications
“…Our results establish a causal role for the rFPC in regulating both exploration and exploitation, and they underscore that this region is critical for participants to look beyond the current benefits at hand to search for potentially greater rewards (Wilson et al, 2014). Together, findings from the tests of our three hypotheses support that the activation observed in FPC when participants switch to exploratory choices (e.g., Daw et al, 2006;Boorman et al, 2009) indeed relates to behavioral control in those situations.…”
Section: Discussion (supporting)
confidence: 65%
“…Second, we investigated the more novel hypothesis that tDCS-mediated increases or decreases in exploration are related to higher or lower sensitivity to previous unexpected outcomes in payoff magnitudes (i.e., prediction errors), respectively. This hypothesis was motivated by proposals that the rFPC is involved in integrating memories of recent events to guide behavior (Tsujimoto et al, 2011). Our results were consistent with all three hypotheses: Anodal and cathodal rFPC-targeted tDCS indeed caused increased and decreased exploration, respectively.…”
Section: Introduction (supporting)
confidence: 80%
“…The prior mean is close to the generative mean of 50 used in the actual experiment, and the decision parameters are comparable to those found in our previous work (Wilson et al, 2014). The learning rate parameters, α_1 and α_∞, were not included in our previous models and are worth discussing in more detail.…”
Section: Model Fitting Results (supporting)
confidence: 60%
“…This set includes tasks based on the basic problems of foraging theory, including the patch-leaving problem, the diet selection problem, the central place foraging problem, and so forth (Stephens & Krebs, 1986). It also includes stopping problems and other classic optimization problems, such as the k-arm bandit problems, horizon problems, and change point detection problems (Pearson, Hayden, Raghavachari, & Platt, 2009;Wilson, Geana, White, Ludvig, & Cohen, 2014;Wilson, Nassar, & Gold, 2013). Indeed, it may also include variants of the intertemporal choice task in which the postreward delays are clearly cued (Pearson et al, 2010).…”
Section: Suggestions For Future Research (mentioning)
confidence: 99%