2020
DOI: 10.31234/osf.io/uj85c
Preprint

Deep exploration as a unifying account of explore-exploit behavior

Abstract: Many decisions involve a choice between exploring unknown opportunities and exploiting well-known options. Work across a variety of domains, from animal foraging to human decision making, has suggested that animals solve such “explore-exploit dilemmas” with a mixture of two strategies: one driven by information seeking (directed exploration) and the other by behavioral variability (random exploration). Here we propose a unifying account in which these two strategies emerge from a kind of stochastic planning,…

citations
Cited by 29 publications
(18 citation statements)
references
References 15 publications
“…A third approach to combining directed and random exploration, contends that the two strategies emerge as different behavioral facets of a unified algorithm known in the machine learning literature as Deep Exploration [77]. In this model [78], the explore-exploit choice is made by mental simulation of a small number of plausible futures that are 'deep,' in that they extend multiple time steps into the future, but narrow, in that the number of simulations used is small. Random exploration arises from this model because the simulations are stochastic.…”
Section: Integrating Directed and Random Exploration (mentioning)
confidence: 99%
“…Moreover, it predicts that there should be a tradeoff between directed and random exploration. As people use more simulations to make their decision they should exhibit more directed exploration and less random exploration, a prediction that holds both across the population and within subject [78].…”
Section: Integrating Directed and Random Exploration (mentioning)
confidence: 99%
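
The mechanism described in the two statements above can be illustrated with a toy simulation. What follows is a minimal sketch under stated assumptions, not the model from the cited preprint: the two-option setup (one option with a known payoff, one with a Gaussian belief over its mean), the rollout rule, and every name and default value (`simulate_explore_value`, `p_explore`, `known`, `sd`, `horizon`) are hypothetical choices made for illustration. Each simulated future is "deep" in that it covers all remaining steps, and the decision averages over only `n_sims` such futures.

```python
import random


def simulate_explore_value(rng, mean, sd, known, horizon):
    """One simulated future that starts by exploring the uncertain option.

    Assumption (not from the source): the first pull reveals the option's
    true mean, after which the agent takes the better option every step.
    """
    theta = rng.gauss(mean, sd)  # one plausible value of the uncertain option
    return theta + (horizon - 1) * max(theta, known)


def choose_explore(rng, n_sims, mean, sd, known, horizon):
    """Explore if the average simulated value beats always exploiting."""
    total = sum(
        simulate_explore_value(rng, mean, sd, known, horizon)
        for _ in range(n_sims)
    )
    return total / n_sims > horizon * known


def p_explore(n_sims, horizon, mean=0.0, sd=1.0, known=0.0,
              trials=2000, seed=0):
    """Fraction of repeated decisions in which the agent explores."""
    rng = random.Random(seed)
    hits = sum(
        choose_explore(rng, n_sims, mean, sd, known, horizon)
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        print(n, p_explore(n_sims=n, horizon=6))
```

In this sketch, a single simulation leaves the choice close to a coin flip when the two options have equal expected payoff, which plays the role of random exploration. As `n_sims` grows, the average converges on the long-horizon value of information and the agent explores consistently, which plays the role of directed exploration. With `horizon=1` there is nothing to gain from information, so the preference for exploring should disappear however many simulations are used.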
“…One intriguing possibility for explore-exploit choices, is that the evidence that is being integrated corresponds to mental simulations of possible futures. Indeed, we have recently proposed such a mental simulation model of explore-exploit choices in a different task (Wilson et al, 2020). In this 'Deep Exploration' model of explore-exploit behavior, decisions are made by mental simulation of plausible future outcomes (e.g.…”
Section: Discussion (mentioning)
confidence: 99%
“…In our previous work, we considered the case where the number of simulations was fixed, but the model, at least in principle, is readily extended to the case where the decision is made by a threshold crossing process instead. A major goal for future work will therefore be to explicitly connect the drift-diffusion model presented here with the Deep Exploration account in (Wilson et al, 2020) to create a complete theory of the dynamics of explore-exploit choice.…”
Section: Discussion (mentioning)
confidence: 99%
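
The threshold-crossing extension mentioned in the statement above can be sketched in the same toy setting. The citing authors describe this only as possible in principle, so the code below is a hypothetical illustration rather than their model: the accumulation rule, the `threshold` parameter, the `max_sims` cap, and all other names and defaults are assumptions. Simulated futures are drawn one at a time, and the decision is made once the accumulated evidence for or against exploring crosses a bound.

```python
import random


def simulate_advantage(rng, mean, sd, known, horizon):
    """Simulated value of exploring minus the value of always exploiting.

    Assumption (not from the source): exploring once reveals the option's
    true mean, after which the agent takes the better option every step.
    """
    theta = rng.gauss(mean, sd)
    explore_value = theta + (horizon - 1) * max(theta, known)
    return explore_value - horizon * known


def decide(rng, threshold, mean=0.0, sd=1.0, known=0.0, horizon=6,
           max_sims=10000):
    """Accumulate simulated advantages until a bound is crossed.

    Returns (explore, n_sims). The count of simulations is a stand-in for
    response time; max_sims is a safety cap added for this sketch.
    """
    evidence = 0.0
    for n in range(1, max_sims + 1):
        evidence += simulate_advantage(rng, mean, sd, known, horizon)
        if abs(evidence) >= threshold:
            return evidence > 0, n
    return evidence > 0, max_sims


def summarize(threshold, trials=2000, seed=0):
    """Return (fraction of explore choices, mean number of simulations)."""
    rng = random.Random(seed)
    results = [decide(rng, threshold) for _ in range(trials)]
    p = sum(1 for explore, _ in results if explore) / trials
    mean_n = sum(n for _, n in results) / trials
    return p, mean_n


if __name__ == "__main__":
    for threshold in (1.0, 5.0, 15.0):
        print(threshold, summarize(threshold))
```

Under these assumptions, a higher threshold means more simulations per decision and more consistent exploration, while a lower threshold gives faster, more variable choices.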
“…Recent work using drift diffusion models have supported this hypothesis by connecting random exploration to lowered evidence thresholds and increased drift rates [49]. Conversely, longer response times have been related to the ability to mentally simulate a greater number of future outcomes [50], producing more directed exploration but decreased random exploration [51]. Acceleration as a response to time pressure could thus produce a trade-off between different forms of exploration.…”
Section: Limiting Decision Time (mentioning)
confidence: 99%