2020
DOI: 10.31234/osf.io/uepr7
Preprint

The dynamics of explore-exploit decisions reveal a signal-to-noise mechanism for random exploration

Abstract: Growing evidence suggests that behavioral variability plays a critical role in how humans manage the trade-off between exploration and exploitation. In these decisions, a little variability can help us to overcome the desire to exploit known rewards by encouraging us to randomly explore something else. Here we investigate how such 'random exploration' could be controlled using a drift-diffusion model of the explore-exploit choice. In this model, variability is controlled by either the signal-to-noise ratio with…

Citations: cited by 5 publications (6 citation statements)
References: 27 publications

“…Exploration and exploitation states are not discrete, but exist along a continuum (Addicott et al, 2017). Instead of switching between binary states, humans manage environmental instability by adjusting the greediness of their decision policies (Sadeghiyeh et al, 2020; Prat-Carrabin et al, 2020; Feng et al, 2020; Wilson et al, 2014; Payzan-LeNestour and Bossaerts, 2011; Payzan-LeNestour and Bossaerts, 2012; Wilson et al, 2021). Depending on the relative configuration of parameters in the accumulation to bound process, this adjustment can manifest as either speeded or slowed decisions (Figure 1E; Alexandrowicz, 2020; Ratcliff, 1978).…”
Section: Discussion (mentioning)
confidence: 99%
“…Random exploration refers to inherent behavioral variability that leads us to explore other options, while directed exploration refers to the volitional pursuit of new information. Feng and colleagues recently found that random exploration is driven by changes in the drift rate and the boundary height, with drift rate changes dominating the policy shift (Feng et al, 2020). When environmental conditions encouraged exploration, the drift rate slowed, reducing the signal-to-noise ratio of the reward representation.…”
Section: Discussion (mentioning)
confidence: 99%
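
To make the mechanism in the statement above concrete, here is a minimal sketch of a drift-diffusion simulation. It is not the model fitted in the cited preprint. It assumes a simplified process with an unbiased starting point, symmetric bounds at plus and minus `threshold`, and constant Gaussian noise. The function names `simulate_ddm` and `p_upper_closed_form`, the parameter values, and the reading of the upper bound as the higher-valued (exploit) option are all illustrative assumptions. Under these assumptions the probability of reaching the upper bound has the standard closed form 1 / (1 + exp(-2 * drift * threshold / noise^2)), so lowering either the drift rate (the signal-to-noise ratio) or the threshold makes choices more variable.

```python
import numpy as np


def simulate_ddm(drift, threshold, noise=1.0, n_trials=5000,
                 dt=1e-3, max_t=10.0, seed=0):
    """Simulate a symmetric drift-diffusion process (Euler-Maruyama).

    Evidence starts at 0 and accumulates until it crosses +threshold
    (taken here, by assumption, as the exploit choice) or -threshold
    (taken as a randomly exploring choice).
    Returns (p_upper, mean_rt) over the trials that reached a bound.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n_trials)                    # accumulated evidence
    rt = np.full(n_trials, np.nan)            # decision times in seconds
    choice = np.zeros(n_trials, dtype=int)    # +1 upper, -1 lower, 0 undecided
    active = np.ones(n_trials, dtype=bool)    # trials still accumulating
    n_steps = int(round(max_t / dt))

    for step in range(1, n_steps + 1):
        n_active = int(active.sum())
        if n_active == 0:
            break
        # One Euler-Maruyama step: deterministic drift plus scaled noise.
        x[active] += drift * dt + noise * np.sqrt(dt) * rng.standard_normal(n_active)
        hit_upper = active & (x >= threshold)
        hit_lower = active & (x <= -threshold)
        choice[hit_upper] = 1
        choice[hit_lower] = -1
        done = hit_upper | hit_lower
        rt[done] = step * dt
        active &= ~done

    decided = choice != 0
    p_upper = float(np.mean(choice[decided] == 1))
    return p_upper, float(np.nanmean(rt))


def p_upper_closed_form(drift, threshold, noise=1.0):
    """Probability of hitting the upper bound for the simplified process above."""
    return 1.0 / (1.0 + np.exp(-2.0 * drift * threshold / noise**2))


if __name__ == "__main__":
    # Hypothetical parameter settings, chosen only for illustration.
    conditions = {
        "baseline": (1.0, 1.0),
        "lower drift": (0.2, 1.0),
        "lower threshold": (1.0, 0.5),
    }
    for name, (v, a) in conditions.items():
        p_sim, mean_rt = simulate_ddm(v, a)
        print(f"{name}: P(exploit) simulated={p_sim:.3f}, "
              f"closed form={p_upper_closed_form(v, a):.3f}, mean RT={mean_rt:.2f}s")
```

In this sketch both manipulations raise the share of lower-bound choices, but they differ in timing: a lower threshold shortens response times, which matches the quoted observation that such adjustments can appear as either speeded or slowed decisions depending on which parameter changes.
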
“…Combined with lower evidence thresholds, acceleration can lead to more frequently choosing options that would otherwise be ignored, consistent with increased random exploration. Recent work using drift diffusion models has supported this hypothesis by connecting random exploration to lowered evidence thresholds and increased drift rates [48]. Conversely, longer response times have been related to the ability to mentally simulate a greater number of future outcomes [49], producing more directed exploration but decreased random exploration [50].…”
Section: Introduction (mentioning)
confidence: 95%
“…Exploration and exploitation states are not discrete, but exist along a continuum (Addicott et al, 2017). Instead of switching between binary states, humans manage environmental instability by adjusting the degree of exploration and exploitation (Sadeghiyeh et al, 2020; Prat-Carrabin et al, 2020; Feng et al, 2020; Wilson et al, 2014; Payzan-LeNestour and Bossaerts, 2011, 2012; Wilson et al, 2020). Depending on the relative configuration of parameters in the accumulation to bound process, this adjustment can manifest as either speeded or slowed decisions (Fig.…”
Section: Discussion (mentioning)
confidence: 99%