The explore-exploit dilemma describes the trade-off that arises whenever we must choose between exploring unknown options and exploiting options we know well. Implicit in this trade-off is how we value future rewards: exploiting is usually better in the short term, but in the longer term the benefits of exploration can be huge. Thus, in theory there should be a tight connection between how much people value future rewards, i.e., how much they discount future rewards relative to immediate rewards, and how likely they are to explore, with less ‘temporal discounting’ associated with more exploration. By measuring individual differences in temporal discounting and correlating them with explore-exploit behavior, we tested whether this theoretical prediction holds in practice. We used the 27-item Delay-Discounting Questionnaire to estimate temporal discounting and the Horizon Task to quantify two strategies of explore-exploit behavior: directed exploration, where information drives exploration by choice, and random exploration, where behavioral variability drives exploration by chance. We find a clear correlation between temporal discounting and directed exploration, with more temporal discounting associated with less directed exploration. Conversely, we find no relationship between temporal discounting and random exploration. Unexpectedly, we find that the relationship with directed exploration appears to be driven by a correlation between temporal discounting and uncertainty seeking at short time horizons, rather than information seeking at long horizons. Taken together, our results suggest a nuanced relationship between temporal discounting and explore-exploit behavior that may be mediated by multiple factors.
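The abstract does not say how discount rates were derived from the questionnaire. As a minimal sketch, assuming the hyperbolic form V = A / (1 + kD) that is commonly paired with this kind of questionnaire, the discount rate k at which a person would be indifferent between a smaller immediate reward and a larger delayed one can be computed as follows. The function names and the example amounts are hypothetical and are not taken from the study.

```python
def discounted_value(amount, delay_days, k):
    """Present value of a delayed reward under hyperbolic discounting (assumed model)."""
    return amount / (1.0 + k * delay_days)


def indifference_k(immediate, delayed, delay_days):
    """Discount rate k at which the delayed reward is worth exactly the immediate one."""
    # Solve immediate = delayed / (1 + k * delay_days) for k.
    return (delayed / immediate - 1.0) / delay_days


def prefers_delayed(immediate, delayed, delay_days, k):
    """True if a person with discount rate k would wait for the larger reward."""
    return discounted_value(delayed, delay_days, k) > immediate


# Illustrative item (made-up amounts): 50 now versus 100 in 10 days.
k_star = indifference_k(50.0, 100.0, 10.0)
print(k_star)
print(prefers_delayed(50.0, 100.0, 10.0, k=0.05))
```

Under this assumed model, a person whose k lies below an item's indifference value would choose the delayed reward on that item, so a pattern of choices across items brackets that person's discount rate.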
Many decisions involve a choice between exploring unknown opportunities and exploiting well-known options. Work across a variety of domains, from animal foraging to human decision making, has suggested that animals solve such "explore-exploit dilemmas" with a mixture of two strategies: one driven by information seeking (directed exploration) and the other by behavioral variability (random exploration). Here we propose a unifying account in which these two strategies emerge from a kind of stochastic planning, known in the machine learning literature as Deep Exploration. In this model, the explore-exploit decision is made by stochastic simulation of plausible futures that are deep, in that they extend far into the future, and narrow, in that the number of possible futures they consider is small. By applying Deep Exploration to a simple explore-exploit task we show theoretically how directed and random exploration can emerge in these settings. Moreover, we show that Deep Exploration implies a tradeoff between directed and random exploration that is mediated by the number of simulations, or samples, with more samples leading to increased directed exploration and decreased random exploration at the expense of greater time taken to respond. By measuring human behavior on the same simple task, we show that this reaction-time-mediated tradeoff exists in human behavior both between and within participants. We therefore suggest that Deep Exploration is a unifying account of explore-exploit behavior in humans.
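The idea of choosing by simulating a small number of deep, plausible futures can be illustrated with a toy two-armed bandit. The sketch below is not the authors' model; it is one possible reading under stated assumptions: Gaussian rewards with known noise, a flat prior so that belief uncertainty shrinks with the number of observations, and greedy play after the first simulated choice. All names and parameter values (`simulate_return`, `choose`, `choice_rate`, `noise_sd=8.0`, the example means and counts) are hypothetical.

```python
import math
import random


def simulate_return(first_arm, means, counts, horizon, noise_sd, rng):
    """Total reward of one simulated future that starts by playing `first_arm`."""
    # Sample one plausible world: a true mean per arm, drawn from current beliefs
    # (assumed Gaussian, with sd shrinking as observations accumulate).
    world = [rng.gauss(m, noise_sd / math.sqrt(c)) for m, c in zip(means, counts)]
    est, n = list(means), list(counts)
    arm, total = first_arm, 0.0
    for _ in range(horizon):
        reward = rng.gauss(world[arm], noise_sd)
        total += reward
        # Running-average update of the estimate for the arm just played.
        n[arm] += 1
        est[arm] += (reward - est[arm]) / n[arm]
        # After the first choice, the simulated agent exploits its updated estimates.
        arm = 0 if est[0] >= est[1] else 1
    return total


def choose(means, counts, horizon, n_samples, noise_sd=8.0, rng=random):
    """Pick the first action whose simulated futures have the higher average return."""
    values = []
    for arm in (0, 1):
        returns = [simulate_return(arm, means, counts, horizon, noise_sd, rng)
                   for _ in range(n_samples)]
        values.append(sum(returns) / n_samples)
    return 0 if values[0] >= values[1] else 1


def choice_rate(arm, means, counts, horizon, n_samples, repeats=500, seed=0):
    """Fraction of repeated decisions in which `arm` is chosen."""
    rng = random.Random(seed)
    picks = sum(choose(means, counts, horizon, n_samples, rng=rng) == arm
                for _ in range(repeats))
    return picks / repeats


# Equal observed means, but arm 0 has been seen once and arm 1 three times.
for n_samples in (1, 16):
    print(n_samples, choice_rate(0, [50.0, 50.0], [1, 3], horizon=6, n_samples=n_samples))
```

Varying `n_samples` and `horizon` in this toy setting gives a way to probe how often the less-observed arm is chosen and how variable the choices are. Whether it reproduces the tradeoff described above depends on the assumptions listed, so it should be read as an exploratory sketch only.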
The COVID-19 pandemic reminded us of how quickly conspiracy ideas can spread and how dire their consequences can be. One important question is which traits predict susceptibility to conspiracy beliefs. Previous research pointed to one such trait: reflective versus intuitive cognitive style. Here we examined how cognitive style correlates with founded and unfounded beliefs about the origin of COVID-19. A sample of 173 Iranians rated the likelihood of different beliefs about the origin of the new coronavirus and answered the original Cognitive Reflection Test (Frederick, 2005). In line with previous research, reflective responses were negatively correlated with conspiracy beliefs and positively correlated with the founded statement (that the virus was spread from wild animals by chance). The reverse pattern was found for intuitive responses. The results add further evidence for a relationship between a reflective-analytic style of thinking and the tendency to reject conspiracy beliefs.
The explore-exploit dilemma describes the trade-off that arises whenever we must choose between exploring unknown options and exploiting options we know well. Implicit in this trade-off is how we value future rewards: exploiting is usually better in the short term, but in the longer term the benefits of exploration can be huge. Thus, in theory there should be a tight connection between how much people value future rewards, i.e., how much they discount future rewards relative to immediate rewards, and how likely they are to explore, with less 'temporal discounting' associated with more exploration. By measuring individual differences in temporal discounting and correlating them with explore-exploit behavior, we tested whether this theoretical prediction holds in practice. We used the 27-item Delay-Discounting Questionnaire (Kirby et al., 1999) to estimate temporal discounting and the Horizon Task (Wilson et al., 2014) to quantify two strategies of explore-exploit behavior: directed exploration, where information drives exploration by choice, and random exploration, where behavioral variability drives exploration by chance. We find a clear correlation between temporal discounting and directed exploration, with more temporal discounting associated with less directed exploration. Conversely, we find no relationship between temporal discounting and random exploration. Unexpectedly, we find that the relationship with directed exploration appears to be driven by a correlation between temporal discounting and uncertainty seeking at short time horizons, rather than information seeking at long horizons. Taken together, our results suggest a nuanced relationship between temporal discounting and explore-exploit behavior that may be mediated by multiple factors. Recently, a number of studies have shown that people make explore-exploit decisions using a mixture of two strategies: directed exploration and random exploration (Wilson,