2017
DOI: 10.1101/106286
Preprint

Mapping the unknown: The spatially correlated multi-armed bandit

Abstract: We introduce the spatially correlated multi-armed bandit as a task coupling function learning with the exploration-exploitation trade-off. Participants interacted with bivariate reward functions on a two-dimensional grid, with the goal of either gaining the largest average score or finding the largest payoff. By providing an opportunity to learn the underlying reward function through spatial correlations, we model the extent to which people form beliefs about unexplored payoffs and how those beliefs guide search behavior. …
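For orientation, here is a minimal sketch of how a spatially correlated reward function of this kind can be generated: one draw from a Gaussian Process prior over a small grid. The RBF kernel, grid size, and length-scale below are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=2.0, variance=1.0):
    # Squared-exponential covariance between 2D grid locations:
    # nearby cells receive similar rewards, distant cells are nearly independent.
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-sq_dists / (2.0 * lengthscale ** 2))

# Enumerate an 11x11 grid of options (the size is an illustrative choice).
size = 11
xs, ys = np.meshgrid(np.arange(size), np.arange(size))
grid = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)

# One sample from the GP prior is one spatially correlated reward function;
# the small diagonal jitter keeps the covariance numerically stable.
K = rbf_kernel(grid, grid) + 1e-8 * np.eye(len(grid))
rng = np.random.default_rng(0)
reward_grid = rng.multivariate_normal(np.zeros(len(grid)), K).reshape(size, size)
```

Because nearby options share rewards under this prior, observing one payoff is informative about its neighbors, which is what makes learning the function possible within a limited search horizon.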

Cited by 23 publications (28 citation statements) | References 15 publications

“…Moreover, they sampled inputs to learn about both the overall shape of the function as well as the location of high outputs. Interestingly, this result mirrors recent evidence that participants solve the exploration-exploitation dilemma in reinforcement learning problems in a similar fashion (Wu et al., 2017), thereby hinting at the possibility of a universal sampling strategy underlying both information search and the search for rewards. Participants may not easily be able to turn off the exploitation part of their sampling strategy, as they normally encounter a mix of exploration and exploitation problems in real life.…”
Section: Discussion (supporting)
confidence: 79%
“…Moreover, GP models exhibit an inherent duality which makes them both a rule-based and a similarity-based model of function learning (see Lucas et al., 2015). As GP models have also been extended to account for exploratory behavior in function optimization tasks (Wu, Schulz, Speekenbrink, Nelson, & Meder, 2017), we will utilize them as candidate active learning models here.…”
Section: Function Learning (mentioning)
confidence: 99%
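The "extended to account for exploratory behavior" step typically pairs GP predictions with an acquisition rule; upper confidence bound (UCB) sampling is a standard choice in this literature. A minimal sketch, assuming the GP posterior mean and standard deviation over all options are already available (the value of beta is illustrative):

```python
import numpy as np

def ucb_choice(mu, sigma, beta=0.5):
    # Score each option by its GP posterior mean plus an uncertainty bonus,
    # then choose the best. Larger beta favors exploration of uncertain
    # options; beta = 0 reduces to pure exploitation of the mean.
    return int(np.argmax(mu + beta * sigma))
```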
“…Thus, we presented information about both spatial and conceptual features in both search tasks, but only one of them was relevant for generalization and predicting rewards. At the beginning of each round only a single randomly chosen option was revealed (i.e., it displayed the numerical reward and corresponding color aid), and subjects had a limited horizon of 10 actions in each round (40% of the total search space; similar to Wu et al., 2017), thereby inducing an exploration-exploitation trade-off.…”
Section: Methods (mentioning)
confidence: 99%
“…As the true generative model of the reward distributions on each round is a Gaussian Process (GP), we can also model structured function learning using the same framework (Rasmussen & Williams, 2006). Gaussian Process regression is a nonparametric Bayesian method that has been successfully applied as a model of how people generalize in contextual (Schulz, Konstantinidis, & Speekenbrink, 2017) and spatially correlated (Wu, Schulz, Speekenbrink, Nelson, & Meder, 2017) multi-armed bandits.…”
Section: Gaussian Process Regression (mentioning)
confidence: 99%
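For concreteness, a sketch of exact GP regression (Rasmussen & Williams, 2006) as used in this kind of model: given the options sampled so far and their rewards, compute a posterior mean and uncertainty for every unexplored option. The kernel, length-scale, and noise level are illustrative assumptions, not fitted parameters from any of the cited studies.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=2.0, variance=1.0):
    # Squared-exponential covariance over 2D option locations.
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-sq_dists / (2.0 * lengthscale ** 2))

def gp_posterior(X_obs, y_obs, X_star, noise=0.1):
    # Exact GP regression: posterior mean and standard deviation at the
    # unobserved locations X_star, conditioned on observations (X_obs, y_obs).
    K = rbf_kernel(X_obs, X_obs) + noise ** 2 * np.eye(len(X_obs))
    K_s = rbf_kernel(X_obs, X_star)
    K_ss = rbf_kernel(X_star, X_star)
    mu = K_s.T @ np.linalg.solve(K, y_obs)
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    return mu, np.sqrt(np.clip(np.diag(cov), 0.0, None))
```

Feeding the returned mean and uncertainty into a choice rule such as the UCB sketch above closes the loop between generalization and search: the model predicts unexplored payoffs from spatial correlations and directs sampling toward options that are promising, uncertain, or both.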