2018
DOI: 10.1101/432534
Preprint

Finding structure in multi-armed bandits

Abstract: How do humans search for rewards? This question is commonly studied using multi-armed bandit tasks, which require participants to trade off exploration and exploitation. Standard multi-armed bandits assume that each option has an independent reward distribution. However, learning about options independently is unrealistic, since in the real world options often share an underlying structure. We study a class of structured bandit tasks, which we use to probe how generalization guides exploration. In a structured …
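To make the abstract's contrast concrete, here is a minimal Python sketch (hypothetical, not the preprint's task or model) comparing a learner that treats each arm independently with one that generalizes across arms. The noisy linear reward structure, number of arms, and trial count are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n_arms, n_trials, noise_sd = 8, 50, 1.0
true_rewards = 2.0 * np.arange(n_arms) + 5.0          # assumed latent linear structure

def run(structured):
    xs, ys, total = [], [], 0.0
    for t in range(n_trials):
        if t < 2:                                      # seed with the first two arms
            arm = t
        elif structured:
            slope, intercept = np.polyfit(xs, ys, 1)   # generalize: fit a line over arm index
            arm = int(np.argmax(slope * np.arange(n_arms) + intercept))
        else:
            means = [np.mean([y for x, y in zip(xs, ys) if x == a] or [0.0])
                     for a in range(n_arms)]           # learn each arm separately
            arm = int(np.argmax(means))
        reward = true_rewards[arm] + rng.normal(0.0, noise_sd)
        xs.append(arm); ys.append(reward); total += reward
    return total

print("independent learner, total reward:", round(run(False), 1))
print("structured learner, total reward:", round(run(True), 1))

In this toy setup, the structured learner extrapolates from two sampled arms to the best untried arm, while the independent mean-tracker keeps exploiting the best arm it has already sampled.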

Cited by 10 publications (4 citation statements)
References 78 publications (89 reference statements)

“…This work evolved within a broader research program that aims to understand how people use generalization to efficiently explore their environments in the search for rewards (Schulz et al, 2018; Stojic et al, 2015; Wu, Schulz, Speekenbrink, et al, 2018). In our earlier work, we have shown that participants can apply function learning to guide their search in contextual bandits (Schulz et al, 2018; Stojic et al, 2015), in bandits with spatial correlations between rewards (Wu, Schulz, Speekenbrink, et al, 2018), and in bandits with no explicit relation between features and rewards (Schulz, Franklin, & Gershman, 2018; Stojic, 2016). We thought that the same approach can be used to improve our understanding of novelty.…”
Section: Discussion (mentioning)
confidence: 99%
“…Next, we assess the suitability of the diffusion kernel as a model for more complex problems, by transitioning into a choice paradigm using a multi-armed bandit task with structured rewards (E. Schulz, Franklin, & Gershman, 2018). One advantage of the GP diffusion kernel model is that it makes predictions with estimates of the underlying uncertainty.…”
Section: Discussion (mentioning)
confidence: 99%
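The statement above notes that a GP model yields predictions together with uncertainty estimates, which can directly guide choice in a bandit task. The following hypothetical Python sketch illustrates this with upper-confidence-bound (UCB) sampling over a GP posterior; it uses a standard RBF kernel from scikit-learn rather than the diffusion kernel discussed in the citing work, and the reward function, noise level, and exploration bonus are assumptions.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(1)
arms = np.arange(10).reshape(-1, 1)                    # 10 spatially ordered arms
true_mean = np.sin(arms.ravel() / 2.0) * 3.0           # assumed latent reward function
X, y = [], []

for t in range(25):
    if len(X) < 2:
        arm = int(rng.integers(len(arms)))             # random initial pulls
    else:
        gp = GaussianProcessRegressor(kernel=RBF(length_scale=2.0), alpha=1.0)
        gp.fit(np.array(X), np.array(y))
        mu, sd = gp.predict(arms, return_std=True)     # posterior mean and uncertainty
        arm = int(np.argmax(mu + 1.0 * sd))            # UCB: value plus exploration bonus
    reward = true_mean[arm] + rng.normal(0.0, 0.5)
    X.append([arm]); y.append(reward)

print("most-sampled arm:", np.bincount(np.array(X).ravel()).argmax())

Because the exploration bonus scales with posterior uncertainty, arms whose predicted value is uncertain are sampled preferentially, which is the sense in which uncertainty estimates guide exploration.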
“…In contrast, here we observed that exposure to a single task allowed our participants to improve their knowledge of the task space. Second, while previous studies either focused on learning of multiple simple tasks (Kattner et al, 2017; Schulz et al, 2020), or learning of a single complex graph-like structure (Cleeremans & McClelland, 1991; Garvert et al, 2017; Schapiro et al, 2013), we focused on how humans learn multiple complex structures, an issue which had not been looked at until very recently (Mark et al, 2020; Wu et al, 2019). Third, studies typically focus on the consequences of what is being transferred, for instance, whether there is an immediate benefit to performance or a change in the rate of learning (Braun et al, 2010; Kattner et al, 2017); our paradigm gives us additional insights into the content that is being transferred, specifically, the pool of candidate models that are used for explaining the task increasingly matches the true pool of possible models within the paradigm.…”
Section: Discussion (mentioning)
confidence: 99%