2016
DOI: 10.1101/081091
Preprint

Putting bandits into context: How function learning supports decision making

Abstract: We introduce the contextual multi-armed bandit task as a framework to investigate learning and decision making in uncertain environments. In this novel paradigm, participants repeatedly choose between multiple options in order to maximise their rewards. The options are described by a number of contextual features which are predictive of the rewards through initially unknown functions. From their experience with choosing options and observing the consequences of their decisions, participants can learn about the…
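The task described in the abstract can be sketched as a minimal simulation. Everything below is an illustrative assumption rather than the paper's actual design: the number of options and features, the latent linear reward functions, the noise level, and the random placeholder policy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 4 options, each with a latent linear reward
# function over 3 contextual features (unknown to the learner).
n_options, n_features, n_trials = 4, 3, 100
true_weights = rng.normal(size=(n_options, n_features))

def rewards(context):
    """Noisy reward of every option under the current context."""
    return true_weights @ context + rng.normal(scale=0.1, size=n_options)

# Trial loop: observe context features, choose an option, receive reward.
total = 0.0
for t in range(n_trials):
    context = rng.normal(size=n_features)  # features shown on this trial
    choice = rng.integers(n_options)       # placeholder: random policy
    total += rewards(context)[choice]
```

A learner would replace the random policy with one that infers the feature-to-reward functions from experience.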


Cited by 37 publications (63 citation statements)
References 47 publications (23 reference statements)
“…In order to satisfy this constraint, you should only explore options that—while uncertain—are likely to be “safe.” Such restricted exploration–exploitation problems are indeed common in daily life, from choosing which restaurant to visit (avoid food poisoning), where to buy a second‐hand car (avoid buying a lemon), to finding the shortest route home (avoid dangerous terrain). In our previous research on human behavior in contextual (Schulz, Konstantinidis, & Speekenbrink, ) and spatially correlated multi‐armed bandits (Wu, Schulz, Speekenbrink, Nelson, & Meder, in press), we found that human behavior in the search for rewards without constraints can be robustly described by a combination of a universal function learning mechanism and a decision strategy which explicitly balances an option’s expected reward and its attached uncertainty. The function learning mechanism was formalized as Gaussian process regression, which is a form of non‐parametric Bayesian regression that adapts its complexity to the data at hand (Griffiths, Lucas, Williams, & Kalish, ; Rasmussen, ), while the decision strategy was formalized as upper confidence bound sampling strategy (UCB; Auer, ).…”
Section: Introduction (mentioning)
confidence: 86%
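The mechanism quoted above pairs Gaussian process regression (function learning) with upper confidence bound sampling (choice). A minimal sketch of that combination, assuming an RBF kernel with unit prior variance, a unit length-scale, and an exploration weight beta — all illustrative choices, not parameters from the cited work:

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_new, noise=1e-4):
    """Posterior mean and variance of a zero-mean GP at the points x_new."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_star = rbf_kernel(x_obs, x_new)
    K_inv = np.linalg.inv(K)
    mean = k_star.T @ K_inv @ y_obs
    var = 1.0 - np.sum(k_star * (K_inv @ k_star), axis=0)  # prior variance is 1
    return mean, np.maximum(var, 0.0)

def ucb(mean, var, beta=2.0):
    """UCB score: expected reward plus a bonus for uncertainty."""
    return mean + beta * np.sqrt(var)

# Toy run: three observed option values, three candidate options.
x_obs = np.array([0.0, 1.0, 2.0])
y_obs = np.array([0.2, 0.8, 0.3])
options = np.array([0.5, 1.5, 3.0])
m, v = gp_posterior(x_obs, y_obs, options)
choice = options[np.argmax(ucb(m, v))]  # favors the distant, uncertain option 3.0
```

The UCB score makes the trade-off explicit: an option far from previous observations has a modest posterior mean but a large posterior variance, so the uncertainty bonus can outweigh a better-known option's higher expected reward.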
“…Gaussian process regression is a non‐parametric Bayesian approach toward function learning which can perform generalization by making inductive inferences about unobserved outcomes. In past research we found that Gaussian process regression captures the inductive biases of human participants in a variety of explicit function learning tasks (Schulz, Tenenbaum, Duvenaud, Speekenbrink, & Gershman, ) and provides an accurate description of human generalization in contextual and spatially correlated multi‐armed bandits without the presence of unsafe outcomes (Schulz et al., ; Wu et al., in press).…”
Section: Function Learning As Model Of Generalization (mentioning)
confidence: 99%
“…Rather than explaining development as a change in how we explore given some beliefs about the world, generalization-based accounts attribute developmental differences to the way we form our beliefs in the first place. Many studies have shown that human learners use structured knowledge about the environment to guide exploration (E. Schulz, Konstantinidis, & Speekenbrink, 2017), where the quality of these representations and the way that people utilize them to generalize across experiences can have a crucial impact on search behavior. Thus, development of more complex cognitive processes (Blanco et al, 2016), leading to broader generalizations, could also account for the observed developmental differences in sampling behavior.…”
Section: Introduction (mentioning)
confidence: 99%
“…As a first step toward modeling behavior in a probabilistic framework, we use a model that values both maximizing the probability of a correct query and a curiosity bonus, similar to recent work on human reinforcement learning (Schulz, Konstantinidis, & Speekenbrink, 2018; Wu et al., 2018). The curiosity bonus can be defined as information gain in the space of possible hypotheses (hidden codes).…”
Section: A Quintessential Game Of Exploration (mentioning)
confidence: 99%
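The curiosity bonus described in this statement — information gain over a space of hypotheses — can be sketched as the expected reduction in Shannon entropy from a query's outcome. The two-hypothesis toy setup below is an illustrative assumption, not the cited model:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def expected_information_gain(prior, likelihoods):
    """Expected entropy reduction over hypotheses after observing a query.

    prior: P(h), shape (H,)
    likelihoods: P(outcome | h), shape (H, O)
    """
    p_outcome = prior @ likelihoods  # marginal P(outcome)
    gain = entropy(prior)
    for o in range(likelihoods.shape[1]):
        if p_outcome[o] > 0:
            posterior = prior * likelihoods[:, o] / p_outcome[o]  # Bayes rule
            gain -= p_outcome[o] * entropy(posterior)
    return gain

# Toy: two equally likely hidden codes; the query's outcome is
# deterministic given the code, so one query resolves all uncertainty.
prior = np.array([0.5, 0.5])
likelihoods = np.array([[1.0, 0.0],   # code 1 always yields outcome 0
                        [0.0, 1.0]])  # code 2 always yields outcome 1
print(expected_information_gain(prior, likelihoods))  # → 1.0 (one full bit)
```

A query whose outcome distribution is identical under every hypothesis yields zero information gain, so a learner valuing this bonus prefers queries whose outcomes discriminate between hypotheses.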