“…In order to satisfy this constraint, you should only explore options that—while uncertain—are likely to be “safe.” Such restricted exploration–exploitation problems are indeed common in daily life, from choosing which restaurant to visit (avoid food poisoning), where to buy a second‐hand car (avoid buying a lemon), to finding the shortest route home (avoid dangerous terrain). In our previous research on human behavior in contextual (Schulz, Konstantinidis, & Speekenbrink, ) and spatially correlated multi‐armed bandits (Wu, Schulz, Speekenbrink, Nelson, & Meder, in press), we found that human behavior in the search for rewards without constraints can be robustly described by a combination of a universal function learning mechanism and a decision strategy which explicitly balances an option’s expected reward and its attached uncertainty. The function learning mechanism was formalized as Gaussian process regression, which is a form of non‐parametric Bayesian regression that adapts its complexity to the data at hand (Griffiths, Lucas, Williams, & Kalish, ; Rasmussen, ), while the decision strategy was formalized as upper confidence bound sampling strategy (UCB; Auer, ).…”