From foraging for food to learning complex games, many aspects of human behaviour can be framed as a search problem with a vast space of possible actions. Under finite search horizons, optimal solutions are generally unobtainable. How, then, do humans navigate vast problem spaces that require intelligent exploration of unobserved actions? Using a variety of bandit tasks with up to 121 arms, we study how humans search for rewards under limited search horizons, where the spatial correlation of rewards (in both generated and natural environments) provides traction for generalization. Across a variety of probabilistic and heuristic models, we find evidence that Gaussian Process function learning, combined with an optimistic Upper Confidence Bound sampling strategy, provides a robust account of how people use generalization to guide search. Our modelling results and parameter estimates are recoverable and can be used to simulate human-like performance, providing insights about human behaviour in complex environments.

Building on previous work exploring inductive biases in pure function learning contexts [21,22] and human behaviour in univariate function optimization [23], we present a comprehensive approach using a robust computational modelling framework to understand how humans generalize in an active search task. Across three studies using uni- and bivariate multi-armed bandits with up to 121 arms, we compare a diverse set of computational models in their ability to predict individual human behaviour. In all experiments, the majority of subjects are best captured by a model combining function learning using Gaussian Process (GP) regression with an optimistic Upper Confidence Bound (UCB) sampling strategy that directly balances expectations of reward with the reduction of uncertainty. Importantly, we recover meaningful and robust estimates about the nature of human generalization, showing the limits of traditional models of associative learning [24] in tasks where the environmental structure supports learning and inference.

The main contributions of this paper are threefold:

1. We introduce the spatially correlated multi-armed bandit as a paradigm for studying how people use generalization to guide search in larger problem spaces than traditionally used for studying human behaviour.
2. We find that a Gaussian Process model of function learning robustly captures how humans generalize and learn about the structure of the environment, where an observed tendency towards undergeneralization is shown to sometimes be beneficial.
3. We show that participants solve the exploration-exploitation dilemma by optimistically inflating expectations of reward by the underlying uncertainty, with recoverable evidence for the separate phenomena of …
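To make the GP-UCB account concrete, the sketch below shows one minimal way such a strategy can be implemented for a one-dimensional bandit with spatially correlated arms: a Gaussian Process with an RBF kernel generalizes observed rewards to unobserved arms, and a UCB rule values each arm by its posterior mean plus weighted uncertainty. The arm layout, kernel length-scale, noise level, and exploration weight `beta` are illustrative assumptions for this sketch, not the fitted parameters reported in the paper.

```python
# Minimal sketch of a GP-UCB search strategy over spatially correlated arms.
# All numeric settings here are illustrative assumptions.
import numpy as np

def rbf_kernel(a, b, length_scale=2.0):
    """Squared-exponential kernel encoding spatial correlation of rewards."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_grid, length_scale=2.0, noise=0.1):
    """Posterior mean and standard deviation over all arms given observations."""
    K = rbf_kernel(x_obs, x_obs, length_scale) + noise**2 * np.eye(len(x_obs))
    K_s = rbf_kernel(x_grid, x_obs, length_scale)
    K_inv = np.linalg.inv(K)
    mu = K_s @ K_inv @ y_obs
    var = 1.0 - np.einsum("ij,jk,ik->i", K_s, K_inv, K_s)
    return mu, np.sqrt(np.clip(var, 0.0, None))

def ucb_choice(mu, sigma, beta=0.5):
    """Optimistic UCB: value each arm by expected reward plus weighted uncertainty."""
    return int(np.argmax(mu + beta * sigma))

# Usage: a 1D bandit with 30 arms and a few noisy observations.
arms = np.arange(30, dtype=float)
x_obs = np.array([4.0, 12.0, 20.0])
y_obs = np.array([0.3, 0.8, 0.5])
mu, sigma = gp_posterior(x_obs, y_obs, arms)
print("next arm to sample:", ucb_choice(mu, sigma))
```

In this formulation, a larger length-scale corresponds to broader generalization across neighbouring arms, and a larger `beta` corresponds to more uncertainty-directed exploration; both are exactly the kinds of parameters the modelling framework described above estimates from behaviour.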
How should tests (or queries, questions, or experiments) be selected? Does it matter if only a single test is allowed, or if a sequential test strategy can be planned in advance? This article contributes two sets of theoretical results bearing on these questions. First, for selecting a single test, several Optimal Experimental Design (OED) ideas have been proposed in statistics and other disciplines. The OED models are mathematically nontrivial. How is it that they often predict human behavior well? One possibility is that simple heuristics can approximate or exactly implement OED models. We prove that heuristics can identify the highest information value queries (as quantified by OED models) in several situations, thus providing a possible algorithmic-level theory of human behavior. Second, we address whether OED models are optimal for sequential search, as is frequently presumed. We consider the Person Game, a 20-questions scenario, as well as a two-category, binary feature scenario, both of which have been widely used in psychological research. In each task, we demonstrate via specific examples and extended computational simulations that neither the OED models nor the heuristics considered in the literature are optimal. Little research addresses human behavior in such situations. We call for experimental research into how people approach the sequential planning of tests, and theoretical research on what sequential planning procedures are most successful, and we offer a number of testable predictions for discriminating among candidate models.
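As a concrete illustration of the single-test case discussed above, the following hypothetical example scores candidate yes/no questions by expected information gain (a standard OED usefulness measure) and compares that with a simple split-half heuristic that prefers questions dividing the remaining hypotheses most evenly. The toy "Person Game" items and questions are invented for illustration and are not taken from the article.

```python
# Toy comparison of an OED measure (expected information gain) with a
# split-half heuristic for choosing a single yes/no question.
# Hypotheses and questions below are illustrative assumptions.
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def expected_information_gain(prior, answers):
    """EIG of a binary question; `answers` holds each hypothesis's yes/no reply."""
    eig = entropy(prior)
    for value in (True, False):
        mask = answers == value
        p_answer = prior[mask].sum()
        if p_answer > 0:
            posterior = prior[mask] / p_answer
            eig -= p_answer * entropy(posterior)
    return eig

def split_half_score(prior, answers):
    """Heuristic: how evenly does the question split the current probability mass?"""
    p_yes = prior[answers].sum()
    return 1.0 - abs(p_yes - 0.5) * 2.0

# Usage: four equally likely "persons" and two candidate questions.
prior = np.full(4, 0.25)
questions = {
    "wears glasses?": np.array([True, True, False, False]),   # even split
    "has a hat?":     np.array([True, False, False, False]),  # uneven split
}
for name, ans in questions.items():
    print(name, round(expected_information_gain(prior, ans), 3),
          round(split_half_score(prior, ans), 3))
```

With a uniform prior, both measures favour the evenly splitting question (1 bit of expected information gain versus roughly 0.81 bits for the uneven one), illustrating in miniature how a simple heuristic can agree with an OED model for a single query even though, as argued above, neither need be optimal for sequential planning.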