How to compute initially unknown reward values is one of the key problems in reinforcement learning theory, and two basic approaches are used. Model-free algorithms rely on the accumulation of substantial amounts of experience to compute the value of actions, whereas in model-based learning, the agent seeks to learn the generative process for outcomes, from which the value of actions can be predicted. Here we show that (i) "probability matching," a consistently observed example of suboptimal choice behavior in humans, occurs in an optimal Bayesian model-based learner using a max decision rule that is initialized with ecologically plausible, but incorrect, beliefs about the generative process for outcomes, and (ii) human behavior can be strongly and predictably altered by the presence of cues suggestive of various generative processes, despite statistically identical outcome generation. These results suggest that human decision making is rational and model based, and not consistent with model-free learning.

decision making | probability matching | reinforcement learning

Given a limited set of data about the world, what is the best thing to do? This question lies at the heart of all decision making, from simple everyday errands to elaborate and complex scientific experiments. If the reward amount for each possible action is known in advance, it is straightforward to make choices that maximize reward. In the real world, however, reward values are nearly always initially unknown, and computing them is not trivial. Thus, understanding how rewards are learned and computed is one of the key problems in reinforcement learning theory. Computing the optimal policy (i.e., determining the "best thing to do") requires acquiring one of two types of knowledge. In model-free learning, an agent must accumulate a substantial amount of experience regarding the consequences of taking various actions in various states, from which the average value of the states can be learned. In model-based learning, an agent must acquire a "world model," which constitutes beliefs about how the world generates outcomes in response to actions. Although both model-free and model-based reinforcement-learning algorithms have been the subject of much study in computer science and machine learning, model-free algorithms have primarily been used as models of human choice behavior.

Whereas it is clear that our survival depends on the ability to make appropriate decisions from incomplete and ambiguous information, numerous studies in economics, psychology, and neuroscience have consistently found highly suboptimal behavior in seemingly simple decision tasks. Why is this? Consider the sequential binary decision task, which involves a choice between two options, one with a higher probability of success than the other (e.g., 70% vs. 30% of trials). The optimal strategy for this task is to determine which option has the higher probability of success and then choose only that option. Humans, however, tend to sample the alternatives in proportion to the options' respective reward probabilities.
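
The cost of this matching strategy relative to maximizing is easy to quantify. With success probabilities of 0.7 and 0.3, always choosing the better option is rewarded on 70% of trials, whereas choosing each option in proportion to its success rate is rewarded on only 0.7 × 0.7 + 0.3 × 0.3 = 58% of trials. The short simulation below is an illustrative sketch of this arithmetic only (the probabilities and trial count are assumed for illustration; it is not an analysis from this paper).

```python
# Illustrative sketch: expected reward on a stationary two-option task
# (assumed success probabilities 0.7 and 0.3), comparing a maximizing
# policy with a probability-matching policy.
import random

P_HIGH, P_LOW = 0.7, 0.3     # assumed outcome probabilities for the two options
N_TRIALS = 100_000

def run(policy):
    """Simulate N_TRIALS choices; policy() returns the success probability of the chosen option."""
    wins = 0
    for _ in range(N_TRIALS):
        if random.random() < policy():
            wins += 1
    return wins / N_TRIALS

# Maximizing: always choose the better option -> expected reward = 0.70.
maximize = lambda: P_HIGH

# Probability matching: choose each option in proportion to its success rate
# -> expected reward = 0.7 * 0.7 + 0.3 * 0.3 = 0.58.
def match():
    return P_HIGH if random.random() < P_HIGH else P_LOW

print(f"maximizing: {run(maximize):.3f} (analytic 0.700)")
print(f"matching:   {run(match):.3f} (analytic 0.580)")
```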
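
To make the model-free/model-based distinction described above concrete, the following minimal sketch contrasts the two kinds of learner on the same two-option task. The learning rate, the Beta-Bernoulli world model, and the task probabilities are illustrative assumptions; this is not the specific Bayesian learner analyzed in this paper, which concerns what happens when the assumed generative process is incorrect.

```python
# Minimal sketch (illustrative assumptions, not the paper's model): a model-free
# learner that updates action values from experienced rewards, and a model-based
# learner that assumes a static Bernoulli generative process per option and
# tracks Beta posteriors; both choose with a max decision rule.
import random

class ModelFreeLearner:
    """Learns action values directly from rewards with an incremental update."""
    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.q = [0.5, 0.5]              # initial value estimates for the two options

    def choose(self):
        return 0 if self.q[0] >= self.q[1] else 1

    def update(self, action, reward):
        self.q[action] += self.alpha * (reward - self.q[action])

class ModelBasedLearner:
    """Learns a world model: a Beta posterior over each option's success probability."""
    def __init__(self):
        self.successes = [1, 1]          # Beta(1, 1) priors
        self.failures = [1, 1]

    def posterior_mean(self, action):
        return self.successes[action] / (self.successes[action] + self.failures[action])

    def choose(self):
        return 0 if self.posterior_mean(0) >= self.posterior_mean(1) else 1

    def update(self, action, reward):
        if reward:
            self.successes[action] += 1
        else:
            self.failures[action] += 1

def train(learner, probs=(0.7, 0.3), trials=1000):
    """Run the learner on a stationary Bernoulli task and return it."""
    for _ in range(trials):
        a = learner.choose()
        r = 1 if random.random() < probs[a] else 0
        learner.update(a, r)
    return learner

mf = train(ModelFreeLearner())
mb = train(ModelBasedLearner())
print("model-free value estimates: ", [round(v, 2) for v in mf.q])
print("model-based posterior means:", [round(mb.posterior_mean(a), 2) for a in (0, 1)])
```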