2009
DOI: 10.1016/j.cognition.2009.03.013

Short-term gains, long-term pains: How cues about state aid learning in dynamic environments

Abstract: Successful investors seeking returns, animals foraging for food, and pilots controlling aircraft all must take into account how their current decisions will impact their future standing. One challenge facing decision makers is that options that appear attractive in the short term may not turn out best in the long run. In this paper, we explore human learning in a dynamic decision-making task which places short- and long-term rewards in conflict. Our goal in these studies was to evaluate how people's mental repr…

Cited by 81 publications (104 citation statements)
References 40 publications
“…In fact, the presence of a causal structure may make the environment easier to learn. In fact, people only required 100 trials to learn about the system, whereas participants in Gureckis and Love's study [77] required 500 trials in order to learn their system. Though it is hard to directly compare, this may in fact indicate that an embedded causal structure (such as the simple one that underpins the dynamic control task used in the present study) can facilitate the rate of learning.…”
Section: Differences Between Dynamic Decision-making and Bandit Tasks
Citation type: mentioning
confidence: 99%
“…Across different tasks (Stafford & Dewar, 2015; Hayes & Petrov, 2015; Gureckis & Love, 2009), humans show variable behavioural strategies, with initial exploration of possible responses and later exploitation of efficient ones. From computer gaming to analogical reasoning and decision making, it has been shown that participants who initially show more diversity across trials (and possibly commit more errors) show better overall performance.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…First, van Ravenzwaaij, Brown, and Wagenmakers (2011) showed that the speed of information processing (i.e., “drift rate” from the Diffusion Model: Ratcliff, 1978; and Ratcliff, Schmiedek, & McKoon, 2008), correlates well with the reaction time's standard deviation and somewhat less consistently with its mean (see also Baumeister, 1998). Second, Stafford and Dewar (2014) reported that game players who exhibit greater initial variability in performance achieved a higher overall score, and explained this pattern in terms of the exploration/exploitation trade-off (for similar results in other domains see Stafford et al, 2012; Gureckis & Love, 2009; Hayes & Petrov, 2015). By combining the findings of Stafford and Dewar (2014) and van Ravenzwaaij, Brown, and Wagenmakers (2011) with our two measures of individual differences in speed we are in a position to hypothesize that (a) an individual's processing speed will be indicated by their deviations, both in average speed and in the change in speed; and that (b) a positive change in speed (i.e., acceleration) may disguise more variable performance at the beginning of the experiment, in order to explore and then exploit rewarding behaviour at a later point in time.…”
Section: Individual Differences In Reading
Citation type: mentioning
confidence: 94%
“…While originating from cognitive psychology it has been brought into the field of machine learning. Lately it has shown to be a promising method for studying human learning in a computerized form in a variety of applications and disciplines (Bogacz et al, 2007; Gureckis and Love, 2009; Montague et al, 2004; Rangel et al, 2008; Wiering and Otterlo, 2012). Temporal difference (TD) learning is a class of algorithms within RL, specialized for prediction i.e., it is able to account for future outcomes by using past experiences within some known or unknown environment.…”
Section: The Agent
Citation type: mentioning
confidence: 99%
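
The last statement above describes temporal difference learning only in words. As a rough illustration of the general idea, here is a minimal tabular TD(0) sketch in Python. It is not the model used by Gureckis and Love (2009) or by the citing paper; the function name `td0_values`, the three-state chain, and the values of `alpha` and `gamma` are all assumptions made for this example.

```python
def td0_values(episodes, n_states, alpha=0.1, gamma=0.9):
    """Estimate state values with tabular TD(0).

    episodes: list of episodes, each a list of (state, reward, next_state)
    transitions; next_state is None when the episode ends.
    alpha (learning rate) and gamma (discount factor) are illustrative defaults.
    """
    values = [0.0] * n_states
    for episode in episodes:
        for state, reward, next_state in episode:
            # Estimated value of what follows; zero once the episode is over.
            future = 0.0 if next_state is None else values[next_state]
            # TD error: bootstrapped target minus the current estimate.
            td_error = reward + gamma * future - values[state]
            # Move the estimate a small step toward the target.
            values[state] += alpha * td_error
    return values


# Hypothetical chain 0 -> 1 -> 2 -> end, with reward only on the final step.
chain = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)]
values = td0_values([chain] * 500, n_states=3)
```

With repeated experience, the estimate for the early states rises even though they never yield an immediate reward, which is the sense in which TD learning "accounts for future outcomes by using past experiences".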