2014
DOI: 10.1037/a0033455

Navigating complex decision spaces: Problems and paradigms in sequential choice.

Abstract: To behave adaptively, we must learn from the consequences of our actions. Doing so is difficult when the consequences of an action follow a delay. This introduces the problem of temporal credit assignment. When feedback follows a sequence of decisions, how should the individual assign credit to the intermediate actions that comprise the sequence? Research in reinforcement learning provides two general solutions to this problem: model-free reinforcement learning and model-based reinforcement learning. In this r…

Citations: Cited by 33 publications (37 citation statements)
References: References 208 publications (332 reference statements)
“…To quantify the contributions of general and item-specific knowledge to choice behavior, we modeled the learning and decision-making process. To do so, we used temporal difference learning (Sutton & Barto, 1998), an influential technique in the field of artificial intelligence with strong ties to psychological theories of human and animal conditioning (Walsh & Anderson, in press) and physiological models of phasic dopamine responses (Schultz, 1998). Central to this technique is the idea that differences between actual and expected outcomes, or reward prediction errors, provide teaching signals.…”
Section: Methods (mentioning)
confidence: 99%
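
The idea in the statement above, that reward prediction errors act as teaching signals, can be illustrated with a minimal tabular TD(0) sketch. This is not the cited authors' model: the function name `td0_update`, the two-step task, and the parameter values `alpha` and `gamma` are all assumptions chosen for illustration.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one TD(0) update in place and return the reward prediction error.

    alpha (learning rate) and gamma (discount factor) are illustrative values.
    """
    # Prediction error: actual outcome (reward plus discounted value of the
    # next state) minus the outcome that was expected for the current state.
    delta = reward + gamma * values[next_state] - values[state]
    # The prediction error is the teaching signal that adjusts the estimate.
    values[state] += alpha * delta
    return delta


# Hypothetical two-step task: A -> B -> end, with reward only on the last step.
values = {"A": 0.0, "B": 0.0, "end": 0.0}
episode = [("A", 0.0, "B"), ("B", 1.0, "end")]

for _ in range(200):
    for state, reward, next_state in episode:
        td0_update(values, state, reward, next_state)

print(values)
```

With repeated episodes, the value of B approaches the reward it directly predicts, and the value of A approaches the discounted value of B, so credit for the delayed reward propagates back to the earlier step.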
“…It is characterized by many theorists as being the more basic, and as being especially responsive to emotions (e.g., Metcalfe & Mischel, 1999; Strack & Deutsch, 2004). Sometimes the term model-free learning is used to refer to the way in which this mode of function acquires information over time, reflecting the idea that its learning follows from an accumulation of associations (e.g., Daw, Niv, & Dayan, 2005; Dayan, 2008; Dolan & Dayan, 2013; Otto, Gershman, Markman, & Daw, 2013; Walsh & Anderson, 2014). Functioning in this mode may be said to reflect habits (e.g., Ouellette & Wood, 1998) or responsiveness to cues of the moment that trigger automatic responses.…”
Section: How Might the P Factor Reflect Meaningful Functional Variation? (mentioning)
confidence: 99%
“…Some psychology models focus only on learning (e.g., reinforcement learning; Walsh & Anderson, 2014) or forgetting (e.g., mathematical forgetting functions; Rubin & Wenzel, 1996). This is in contrast to account in great detail for one or a small number of findings.…”
Section: Account For Effects Of Training Variables On Learning and Re… (mentioning)
confidence: 99%
“…This is in contrast to account in great detail for one or a small number of findings. Some psychology models focus only on learning (e.g., reinforcement learning; Walsh & Anderson, 2014) or forgetting (e.g., mathematical forgetting functions; Rubin & Wenzel, 1996). Yet these are ongoing, dynamic, competing processes.…”
Section: Account For Effects Of Training Variables On Learning and Re… (mentioning)
confidence: 99%