2008
DOI: 10.1098/rstb.2008.0158
Cortical mechanisms for reinforcement learning in competitive games

Abstract: Game theory analyses optimal strategies for multiple decision makers interacting in a social group. However, the behaviours of individual humans and animals often deviate systematically from the optimal strategies described by game theory. The behaviours of rhesus monkeys (Macaca mulatta) in simple zero-sum games showed similar patterns, but their departures from the optimal strategies were well accounted for by a simple reinforcement-learning algorithm. During a computer-simulated zero-sum game, neurons in th…
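The "simple reinforcement-learning algorithm" mentioned in the abstract is a value-update rule fitted to the animals' trial-by-trial choices. As a rough, purely illustrative sketch of that class of model (the learning rate, inverse temperature, and simplified opponent here are assumptions, not the paper's fitted values):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(values, beta):
    """Convert action values into choice probabilities (inverse temperature beta)."""
    z = beta * (values - values.max())
    p = np.exp(z)
    return p / p.sum()

# Hypothetical parameters; in the paper these are fitted to each animal's choice sequence.
alpha, beta = 0.2, 3.0    # learning rate, inverse temperature
Q = np.zeros(2)           # value of the two targets in a matching-pennies-style zero-sum game

for t in range(1000):
    p = softmax(Q, beta)
    choice = rng.choice(2, p=p)                    # simulated player's choice
    opponent = rng.integers(2)                     # simplified opponent; the task used an adaptive computer
    reward = 1.0 if choice == opponent else 0.0    # win if the two choices match
    Q[choice] += alpha * (reward - Q[choice])      # update only the chosen action's value
```

Fitting alpha and beta to the observed choice sequence is what allows a model of this kind to capture systematic departures from the equilibrium (random 50/50) strategy.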

Cited by 61 publications (65 citation statements)
References 63 publications

“…Thorndike's "law of effect" is no longer in operation, even though the "spread of effect" of reward (19) still operates. Although several representations of choice history and reward history exist in the brain (20), only a limited number of regions, including the lOFC, represent the conjoint history of choices and rewards (21,22). An lOFC lesioned animal should struggle to learn when each option's value is very different from the others because it will erroneously estimate the value of each option as close to the mean.…”
Section: Discussion
confidence: 99%
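The quoted argument implies that, without a conjoint choice-reward history, reward updates spread across options and the value estimates of all options collapse toward the mean reward rate. A toy simulation of that intuition (the update rules and parameters are assumptions made only for illustration, not a model from the cited work):

```python
import numpy as np

rng = np.random.default_rng(1)

true_p = np.array([0.8, 0.5, 0.2])   # hypothetical options with very different reward probabilities
alpha = 0.1
Q_conjoint = np.zeros(3)             # credits only the chosen option ("law of effect")
Q_spread   = np.zeros(3)             # spreads each reward update over all options ("spread of effect")

for t in range(5000):
    c = rng.integers(3)                        # sample choices uniformly for simplicity
    r = float(rng.random() < true_p[c])
    Q_conjoint[c] += alpha * (r - Q_conjoint[c])
    Q_spread     += alpha * (r - Q_spread)     # no conjoint choice-reward history: every option updated

print(Q_conjoint)   # approaches the option-specific probabilities (~0.8, 0.5, 0.2)
print(Q_spread)     # collapses toward the mean reward rate (~0.5) for all options
```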
“…As another possibility, which is not mutually exclusive to the scenario above, the estimate of reward-based arming probability (i.e., the action value function estimated according to a simple RL algorithm) and the latest run length might be separately computed before being combined to estimate the final stacked arming probability. Physiological studies have found neural signals that are related to action value functions computed based on a simple RL algorithm (Samejima et al. 2005; Seo and Lee 2007) and neural signals that are related to the animal's previous choice (Kim et al. 2007; Seo and Lee 2008) or the number of self-executed actions (Sawamura et al. 2002) in cortical and subcortical brain structures. The latter may represent the latest run length in the DAWH task.…”
Section: Discussion
confidence: 99%
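The passage above refers to neural signals correlated with action values derived from a simple RL algorithm. A hedged sketch of how such trial-by-trial value regressors are commonly reconstructed from the recorded choice and reward sequence before being regressed against spike counts (the function and parameter names are assumptions, not from the cited studies):

```python
import numpy as np

def action_value_regressors(choices, rewards, alpha=0.3, n_actions=2):
    """Reconstruct trial-by-trial action values from observed choices and rewards.

    choices : array of ints, which target was chosen on each trial
    rewards : array of 0/1, whether each trial was rewarded
    Returns an (n_trials, n_actions) array of value estimates entering each trial,
    suitable as regressors against trial-by-trial spike counts.
    """
    Q = np.zeros(n_actions)
    values = np.zeros((len(choices), n_actions))
    for t, (c, r) in enumerate(zip(choices, rewards)):
        values[t] = Q                   # value estimate before the choice on trial t
        Q[c] += alpha * (r - Q[c])      # Rescorla-Wagner-style update of the chosen action
    return values
```

The resulting columns (or their difference) would then enter a multiple regression alongside choice and reward terms, as in the analyses the passage cites.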
“…Neural activity (dependent variable) was also correlated across successive trials (e.g., mean serial correlation = 0.052 ± 0.007 and 0.062 ± 0.007 for the neural data during the first and last 1 s of the delay stage), which could be due to a number of factors, such as a slow drift in the spike rates during the recording session. Regardless of its origin, such serial correlation in spike rates could potentially violate the independence assumption in the regression analysis and increase the amount of activity spuriously correlated with action values (Seo and Lee, 2008). We therefore used a permutation test to evaluate the statistical significance of regression coefficients for the multiple regression analyses that contained action values.…”
Section: (Model 10)
confidence: 99%
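The permutation test described here addresses the fact that serially correlated spike counts can correlate spuriously with slowly varying value regressors. One common way to build the null distribution is to circularly shift the regressor of interest, which preserves its autocorrelation while destroying its trial-by-trial alignment with the neural data; a minimal sketch under that assumption (the cited study's exact permutation scheme may differ, and all names below are hypothetical):

```python
import numpy as np

def permutation_pvalue(spikes, design, col, n_perm=1000, seed=0):
    """Permutation p-value for one regression coefficient with serially correlated data.

    spikes : (n_trials,) spike counts, possibly autocorrelated across trials
    design : (n_trials, k) design matrix including an intercept column
    col    : index of the regressor of interest (e.g., an action-value column)
    """
    rng = np.random.default_rng(seed)
    beta_obs = np.linalg.lstsq(design, spikes, rcond=None)[0][col]
    null = np.empty(n_perm)
    for i in range(n_perm):
        shifted = design.copy()
        shift = rng.integers(1, len(spikes))
        # Circular shift keeps the regressor's autocorrelation but breaks its
        # trial-by-trial correspondence with the spike counts.
        shifted[:, col] = np.roll(shifted[:, col], shift)
        null[i] = np.linalg.lstsq(shifted, spikes, rcond=None)[0][col]
    return np.mean(np.abs(null) >= np.abs(beta_obs))
```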