2020
DOI: 10.1371/journal.pbio.3001028
The actions of others act as a pseudo-reward to drive imitation in the context of social reinforcement learning

Abstract: While there is no doubt that social signals affect human reinforcement learning, there is still no consensus about how this process is computationally implemented. To address this issue, we compared three psychologically plausible hypotheses about the algorithmic implementation of imitation in reinforcement learning. The first hypothesis, decision biasing (DB), postulates that imitation consists in transiently biasing the learner’s action selection without affecting their value function. According to the secon…
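The decision-biasing (DB) hypothesis described in the abstract can be illustrated with a minimal sketch: the demonstrator's observed action transiently shifts the learner's choice probabilities, while value learning proceeds from reward alone. The parameter names (`beta`, `bias_strength`, `alpha`) are illustrative, not taken from the paper.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def db_choice_probs(q_values, observed_action, beta=3.0, bias_strength=1.0):
    # Decision biasing: the observed action adds a transient bonus to the
    # choice logits; the Q-values themselves are left untouched.
    logits = beta * q_values.copy()
    logits[observed_action] += bias_strength
    return softmax(logits)

def q_update(q_values, action, reward, alpha=0.1):
    # Standard delta-rule update, driven by reward only (no social term).
    q_values[action] += alpha * (reward - q_values[action])
    return q_values
```

With equal Q-values, the biased option is chosen more often, yet repeated updates still track reward alone; this is the sense in which DB leaves the value function unaffected.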


Cited by 46 publications (68 citation statements)
References 48 publications
“…We obtained similar results across different settings of the multi-armed bandit tasks, such as a skewed payoff distribution in which large payoffs were realised rarely and small payoffs commonly, owing to an asymmetric probability rather than Gaussian noise (March, 1996; Denrell, 2007) (Supporting Figure S11). Further, the conclusion still held under different assumptions about social influences on reinforcement learning, assuming that the conformity-biased influence acts on the learning process (the value-shaping model (Najar et al, 2020)) rather than on action selection (the decision-biasing model assumed above) (Supporting Methods). Although groups of agents with the value-shaping algorithm seemed more prone to herding, the results were not qualitatively changed, and collective behavioural rescue emerged when β was sufficiently small (Supporting Figure S9).…”
Section: Results
confidence: 99%
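The skewed payoff distribution described in this citation statement (large payoffs realised rarely, small payoffs commonly) can be sketched as a simple two-outcome draw. The specific values (`p_large`, `large`, `small`) are illustrative assumptions, not parameters from the cited study.

```python
import numpy as np

rng = np.random.default_rng(0)

def skewed_payoff(p_large=0.1, large=10.0, small=1.0, size=10_000):
    # Asymmetric payoff: a large payoff with low probability, otherwise a
    # small payoff, in contrast with symmetric Gaussian noise around a mean.
    draws = rng.random(size) < p_large
    return np.where(draws, large, small)

payoffs = skewed_payoff()
```

Because the large payoff is rare, the sample mean sits much closer to the small payoff than to the large one, which is what makes such environments hard for purely individual trial-and-error learners.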
“…We considered another implementation of social influences in reinforcement learning, namely the value-shaping process (Biele et al, 2011; Najar et al, 2020), rather than the decision-biasing process assumed in our main analyses. In the value-shaping model, social influence modifies the Q value’s updating process as follows: the social frequency cue acts as an additional ‘bonus’ to the value, weighted by σ vs ( σ vs > 0) and standardised by the expected payoff from choosing randomly among all alternatives, π .…”
Section: Supplementary Materials
confidence: 99%