A key feature of animal and human decision-making is the balance between exploring unknown options for information gain (directed exploration) and exploiting known options for immediate reward, a trade-off often examined using restless bandit problems. Recurrent neural network models (RNNs) have recently gained traction in both human and systems neuroscience work on reinforcement learning. Here we comprehensively compared the performance of a range of RNN architectures and human learners on restless four-armed bandit problems. The best-performing architecture (an LSTM network with computation noise) achieved human-level performance. Cognitive modeling showed that both human and RNN behavior was best described by a learning model with terms accounting for perseveration and directed exploration. However, whereas human learners exhibited a positive effect of uncertainty on choice probability (directed exploration), RNNs showed the reverse effect (uncertainty aversion), in conjunction with increased perseveration. RNN hidden-unit dynamics revealed that exploratory choices were associated with a disruption of choice-predictive signals during states of low state value, resembling a win-stay-lose-shift strategy and resonating with previous single-unit recording findings in monkey prefrontal cortex. During exploration trials, RNNs selected exploration targets predominantly based on their recent value, but tended to avoid more uncertain options. Our results highlight both similarities and differences between exploration behavior as it emerges in RNNs and the computational mechanisms identified in cognitive and systems neuroscience work.
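For illustration only: the abstract does not state the model equations, but learning models with directed-exploration and perseveration terms of the kind described here typically combine learned option values with an uncertainty bonus and a choice-repetition bonus in a softmax choice rule. In the sketch below, $Q_t(a)$ denotes the learned value of option $a$, $\sigma_t(a)$ its estimated uncertainty, $\varphi$ an exploration-bonus weight, $\rho$ a perseveration weight, $\beta$ an inverse temperature, and $I(\cdot)$ an indicator of the previous choice; all symbols are illustrative assumptions rather than the paper's exact specification:

$$
P(c_t = a) \;=\; \frac{\exp\!\big(\beta\,[\,Q_t(a) + \varphi\,\sigma_t(a) + \rho\,I(c_{t-1} = a)\,]\big)}{\sum_{a'} \exp\!\big(\beta\,[\,Q_t(a') + \varphi\,\sigma_t(a') + \rho\,I(c_{t-1} = a')\,]\big)}
$$

Under this sketch, $\varphi > 0$ corresponds to the directed exploration seen in human learners, whereas $\varphi < 0$ captures the uncertainty aversion observed in the RNNs, and larger $\rho$ captures increased perseveration.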