2023
DOI: 10.1101/2023.04.27.538570
Preprint

Exploration-exploitation mechanisms in recurrent neural networks and human learners in restless bandit problems

Abstract: A key feature of animal and human decision-making is to balance exploring unknown options for information gain (directed exploration) versus exploiting known options for immediate reward, which is often examined using restless bandit problems. Recurrent neural network models (RNNs) have recently gained traction in both human and systems neuroscience work on reinforcement learning. Here we comprehensively compared the performance of a range of RNN architectures as well as human learners on restless four-armed bandit…
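To make the task structure concrete, here is a minimal sketch of a restless four-armed bandit environment, assuming Gaussian random-walk payoffs in the style common to this literature (e.g., the decaying random-walk bandits of Daw et al., 2006); the class name and all parameter values are illustrative, not the preprint's exact task settings.

```python
import numpy as np

class RestlessBandit:
    """Four-armed restless bandit: each arm's expected payoff drifts
    across trials as an independent Gaussian random walk."""

    def __init__(self, n_arms=4, decay=0.9836, decay_center=50.0,
                 drift_sd=2.8, noise_sd=4.0, seed=None):
        # Illustrative parameters for a decaying Gaussian random walk;
        # not necessarily those used in the preprint.
        self.rng = np.random.default_rng(seed)
        self.n_arms = n_arms
        self.decay = decay
        self.decay_center = decay_center
        self.drift_sd = drift_sd
        self.noise_sd = noise_sd
        self.means = self.rng.uniform(20.0, 80.0, size=n_arms)

    def step(self, arm):
        """Return a noisy payoff for the chosen arm, then let all
        arm means drift (the 'restless' part of the task)."""
        reward = self.means[arm] + self.rng.normal(0.0, self.noise_sd)
        self.means = (self.decay * self.means
                      + (1.0 - self.decay) * self.decay_center
                      + self.rng.normal(0.0, self.drift_sd, self.n_arms))
        return reward
```

An agent interacts by calling step(arm) once per trial; because the arm means keep drifting, a learner must keep exploring to track which option is currently best, which is exactly the exploration-exploitation tension the abstract describes.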

Cited by 1 publication (6 citation statements)
References 141 publications
“…Future work is required to extend these findings to other tasks used to study learning rate adaptation (Behrens et al., 2007, 2008; Browning et al., 2015; Cook et al., 2019; Gagne et al., 2020; Wang et al., 2018). Finally, computational modeling was restricted to an RL model that disregarded effects of higher-order choice perseveration (Lau & Glimcher, 2008; Miller et al., 2019; Tuzsus et al., 2024) and more complex directed (i.e., uncertainty-guided) exploration mechanisms (Chakroun et al., 2020; Wiehler et al., 2021; Wilson et al., 2014).…”
Section: Discussion
confidence: 99%
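The modeling gap flagged in this statement is easy to state in code: a basic RL model scores arms by learned value alone, whereas the richer models cited add a perseveration term and an uncertainty (directed-exploration) bonus to the choice rule. Below is a minimal sketch of such an extended softmax choice rule; the function name, parameters (beta, phi, rho), and values are illustrative assumptions, not any specific published model.

```python
import numpy as np

def choice_probs(q, uncertainty, last_choice=None,
                 beta=0.2, phi=1.0, rho=1.5):
    """Softmax over a weighted sum of learned values (q), a directed-
    exploration bonus on per-arm uncertainty, and a first-order
    perseveration bonus toward the previous choice. Higher-order
    perseveration models would condition on longer choice histories."""
    persev = np.zeros_like(q, dtype=float)
    if last_choice is not None:
        persev[last_choice] = 1.0  # bonus for repeating the last choice
    logits = beta * q + phi * uncertainty + rho * persev
    logits -= logits.max()         # subtract max for numerical stability
    expl = np.exp(logits)
    return expl / expl.sum()
```

Setting phi = rho = 0 recovers the plain softmax-on-value model the statement refers to as "an RL model".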
“…Several limitations need to be addressed. First, we focused on the effects of training regime on behavioral adaptation to volatility in one specific neural network architecture (a 48-unit LSTM network trained with the A2C algorithm (Mnih et al., 2016) and Weber noise (Findling & Wyart, 2020)), based on our previous results that this architecture exhibits human-level performance on four-armed restless bandit problems (Tuzsus et al., 2024). Previous work on meta-RL applied similar RNN architectures (Binz & Schulz, 2022; Blanco-Pozo et al., 2024; Findling & Wyart, 2020; Hattori et al., 2023; Molano-Mazón et al., 2023; Wang et al., 2017, 2018), but generally, conclusions are limited to the architectures examined.…”
Section: Discussion
confidence: 99%
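For readers unfamiliar with the architecture named in this statement, here is a minimal PyTorch sketch of a 48-unit LSTM with A2C-style actor and critic heads. The Weber-noise term is a simplified stand-in (hidden-state noise whose scale grows with the size of the update, loosely after Findling & Wyart, 2020), and all names and values are assumptions rather than the preprint's exact implementation.

```python
import torch
import torch.nn as nn

class A2CLSTMAgent(nn.Module):
    """Minimal LSTM actor-critic. Input per trial: one-hot previous
    action plus scalar previous reward; outputs: policy logits over
    arms (actor) and a state-value estimate (critic)."""

    def __init__(self, n_arms=4, hidden_size=48, weber_zeta=0.5):
        super().__init__()
        self.lstm = nn.LSTMCell(n_arms + 1, hidden_size)
        self.policy_head = nn.Linear(hidden_size, n_arms)  # actor
        self.value_head = nn.Linear(hidden_size, 1)        # critic
        self.weber_zeta = weber_zeta  # illustrative noise scale

    def forward(self, prev_action_onehot, prev_reward, state):
        # prev_action_onehot: (batch, n_arms); prev_reward: (batch, 1)
        x = torch.cat([prev_action_onehot, prev_reward], dim=-1)
        h_prev, _ = state
        h, c = self.lstm(x, state)
        # Simplified Weber-scaled computation noise: noise sd grows
        # with the magnitude of this step's hidden-state update.
        h = h + torch.randn_like(h) * self.weber_zeta * (h - h_prev).abs()
        return self.policy_head(h), self.value_head(h).squeeze(-1), (h, c)
```

Training with A2C (Mnih et al., 2016) would then use the policy logits for the actor loss (advantage-weighted log-probabilities plus an entropy bonus) and the value head for the critic loss.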