Learning to act in an environment to maximise rewards is among the brain's key functions. This process has often been conceptualised within the framework of reinforcement learning, which has also gained prominence in machine learning and artificial intelligence (AI) as a way to optimise decision-making. A common aspect of both biological and machine reinforcement learning is the reactivation of previously experienced episodes, referred to as replay. Replay is important for memory consolidation in biological neural networks, and is key to stabilising learning in deep neural networks. Here, we review recent developments concerning the functional roles of replay in the fields of neuroscience and AI. Complementary progress suggests how replay might support learning processes, including generalisation and continual learning, affording opportunities to transfer knowledge across the two fields to advance the understanding of biological and artificial learning and memory.
Replay in biological and artificial reinforcement learning
Highlights
• Reinforcement learning in deep neural networks often relies on the interleaving of new and old episodes, a technique which mimics the replay of neuronal activity in the brain.
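One common way to interleave new and old episodes in deep reinforcement learning is an experience replay buffer: transitions are stored as they occur, and the learner trains on minibatches sampled from the whole store rather than only on the latest step. The sketch below is a minimal, framework-free illustration under stated assumptions: discrete integer states and actions, uniform sampling, and FIFO eviction. `Transition` and `ReplayBuffer` are hypothetical names, not the API of any particular library.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple


class Transition(NamedTuple):
    """One step of experience (discrete states and actions assumed)."""
    state: int
    action: int
    reward: float
    next_state: int
    done: bool


class ReplayBuffer:
    """Fixed-capacity FIFO store of past transitions with uniform sampling."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        # deque(maxlen=...) silently drops the oldest item once full
        self._buffer: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def __len__(self) -> int:
        return len(self._buffer)

    def add(self, transition: Transition) -> None:
        """Store the newest transition alongside older ones."""
        self._buffer.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        """Draw a minibatch uniformly without replacement.

        Each minibatch mixes recent and older transitions, so a learner
        trained on it interleaves new experience with old experience.
        Raises ValueError if batch_size exceeds the number of stored items.
        """
        return self._rng.sample(list(self._buffer), batch_size)


buf = ReplayBuffer(capacity=3)
for t in range(5):
    buf.add(Transition(state=t, action=0, reward=0.0, next_state=t + 1, done=False))
print(len(buf))  # 3: transitions with states 0 and 1 were evicted
print(sorted(tr.state for tr in buf.sample(3)))  # [2, 3, 4]
```

Capacity and sampling scheme are free design choices in this sketch; uniform sampling is simply the most basic baseline against which weighted (prioritised) sampling schemes can be compared.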
Neural activity encoding recent experiences is replayed during sleep and rest to promote consolidation of the corresponding memories. However, precisely which features of experience influence replay prioritisation to optimise adaptive behaviour remains unclear. Here, we trained adult male rats on a novel maze-based reinforcement learning task designed to dissociate reward outcomes from reward-prediction errors. Four variations of a reinforcement learning model were fitted to the rats' behaviour over multiple days. Behaviour was best predicted by a model incorporating replay biased by reward-prediction error, compared with the same model without replay; models with random replay or reward-biased replay produced poorer predictions of behaviour. This insight disentangles the influences of salience on replay, suggesting that reinforcement learning is tuned by post-learning replay biased by reward-prediction error, not by reward per se. This work therefore provides a behavioural and theoretical toolkit with which to measure and interpret replay in striatal, hippocampal and neocortical circuits.
To make good decisions, it is typically beneficial to use past experience to guide future behaviour. Actions which have previously produced good outcomes in a similar context can be reinforced, so that behaviour adapts to maximise benefit. Crucial to this mechanism is the ability of neuronal spiking activity to drive synaptic plasticity, strengthening the synaptic connections between neurons to establish functional networks which encode task-relevant information or drive task-relevant actions. These functional networks are refined during sleep and rest, when many neurons switch to an "offline" state in which they replay activity encoding previous or anticipated upcoming experiences rather than current events or behaviours (Yu et al. 2017).
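The study described above compares a model without replay against models with random, reward-biased, or reward-prediction-error-biased replay. The sketch below is a hypothetical toy version of that comparison, not the model fitted in the study. It assumes tabular Q-learning, a fixed offline replay budget, and replay priorities fixed at the moment an experience is stored; the names `ReplayQLearner`, `mode`, and `n_replays` are illustrative.

```python
import random
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# One stored experience: (state, action, reward, next_state, rpe_at_encoding)
Experience = Tuple[int, int, float, int, float]


class ReplayQLearner:
    """Tabular Q-learning with an offline replay phase (hypothetical toy model).

    `mode` selects how stored experiences are prioritised during replay:
      "none"   - no offline replay
      "random" - uniform sampling
      "reward" - sampling weighted by |reward|
      "rpe"    - sampling weighted by |reward-prediction error| at encoding
    Priorities are fixed when an experience is stored (a simplifying assumption).
    """

    def __init__(self, n_actions: int, alpha: float = 0.1, gamma: float = 0.9,
                 mode: str = "rpe", seed: int = 0) -> None:
        if mode not in ("none", "random", "reward", "rpe"):
            raise ValueError(f"unknown replay mode: {mode}")
        self.q: DefaultDict[int, List[float]] = defaultdict(lambda: [0.0] * n_actions)
        self.alpha, self.gamma, self.mode = alpha, gamma, mode
        self.memory: List[Experience] = []
        self.rng = random.Random(seed)

    def td_update(self, s: int, a: int, r: float, s_next: int) -> float:
        """Apply one Q-learning update and return its reward-prediction error."""
        rpe = r + self.gamma * max(self.q[s_next]) - self.q[s][a]
        self.q[s][a] += self.alpha * rpe
        return rpe

    def learn_online(self, s: int, a: int, r: float, s_next: int) -> None:
        """Learn from a real experience, then store it with its RPE for later replay."""
        rpe = self.td_update(s, a, r, s_next)
        self.memory.append((s, a, r, s_next, rpe))

    def replay_offline(self, n_replays: int) -> None:
        """Re-apply TD updates to sampled stored experiences (simulated rest)."""
        if self.mode == "none" or not self.memory:
            return
        eps = 1e-6  # keeps every weight positive so zero-priority items can still be drawn
        if self.mode == "random":
            weights = [1.0] * len(self.memory)
        elif self.mode == "reward":
            weights = [abs(m[2]) + eps for m in self.memory]
        else:  # "rpe"
            weights = [abs(m[4]) + eps for m in self.memory]
        for s, a, r, s_next, _ in self.rng.choices(self.memory, weights=weights, k=n_replays):
            self.td_update(s, a, r, s_next)


# Toy one-step task: in state 0, action 1 is rewarded and action 0 is not;
# state 1 is terminal and is never updated, so its values stay at 0.
agent = ReplayQLearner(n_actions=2, mode="rpe")
agent.learn_online(0, 1, 1.0, 1)  # surprising reward: RPE = 1.0
agent.learn_online(0, 0, 0.0, 1)  # no reward, no surprise: RPE = 0.0
agent.replay_offline(n_replays=20)
print(agent.q[0])
```

In this one-step toy, reward and RPE priorities coincide because nothing is predicted yet. They diverge once a reward is already well predicted: later experiences of it are stored with an RPE near zero, so RPE-biased replay rarely selects them, whereas reward-biased replay still does. That is the distinction between reward outcomes and reward-prediction errors that the task above was designed to dissociate.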
This offline replay, found across cortical, limbic and basal ganglia regions, has been suggested to play a role in decision-making (Pfeiffer and Foster 2013), emotional processing (Cairney et al. 2014), generalising across episodes (Lewis and Durrant 2011), and reinforcement learning (Dupret et al. 2010).
Studies in which replay has been manipulated provide strong evidence for its contributions to memory consolidation. Artificially enhancing replay by presenting, during sleep, odours or sounds that had previously been paired with object locations or visual stimuli leads to better subsequent recall of the paired stimuli (Rasch et al. 2007; Rudoy et al. 2009; Antony et al. 2012; Bendor and Wilson 2012). Disrupting replay events, meanwhile, impairs subsequent spatial memory (Girardeau et al. 2009; Ego-Stengel and Wilson 2010; Jadhav et al. 2012; Michon et al. 2019).
An examination of how replay aids these cognitive processes requires assessment of which activity is replayed with greatest strength or frequency. Activity which is associated with experiences of reward (Foster and Wilson 2006; L...