Most reinforcement learning models assume that the reward signal arrives after the activity that led to the reward, which constrains the cellular mechanisms that could underlie learning. Here we show that dopamine, a positive reinforcement signal, can retroactively convert hippocampal timing-dependent synaptic depression into potentiation. This effect requires functional NMDA receptors and is mediated in part through activation of the cAMP/PKA cascade. Collectively, our results support the idea that reward-related signaling can act on a pre-established synaptic eligibility trace, thereby associating specific experiences with behaviorally distant, rewarding outcomes. This finding identifies a biologically plausible mechanism for solving the ‘distal reward problem’. DOI: http://dx.doi.org/10.7554/eLife.09685.001
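The eligibility-trace idea in the abstract above can be made concrete with a toy rule. The sketch below is not the authors' model: the parameter names (`A_MINUS`, `TAU_STDP_MS`, `TAU_TRACE_S`), their values, and the linear interpolation between depression and potentiation are all hypothetical choices made for illustration. A post-before-pre pairing produces depression and tags the synapse; dopamine arriving later flips the tagged change toward potentiation, to a degree set by how much of the trace remains.

```python
import math

# All parameter values are hypothetical and chosen only for illustration.
A_MINUS = 0.5       # peak magnitude of timing-dependent depression (arbitrary units)
TAU_STDP_MS = 20.0  # width of the STDP window, in ms
TAU_TRACE_S = 60.0  # decay constant of the eligibility trace, in seconds


def stdp_change(dt_ms):
    """Weight change from one pairing, with dt_ms = t_post - t_pre.

    Only the post-before-pre branch (dt_ms < 0, depression) is modelled here.
    """
    if dt_ms < 0:
        return -A_MINUS * math.exp(dt_ms / TAU_STDP_MS)
    return 0.0


def net_change(dt_ms, dopamine_delay_s=None):
    """Final weight change after an optional, delayed dopamine signal.

    The pairing tags the synapse with an eligibility trace that decays over time.
    Dopamine arriving while the trace is strong converts the tagged depression
    into potentiation; as the trace decays, the outcome reverts to depression.
    """
    dw = stdp_change(dt_ms)
    if dopamine_delay_s is None:
        return dw  # no reward signal: the depression stands
    trace = math.exp(-dopamine_delay_s / TAU_TRACE_S)
    # trace = 1 flips the sign fully; trace = 0 leaves the depression unchanged
    return dw * (1.0 - 2.0 * trace)
```

With these toy values, `net_change(-20)` is roughly -0.18 (depression), while `net_change(-20, 0.0)` is roughly +0.18: the same pairing followed immediately by dopamine ends in potentiation. A very long delay leaves the original depression in place.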
Spike timing-dependent plasticity (STDP) is under neuromodulatory control, and neuromodulator levels are correlated with distinct behavioral states. Previously, we reported that dopamine, a reward signal, broadens the time window for synaptic potentiation and modulates the outcome of hippocampal STDP even when applied after the plasticity induction protocol (Brzosko et al., 2015). Here, we demonstrate that sequential neuromodulation of STDP by acetylcholine and dopamine provides an effective model of reward-based navigation. Specifically, our experimental data in mouse hippocampal slices show that acetylcholine biases STDP toward synaptic depression, whilst subsequent application of dopamine converts this depression into potentiation. Incorporating this bidirectional, neuromodulation-enabled correlational synaptic learning rule into a computational model yields effective navigation toward changing reward locations, as in natural foraging behavior. Thus, temporally sequenced neuromodulation of STDP enables associations to be made between actions and outcomes and also provides a possible mechanism for aligning the time scales of cellular and behavioral learning. DOI: http://dx.doi.org/10.7554/eLife.27756.001
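The sequential rule described above can be caricatured in a few lines. This is a simplified sketch, not the published model: the amplitudes `A_PLUS`/`A_MINUS`, the time constant `TAU_MS`, the assumption that acetylcholine turns both pairing orders into depression, and the assumption that dopamine reverses the accumulated change with equal magnitude are all hypothetical. The point it shows is the ordering: plasticity is laid down as depression during the cholinergic phase and is converted to potentiation only if dopamine follows.

```python
import math

# Hypothetical amplitudes and time constant, for illustration only.
A_PLUS = 1.0
A_MINUS = 1.0
TAU_MS = 20.0


def stdp_window(dt_ms, acetylcholine):
    """Weight change for one pairing (dt_ms = t_post - t_pre).

    Without acetylcholine: classic asymmetric window (pre-before-post potentiates).
    With acetylcholine: assumed here to bias both timings toward depression.
    """
    magnitude = math.exp(-abs(dt_ms) / TAU_MS)
    if acetylcholine or dt_ms <= 0:
        return -A_MINUS * magnitude
    return A_PLUS * magnitude


def episode_weight_change(pairings_ms, rewarded):
    """Accumulate cholinergic-phase plasticity, then apply dopamine if rewarded.

    Dopamine arriving after the pairings converts the accumulated depression
    into potentiation of the same magnitude (a simplifying assumption).
    """
    eligibility = sum(stdp_window(dt, acetylcholine=True) for dt in pairings_ms)
    return -eligibility if rewarded else eligibility
```

For the pairings `[10, -10]`, an unrewarded episode yields net depression and a rewarded one yields potentiation of the same size; in a navigation model, this means only synapses active on paths that end in reward are strengthened.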
Neuromodulation plays a fundamental role in the acquisition of new behaviours. In previous experimental work, we showed that acetylcholine biases hippocampal synaptic plasticity towards depression, and the subsequent application of dopamine can retroactively convert depression into potentiation. We also demonstrated that incorporating this sequentially neuromodulated spike-timing-dependent plasticity (STDP) rule in a network model of navigation yields effective learning of changing reward locations. Here, we employ computational modelling to further characterize the effects of cholinergic depression on behaviour. We find that acetylcholine, by allowing learning from negative outcomes, enhances exploration over the action space. We show that the consequences of this enhanced exploration vary with the structure of the model, the environment and the task. Interestingly, sequentially neuromodulated STDP also yields flexible learning, surpassing the performance of other reward-modulated plasticity rules.
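Why learning from negative outcomes promotes exploration can be seen in a two-action toy problem. The sketch below is an illustrative assumption, not the network model from the study: `ETA`, the softmax action selection and the `update` function are hypothetical. Each chosen action is first tagged with cholinergic depression; reward (dopamine) reverses the tag, while an unrewarded choice keeps it, lowering that action's weight and shifting choice probability toward the alternatives.

```python
import math

ETA = 0.5  # hypothetical learning rate


def softmax(weights, beta=1.0):
    """Action-selection probabilities from synaptic weights."""
    exps = [math.exp(beta * w) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def update(weights, action, rewarded):
    """Cholinergic depression tags the chosen action; dopamine on reward reverses it.

    Unrewarded choices stay depressed, which shifts probability toward the
    alternatives and so promotes exploration.
    """
    new_weights = list(weights)
    tagged = -ETA  # depression laid down during the action under acetylcholine
    new_weights[action] += -tagged if rewarded else tagged
    return new_weights
```

Starting from weights `[0.0, 0.0]`, an unrewarded choice of action 0 gives `[-0.5, 0.0]`, and the probability of trying action 1 next rises to about 0.62. A purely potentiating rule would leave the weights unchanged after an unrewarded choice and so would not push the agent away from it.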
A fundamental unresolved problem in neuroscience is how the brain forms memory associations between events that are separated in time. Here we propose that reactivation-induced synaptic plasticity can solve this problem. Previously, we reported that the reinforcement signal dopamine converts hippocampal spike timing-dependent depression into potentiation during continued synaptic activity (Brzosko et al., 2015). Here, we report that postsynaptic bursts in the presence of dopamine produce input-specific LTP in mouse hippocampal synapses 10 minutes after they were primed with coincident pre- and postsynaptic activity (post-before-pre pairing; Δt = -20 ms). This priming activity induces synaptic depression and sets an NMDA receptor-dependent silent eligibility trace which, through the cAMP-PKA cascade, is rapidly converted into protein synthesis-dependent synaptic potentiation, mediated by a signaling pathway distinct from that of conventional LTP. When we incorporated this synaptic learning rule into a computational model, we found that it adds specificity to reinforcement learning by controlling memory allocation and enabling both ‘instructive’ and ‘supervised’ reinforcement learning. We predicted that this mechanism would make reactivated neurons more active and carry more spatial information than non-reactivated cells, a prediction confirmed in freely moving mice performing a reward-based navigation task.
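Input specificity is the key property of the reactivation rule above: only synapses that carry a priming-induced trace are potentiated when bursts and dopamine arrive later. The sketch below illustrates that bookkeeping under hypothetical assumptions; the class `Synapse`, the functions `prime` and `reactivate`, the fractional depression and potentiation values and the 30-minute trace lifetime are illustrative and are not taken from the study.

```python
from dataclasses import dataclass
from typing import Optional

PRIMING_DEPRESSION = 0.2   # hypothetical fractional depression from priming
REACTIVATION_LTP = 0.5     # hypothetical fractional potentiation on conversion
TRACE_LIFETIME_MIN = 30.0  # hypothetical lifetime of the silent trace, in minutes


@dataclass
class Synapse:
    weight: float = 1.0
    primed_at: Optional[float] = None  # time (min) of post-before-pre priming


def prime(syn, t_min):
    """Post-before-pre pairing: depress the synapse and set a silent eligibility trace."""
    syn.weight *= 1.0 - PRIMING_DEPRESSION
    syn.primed_at = t_min


def reactivate(synapses, t_min, dopamine):
    """Postsynaptic bursts: with dopamine, potentiate only synapses still carrying a trace."""
    for syn in synapses:
        has_trace = (
            syn.primed_at is not None
            and t_min - syn.primed_at <= TRACE_LIFETIME_MIN
        )
        if dopamine and has_trace:
            syn.weight *= 1.0 + REACTIVATION_LTP
            syn.primed_at = None  # trace is consumed by the conversion
```

Priming synapse `a` at t = 0 min and then reactivating both `a` and an unprimed synapse `b` with dopamine at t = 10 min leaves `a` above its baseline weight (0.8 × 1.5 ≈ 1.2) and `b` unchanged at 1.0. Without dopamine, `a` simply remains depressed and keeps its trace.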