2013
DOI: 10.1371/journal.pcbi.1003024

Reinforcement Learning Using a Continuous Time Actor-Critic Framework with Spiking Neurons

Abstract: Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but…
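As a rough illustration of the reward-modulated STDP idea the abstract alludes to, the Python sketch below shows how an eligibility trace can turn STDP-like spike-pair updates into weight changes only when a global, dopamine-like reward signal arrives. All parameter values, names, and the toy spike pattern are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy reward-modulated STDP loop. A pre/post spike pairing does not change
# the weight directly; it is stored in an eligibility trace e(t) that decays
# over time. A later, global reward pulse r(t) (dopamine-like) converts the
# remaining trace into an actual weight change.

A_plus, A_minus = 0.01, 0.012   # STDP amplitudes (assumed values)
tau_stdp = 0.020                # STDP window time constant (s)
tau_e = 0.5                     # eligibility trace time constant (s)
dt = 0.001                      # simulation time step (s)
eta = 0.1                       # learning rate

def stdp_kernel(delta_t):
    """Exponential STDP window: pre-before-post (delta_t > 0) potentiates."""
    if delta_t > 0:
        return A_plus * np.exp(-delta_t / tau_stdp)
    return -A_minus * np.exp(delta_t / tau_stdp)

w, e = 0.5, 0.0
for step in range(1000):
    if step % 100 == 0:                  # toy input: pre fires 5 ms before post
        e += stdp_kernel(0.005)          # candidate update enters the trace
    e -= (dt / tau_e) * e                # trace decays between pairings
    r = 1.0 if step == 500 else 0.0      # a single delayed reward pulse
    w += eta * r * e                     # reward gates the plasticity
print(f"final weight: {w:.4f}")
```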

Cited by 174 publications (246 citation statements) · References 58 publications
“…More recently, extending previous work of Doya (2000) on continuous-time TD learning, Frémaux et al (2013) successfully implemented a TD error signal over continuous-time spiking representations of RL states, actions and value functions, relying on a sparse topographic (i.e., place-cell-like) encoding of states with narrow tuning curves and rate-encoded value functions to solve navigation, acrobot and cartpole problems. However, although in some modalities the brain can rapidly develop a strong localized response to particular stimuli (Moser et al, 2008), this is unlikely to be a universal feature of sensory representation and it does not suggest an efficient way of integrating stimuli across modalities.…”
Section: Introduction
confidence: 87%
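For orientation, here is a minimal critic-only sketch of the scheme the statement above describes: a continuous state encoded by narrow, place-cell-like Gaussian tuning curves, a value function linear in that population activity, and the continuous-time TD error of Doya (2000). Centers, tuning width, and time constants are assumptions for illustration, and no spiking dynamics are modeled.

```python
import numpy as np

# Critic-only sketch: a 1-D position is encoded by narrow Gaussian "place
# cells", the value function is linear in that population activity, and
# learning uses Doya's (2000) continuous-time TD error
#   delta(t) = r(t) - V(t)/tau_r + dV/dt   (discretized with step dt).

n_cells, sigma = 50, 0.02                  # narrow tuning curves (assumed width)
centers = np.linspace(0.0, 1.0, n_cells)   # place-cell centers on a 1-D track
tau_r, dt, eta = 1.0, 0.01, 0.05

def encode(x):
    """Gaussian population activity (place-cell-like code) for position x."""
    return np.exp(-0.5 * ((x - centers) / sigma) ** 2)

w = np.zeros(n_cells)                      # critic weights: V(x) = w . phi(x)

x = 0.0
for _ in range(100):                       # one pass along the track
    x_next = min(x + 0.01, 1.0)
    r = 1.0 if x_next >= 1.0 else 0.0      # reward only at the goal
    v, v_next = w @ encode(x), w @ encode(x_next)
    delta = r - v / tau_r + (v_next - v) / dt   # continuous-time TD error
    w += eta * delta * encode(x)           # gradient-like critic update
    x = x_next
```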
“…The continuous actor-critic reinforcement learning scheme is particularly suited for complex continuous state-action problems while at the same time being based on a biological learning model [3]. The basic learning model can be divided into two sub-mechanisms, popularly termed the actor and the adaptive critic (Fig.…”
Section: A. Actor-Critic Learning With Dynamic Reservoir
confidence: 99%
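The actor/critic split this statement refers to can be summarized generically in a few lines. The sketch below uses a linear critic and a linear Gaussian policy as stand-ins; it is not the dynamic-reservoir implementation of the citing paper, and all names and constants are assumptions.

```python
import numpy as np

# Generic linear actor-critic with a Gaussian exploration policy. The critic
# learns a value estimate; its TD error trains both the critic and the actor,
# reinforcing exploratory deviations that led to better-than-predicted reward.

rng = np.random.default_rng(0)
n_features = 20
w_critic = np.zeros(n_features)       # value weights
w_actor = np.zeros(n_features)        # mean-action weights
sigma = 0.2                           # exploration noise (assumed)
gamma, eta_c, eta_a = 0.99, 0.1, 0.01

def act(phi):
    """Gaussian policy: action = mean(phi) + exploration noise."""
    mu = w_actor @ phi
    return mu + sigma * rng.standard_normal(), mu

def update(phi, phi_next, reward, action, mu):
    global w_critic, w_actor
    delta = reward + gamma * (w_critic @ phi_next) - (w_critic @ phi)  # TD error
    w_critic += eta_c * delta * phi                 # critic: TD(0) update
    w_actor += eta_a * delta * (action - mu) * phi  # actor: reinforce exploration

# one illustrative step on random state features
phi, phi_next = rng.random(n_features), rng.random(n_features)
a, mu = act(phi)
update(phi, phi_next, reward=1.0, action=a, mu=mu)
```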
“…The biological detail of our model is higher than that of previously published neural models that reproduce a similar reaching task: we implement a spiking neuron model with different synaptic receptors and many biological features, versus, for example, rate models [28]; we have cortical-based recurrent circuits with different cell types, versus more artificial task-oriented circuitries [7, 35, 36]; and we model anatomical and biophysical musculoskeletal arm properties, as opposed to simpler kinematic arm models [28, 35, 36]. Nonetheless, these models include regions that we do not explicitly implement, such as a population to encode reward information [35], posterior parietal cortex for sensory integration [28], or a cerebellum [36, 37].…”
Section: Introduction
confidence: 99%