Encyclopedia of Computational Neuroscience 2014
DOI: 10.1007/978-1-4614-7320-6_580-2
Reinforcement Learning in Cortical Networks

Abstract: Synonyms: Reward-based learning; trial-and-error learning; temporal-difference (TD) learning; policy gradient methods. Definition: Reinforcement learning represents a basic paradigm of learning in artificial intelligence and biology. The paradigm considers an agent (robot, human, animal) that acts in a typically stochastic environment and receives rewards when reaching certain states. The agent's goal is to maximize the expected reward by choosing the optimal action at any given state. In a cortical implementation,…

Cited by 3 publications (5 citation statements); references 22 publications.
“…The weights are adapted to maximize the reward. In a recent review of reinforcement learning in cortical networks, Senn and Pfister (2014) generalize the weight update rule to follow (10), with R being the reward and PI a plasticity induction based on pre- and postsynaptic activity. The hypothesis that synaptic plasticity is driven by the covariance between reward and neural activity was initially introduced by Loewenstein and Seung (2006).…”

Section: Discussion (mentioning)

confidence: 99%
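The covariance hypothesis quoted above can be illustrated with a minimal numerical sketch. Everything here (the trial structure, the signals, and the learning rate) is invented for illustration and is not taken from the cited works; the point is only that an update of the form η·(R − ⟨R⟩)·N tracks, on average, the covariance between reward R and neural activity N.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: per-trial postsynaptic activity, and a reward that is
# partially correlated with that activity (both distributions are assumed).
n_trials = 10_000
activity = rng.normal(1.0, 0.2, n_trials)
reward = 0.5 * activity + rng.normal(0.0, 0.1, n_trials)

eta = 0.1  # learning rate (arbitrary)

# Covariance-style update: eta * (R - <R>) * N per trial.
# Averaged over trials, this equals eta * cov(R, N).
delta_w = eta * (reward - reward.mean()) * activity

print(delta_w.mean())                          # mean weight change
print(eta * np.cov(reward, activity)[0, 1])    # eta * sample covariance
```

Because activity and reward are positively correlated here, the average weight change is positive: synapses whose activity covaries with reward are strengthened.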
“…The ability of neurons to adjust the strength of the connection (synapses) between two or more neurons is defined as neuroplasticity [95]. The biological process that is responsible for this ability is known as spike-timing-dependent plasticity (STDP) [57,96,97] and is schematically depicted in Figure 5. STDP states that the change Δw of the weighting factor between two neurons (A) and (B) is a function of the time difference between the pre- and postsynaptic stimulation.…”

Section: Signal Transmission and Synapses (mentioning)

confidence: 99%
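The pair-based STDP window described in that statement can be sketched as follows. The exponential form is the standard textbook model; the amplitudes and time constants below are illustrative assumptions, not values from the cited work.

```python
import numpy as np

# Assumed parameters of a pair-based STDP window:
A_plus, A_minus = 0.01, 0.012      # potentiation / depression amplitudes
tau_plus, tau_minus = 20.0, 20.0   # time constants in ms

def stdp_dw(dt_ms):
    """Weight change for one spike pair, dt_ms = t_post - t_pre."""
    if dt_ms > 0:
        # Pre fires before post: potentiation (LTP), decaying with |dt|.
        return A_plus * np.exp(-dt_ms / tau_plus)
    else:
        # Post fires before (or with) pre: depression (LTD).
        return -A_minus * np.exp(dt_ms / tau_minus)

print(stdp_dw(10.0))    # causal pairing -> positive weight change
print(stdp_dw(-10.0))   # acausal pairing -> negative weight change
```

Note the asymmetry: the sign of Δw depends only on the temporal order of the spikes, while its magnitude decays exponentially with the interval between them.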
“…From a neural perspective, reinforcement learning adds a third factor to the learning process besides presynaptic and postsynaptic spike times [15]. In general, the synaptic efficacy change Δw can therefore be expressed as follows (from [64]):…”

Section: Reinforcement Learning (mentioning)

confidence: 99%
“…This allows for mathematical deduction and analysis of new learning rules based on Policy Gradient methods and Temporal Difference learning. Algorithms of the former type adapt synaptic weights by computing the gradient of a function which estimates the expected reward [64]. In [14], a policy gradient algorithm for reinforcement learning in partially observable Markov decision processes is employed to derive a reinforcement learning rule for spiking neurons.…”

Section: Reinforcement Learning (mentioning)

confidence: 99%
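The policy-gradient idea in that statement can be sketched for a single stochastic binary neuron. This is a hedged, minimal REINFORCE-style illustration, not the derivation from [14]: the task (reward the neuron for firing on a fixed input), the input pattern, and the learning rate are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

x = np.array([1.0, 0.5])  # fixed input pattern (assumed)
w = np.zeros(2)           # synaptic weights
eta = 0.1                 # learning rate (assumed)

for _ in range(2000):
    p = sigmoid(w @ x)                # firing probability (the policy)
    spike = rng.random() < p          # stochastic action: fire or not
    reward = 1.0 if spike else 0.0    # toy task: firing is rewarded
    # REINFORCE update: d log pi / dw = (spike - p) * x for a Bernoulli unit,
    # so the weight change is reward-modulated by this eligibility term.
    w += eta * reward * (spike - p) * x

print(sigmoid(w @ x))  # firing probability grows toward 1
```

The update climbs the gradient of the expected reward: whenever a rewarded spike occurs, the weights move so as to make that spike more probable.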