2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society
DOI: 10.1109/iembs.2011.6091370

Reinforcement learning via kernel temporal difference

Abstract: This paper introduces a kernel adaptive filter implemented with stochastic gradient on temporal differences, kernel Temporal Difference (TD)(λ), to estimate the state-action value function in reinforcement learning. The case λ=0 will be studied in this paper. Experimental results show the method's applicability for learning motor state decoding during a center-out reaching task performed by a monkey. The results are compared to the implementation of a time delay neural network (TDNN) trained with backpropagation…
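
To make the λ=0 case concrete, here is a minimal sketch of a kernel TD(0) value estimator, assuming a Gaussian kernel and a naive growing kernel expansion; the class name, parameter values, and the state-value (rather than state-action) formulation are illustrative choices, not the authors' implementation.

import numpy as np

# Minimal sketch of a kernel TD(0) value-function estimator.
# The value function is represented nonparametrically as
#   V(x) = sum_i alpha_i * kappa(c_i, x),
# where each visited state becomes a new kernel center (no sparsification, for clarity).

class KernelTD0:
    def __init__(self, kernel_width=1.0, step_size=0.1, gamma=0.9):
        self.sigma = kernel_width   # Gaussian kernel bandwidth (assumed choice)
        self.eta = step_size        # stochastic-gradient step size
        self.gamma = gamma          # discount factor
        self.centers = []           # stored states x(k)
        self.alphas = []            # expansion coefficients

    def _kappa(self, x, y):
        # Gaussian (RBF) kernel, a common positive definite choice
        d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
        return np.exp(-np.dot(d, d) / (2.0 * self.sigma ** 2))

    def value(self, x):
        # V(x) = <f, kappa(x, .)> evaluated through the kernel expansion
        return sum(a * self._kappa(c, x) for a, c in zip(self.alphas, self.centers))

    def update(self, x_t, r_next, x_next):
        # TD(0) error: delta = r + gamma * V(x') - V(x)
        delta = r_next + self.gamma * self.value(x_next) - self.value(x_t)
        # Stochastic-gradient step in the RKHS: f <- f + eta * delta * kappa(x_t, .)
        self.centers.append(x_t)
        self.alphas.append(self.eta * delta)
        return delta

Each update appends a new kernel center, so a practical decoder would prune or sparsify the expansion; a state-action value of the kind used in the paper can be obtained by conditioning the expansion on the action, for example by keeping one expansion per discrete action.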

Cited by 20 publications (27 citation statements); references 15 publications.
“…Typically in RL, temporal difference algorithms approximate the value function using a parametrized family of functions. A nonparametric variant, KTD(λ) [3], [5], is obtained by approximating the value function using a function f ∈ H, where H is a reproducing kernel Hilbert space (RKHS) with reproducing kernel κ(·, ·), such that f(x(t)) = ⟨f, φ(x(t))⟩ = ⟨f, κ(x(t), ·)⟩, ∀f ∈ H, with a mapping φ : X → H and a positive definite function κ : X × X → R. The KTD(λ) update rule is…”
Section: A. Agent; mentioning; confidence: 99%
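
The quoted update rule is cut off above. As an assumption about its standard form (not the exact expression from [3], [5]), a kernel TD(λ) update with an eligibility trace maintained in the RKHS can be written in LaTeX as

\delta_t = r_{t+1} + \gamma\, f_t(x(t+1)) - f_t(x(t)),
e_t = \gamma \lambda\, e_{t-1} + \kappa(x(t), \cdot),
f_{t+1} = f_t + \eta\, \delta_t\, e_t,

where η is the step size and γ the discount factor. For λ = 0 the trace reduces to κ(x(t), ·), which matches the case studied in the cited paper and the KernelTD0 sketch above.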
“…A BMI architecture based on reinforcement learning (RLBMI) is introduced in [1]; successful applications of this approach can be found in [2], [3], [4]. The key idea of RLBMI is co-adaptation between two intelligent systems: the BMI decoder in the agent and the BMI user in the environment (Figure 1).…”
Section: Introduction; mentioning; confidence: 99%
“…In this RLBMI framework, an agent learns how to transfer the neural states into actions based on predefined reward values from the environment. The agent must interpret the subject's brain activity correctly to facilitate the rewards [18,19]. A female bonnet macaque is trained for a center-out reaching task allowing 8 action directions.…”
Section: Eight Target Center-out Reaching Task; mentioning; confidence: 99%
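
As a sketch of how the agent side of this eight-direction task could be wired up, the snippet below performs ε-greedy selection over the 8 actions with one kernel value estimator per action and a ±1 reward for matching or missing the cued target. It reuses the illustrative KernelTD0 class from the sketch after the abstract; the reward scheme, function names, and parameters are assumptions for illustration, not the cited experiment's actual setup.

import numpy as np

N_ACTIONS = 8  # eight reaching directions in the center-out task

# One illustrative kernel value estimator per discrete action,
# standing in for an action-conditioned Q(x, a) model.
q_models = [KernelTD0(kernel_width=1.0, step_size=0.1, gamma=0.0)
            for _ in range(N_ACTIONS)]

def select_action(x, epsilon=0.1):
    # epsilon-greedy choice over the 8 directions
    if np.random.rand() < epsilon:
        return int(np.random.randint(N_ACTIONS))
    q_values = [m.value(x) for m in q_models]
    return int(np.argmax(q_values))

def learn_step(x, action, target_direction):
    # Assumed predefined reward: +1 if the decoded direction matches
    # the cued target, -1 otherwise.
    reward = 1.0 if action == target_direction else -1.0
    # Single-step update (gamma = 0) of the chosen action's estimator,
    # pulling its value at x toward the obtained reward.
    q_models[action].update(x, reward, x)
    return reward
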
“…A BMI architecture based on reinforcement learning (RLBMI) was introduced in [3], and successful applications of this approach can be found in [4], [5]. In the RLBMI structure (Figure 1), the agent learns how to translate the neural states into actions based on reward values from the environment.…”
Section: Introduction; mentioning; confidence: 99%