2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society
DOI: 10.1109/iembs.2011.6091370

Reinforcement learning via kernel temporal difference

Abstract: This paper introduces a kernel adaptive filter implemented with stochastic gradient on temporal differences, kernel Temporal Difference (TD)(λ), to estimate the state-action value function in reinforcement learning. The case λ=0 will be studied in this paper. Experimental results show the method's applicability for learning motor state decoding during a center-out reaching task performed by a monkey. The results are compared to the implementation of a time delay neural network (TDNN) trained with backpropagation…
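
To make the λ=0 case concrete, here is a minimal sketch of a kernel TD(0) value estimator, assuming a Gaussian kernel and a naive growing kernel expansion; the class name, parameter values, and the state-value (rather than state-action) formulation are illustrative choices, not the authors' implementation.

import numpy as np

# Minimal sketch of a kernel TD(0) value-function estimator.
# The value function is represented nonparametrically as
#   V(x) = sum_i alpha_i * kappa(c_i, x),
# where each visited state becomes a new kernel center (no sparsification, for clarity).

class KernelTD0:
    def __init__(self, kernel_width=1.0, step_size=0.1, gamma=0.9):
        self.sigma = kernel_width   # Gaussian kernel bandwidth (assumed choice)
        self.eta = step_size        # stochastic-gradient step size
        self.gamma = gamma          # discount factor
        self.centers = []           # stored states x(k)
        self.alphas = []            # expansion coefficients

    def _kappa(self, x, y):
        # Gaussian (RBF) kernel, a common positive definite choice
        d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
        return np.exp(-np.dot(d, d) / (2.0 * self.sigma ** 2))

    def value(self, x):
        # V(x) = <f, kappa(x, .)> evaluated through the kernel expansion
        return sum(a * self._kappa(c, x) for a, c in zip(self.alphas, self.centers))

    def update(self, x_t, r_next, x_next):
        # TD(0) error: delta = r + gamma * V(x') - V(x)
        delta = r_next + self.gamma * self.value(x_next) - self.value(x_t)
        # Stochastic-gradient step in the RKHS: f <- f + eta * delta * kappa(x_t, .)
        self.centers.append(x_t)
        self.alphas.append(self.eta * delta)
        return delta

Each update appends a new kernel center, so a practical decoder would prune or sparsify the expansion; a state-action value of the kind used in the paper can be obtained by conditioning the expansion on the action, for example by keeping one expansion per discrete action.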

Cited by 20 publications (27 citation statements); references 15 publications.
“…Typically in RL, temporal difference algorithms approximate the value function using a parametrized family of functions. A nonparametric variant, KTD(λ) [3], [5], is obtained by approximating the value function using a function f ∈ H, where H is a reproducing kernel Hilbert space (RKHS) with reproducing kernel κ(·, ·), such that f(x(t)) = ⟨f, φ(x(t))⟩ = ⟨f, κ(x(t), ·)⟩, ∀f ∈ H, with a mapping φ : X → H and a positive definite function κ : X × X → R. The KTD(λ) update rule is…”
Section: A. Agent; mentioning; confidence: 99%
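
The quoted update rule is cut off above. As an assumption about its standard form (not the exact expression from [3], [5]), a kernel TD(λ) update with an eligibility trace maintained in the RKHS can be written in LaTeX as

\delta_t = r_{t+1} + \gamma\, f_t(x(t+1)) - f_t(x(t)),
e_t = \gamma \lambda\, e_{t-1} + \kappa(x(t), \cdot),
f_{t+1} = f_t + \eta\, \delta_t\, e_t,

where η is the step size and γ the discount factor. For λ = 0 the trace reduces to κ(x(t), ·), which matches the case studied in the cited paper and the KernelTD0 sketch above.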
“…A BMI architecture based on reinforcement learning (RLBMI) is introduced in [1]; successful applications of this approach can be found in [2], [3], [4]. The key idea of RLBMI is co-adaptation between two intelligent systems: the BMI decoder in the agent and the BMI user in the environment (Figure 1).…”
Section: Introduction; mentioning; confidence: 99%
“…In this RLBMI framework, an agent learns how to transfer the neural states into actions based on predefined reward values from the environment. The agent must interpret the subject's brain activity correctly to facilitate the rewards [18,19]. A female bonnet macaque is trained for a center-out reaching task allowing 8 action directions.…”
Section: Eight Target Center-out Reaching Task; mentioning; confidence: 99%
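
As a sketch of how the agent side of this eight-direction task could be wired up, the snippet below performs ε-greedy selection over the 8 actions with one kernel value estimator per action and a ±1 reward for matching or missing the cued target. It reuses the illustrative KernelTD0 class from the sketch after the abstract; the reward scheme, function names, and parameters are assumptions for illustration, not the cited experiment's actual setup.

import numpy as np

N_ACTIONS = 8  # eight reaching directions in the center-out task

# One illustrative kernel value estimator per discrete action,
# standing in for an action-conditioned Q(x, a) model.
q_models = [KernelTD0(kernel_width=1.0, step_size=0.1, gamma=0.0)
            for _ in range(N_ACTIONS)]

def select_action(x, epsilon=0.1):
    # epsilon-greedy choice over the 8 directions
    if np.random.rand() < epsilon:
        return int(np.random.randint(N_ACTIONS))
    q_values = [m.value(x) for m in q_models]
    return int(np.argmax(q_values))

def learn_step(x, action, target_direction):
    # Assumed predefined reward: +1 if the decoded direction matches
    # the cued target, -1 otherwise.
    reward = 1.0 if action == target_direction else -1.0
    # Single-step update (gamma = 0) of the chosen action's estimator,
    # pulling its value at x toward the obtained reward.
    q_models[action].update(x, reward, x)
    return reward
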
“…A BMI architecture based on reinforcement learning (RLBMI) was introduced in [3], and successful applications of this approach can be found in [4], [5]. In the RLBMI structure (Figure 1), the agent learns how to translate the neural states into actions based on reward values from the environment.…”
Section: Introduction; mentioning; confidence: 99%