Reinforcement learning using a recurrent neural network

Ho, F.; Kamel, Mohamed S.

doi:10.1109/icnn.1994.374202

Cited by 6 publications

(2 citation statements)

References 6 publications

“…A number of researchers use an RNN to predict Q values to solve POMDPs (Lin 1993; Bakker 2002; Bakker et al. 2003; Schmidhuber 1991; Ho and Kamel 1994; Onat, Kita, and Nishikawa 1998; Ballini et al. 2001; Gomez et al. 2006). The number of input units is equal to the dimension of the sensory inputs from the environment.…”

Section: RL with RNN (mentioning)

confidence: 99%
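The statement above notes that the input layer has one unit per sensory input. A minimal forward-pass sketch of such a recurrent Q network follows, assuming an Elman-style architecture (hidden units fed back as context units) with one linear output per action. The class name `ElmanQNetwork`, the layer sizes, the tanh activation and the initialisation are hypothetical illustrative choices, not details from Ho and Kamel; training (for example, backpropagation through time on Q-learning targets) is omitted.

```python
import math
import random


class ElmanQNetwork:
    """Hypothetical Elman-style recurrent network with one Q-value output per action.

    Layer sizes, tanh hidden units and the uniform initialisation are
    illustrative assumptions, not taken from the cited work.
    """

    def __init__(self, n_inputs, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.n_hidden = n_hidden
        self.w_in = init(n_hidden, n_inputs)    # sensory input -> hidden
        self.w_rec = init(n_hidden, n_hidden)   # context (previous hidden) -> hidden
        self.w_out = init(n_actions, n_hidden)  # hidden -> Q value per action
        self.reset()

    def reset(self):
        """Clear the context units at the start of an episode."""
        self.context = [0.0] * self.n_hidden

    def step(self, observation):
        """Consume one observation vector and return a Q-value estimate per action."""
        hidden = []
        for i in range(self.n_hidden):
            total = sum(w * x for w, x in zip(self.w_in[i], observation))
            total += sum(w * h for w, h in zip(self.w_rec[i], self.context))
            hidden.append(math.tanh(total))
        # The hidden layer becomes the next context, carrying memory of past observations.
        self.context = hidden
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.w_out]
```

Calling `reset()` at the start of each episode and `step()` on every new observation lets the greedy action depend on the observation history, which is what allows a recurrent network to tell apart states that produce the same observation.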

“…The second category, RL with RNN, is to use an RNN as an approximate function. An RNN is to learn Q values or advantage values (Lin 1993; Bakker 2002; Bakker, Zhumatiy, Gruener, and Schmidhuber 2003; Schmidhuber 1991; Ho and Kamel 1994; Onat, Kita, and Nishikawa 1998; Ballini, Soares, and Gomide 2001; Gomez, Schmidhuber, and Miikkulainen 2006). Although these methods can find a good policy in POMDPs, they have a big disadvantage in that they require a long learning time.…”

Section: Introduction (mentioning)

confidence: 99%

Reinforcement Learning for POMDP Using State Classification

Dung

Komeda

Takagi

2008

Applied Artificial Intelligence

Reinforcement learning (RL) has been widely used to solve problems that provide little feedback from the environment. Q-learning can solve Markov decision processes (MDPs) quite well. For partially observable Markov decision processes (POMDPs), a recurrent neural network (RNN) can be used to approximate Q values. However, learning time for these problems is typically very long. We present a new combination of RL and RNN to find a good policy for POMDPs in a shorter learning time. The method has two phases: first, the state space is divided into two groups (a fully observable state group and a hidden state group); second, a Q-value table stores the values of fully observable states, and an RNN approximates the values of hidden states. Results of experiments on two grid world problems show that the proposed method enables an agent to acquire a policy with better learning performance than the method using only an RNN.
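The second phase of this scheme can be sketched roughly as below, assuming the observable/hidden split from the first phase is already available as a set. The class `MixedQAgent`, its parameters and the plug-in `approximator` (standing in for the RNN) are hypothetical; the table update is the standard one-step Q-learning rule, which is not necessarily the exact variant used by Dung, Komeda and Takagi.

```python
from collections import defaultdict


class MixedQAgent:
    """Hypothetical sketch: Q table for observable states, approximator for hidden ones."""

    def __init__(self, actions, hidden_states, approximator, alpha=0.1, gamma=0.9):
        self.actions = list(actions)
        self.hidden_states = set(hidden_states)  # result of the (assumed) classification phase
        self.approximator = approximator         # callable: observation -> {action: Q value}
        self.alpha = alpha                       # learning rate
        self.gamma = gamma                       # discount factor
        self.table = defaultdict(float)          # (observation, action) -> Q value

    def q_values(self, obs):
        """Route hidden states to the approximator and observable states to the table."""
        if obs in self.hidden_states:
            return self.approximator(obs)
        return {a: self.table[(obs, a)] for a in self.actions}

    def greedy_action(self, obs):
        q = self.q_values(obs)
        return max(self.actions, key=lambda a: q[a])

    def update_table(self, obs, action, reward, next_obs, done):
        """One-step Q-learning update, applied only to observable states."""
        if obs in self.hidden_states:
            return  # hidden states would be trained through the approximator instead
        target = reward
        if not done:
            target += self.gamma * max(self.q_values(next_obs).values())
        key = (obs, action)
        self.table[key] += self.alpha * (target - self.table[key])
```

Keeping the table for observable states means most updates are cheap, exact lookups, and only the aliased states pay the cost of training a recurrent approximator, which is the motivation the abstract gives for the shorter learning time.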

Control of a Water Tank System with Value Function Approximation

Lalvani

Katsaggelos

2023

IFIP Advances in Information and Communication Technology

Mixed Reinforcement Learning for Partially Observable Markov Decision Process

Dung

Komeda

Takagi

2007

2007 International Symposium on Computational Intelligence in Robotics and Automation

…solve problems with little feedback from the environment. Q-learning can solve fully observable Markov decision processes quite well. For partially observable Markov decision processes (POMDPs), a recurrent neural network (RNN) can be used to approximate Q values. However, learning time for these problems is typically very long. In this paper, Mixed Reinforcement Learning is presented to find an optimal policy for POMDPs in a shorter learning time. The method uses both a Q-value table and an RNN: the Q-value table stores Q values for fully observable states, and the RNN approximates Q values for hidden states. An observable degree is calculated for each state while the agent explores the environment; if the observable degree is less than a threshold, the state is considered a hidden state. Results of experiments on the lighting grid world problem show that the proposed method enables an agent to acquire a policy as good as the one acquired using only an RNN, with better learning performance.
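The abstract does not say how the observable degree is computed. One plausible stand-in, sketched below purely as a hypothetical heuristic, counts which observation follows each (observation, action) pair: if one observation actually stands for several underlying states, its successors tend to be inconsistent, so the share of the most frequent successor drops. The class `ObservableDegreeEstimator`, the averaging rule and the default threshold of 0.9 are all assumptions, not the paper's formula.

```python
from collections import Counter, defaultdict


class ObservableDegreeEstimator:
    """Hypothetical heuristic for flagging hidden (perceptually aliased) states.

    NOT the formula from the cited paper: the degree of an observation is the
    mean, over the actions tried from it, of the share of its most frequent
    successor observation.
    """

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.successors = defaultdict(Counter)  # (obs, action) -> Counter of next observations

    def record(self, obs, action, next_obs):
        """Log one transition observed during exploration."""
        self.successors[(obs, action)][next_obs] += 1

    def degree(self, obs):
        shares = []
        for (o, _action), counts in self.successors.items():
            if o == obs:
                shares.append(max(counts.values()) / sum(counts.values()))
        # Unvisited observations are treated as fully observable until evidence arrives.
        return sum(shares) / len(shares) if shares else 1.0

    def is_hidden(self, obs):
        """Classify as hidden when the degree falls below the threshold."""
        return self.degree(obs) < self.threshold
```

A caveat of this particular heuristic: in an environment with genuinely stochastic transitions it would also flag states that are fully observable but noisy, so the threshold would need tuning to the environment.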
