“…The second category, RL with an RNN, uses an RNN as a function approximator: the RNN learns Q-values or advantage values (Lin 1993; Bakker 2002; Bakker, Zhumatiy, Gruener, and Schmidhuber 2003; Schmidhuber 1991; Ho and Kamel 1994; Onat, Kita, and Nishikawa 1998; Ballini, Soares, and Gomide 2001; Gomez, Schmidhuber, and Miikkulainen 2006). Although these methods can find a good policy in POMDPs, their major disadvantage is that they require a long learning time.…”