A new graph-based evolutionary algorithm called Genetic Network Programming (GNP) has been proposed. GNP represents its solutions as graph structures, which give it some distinctive abilities: for example, it can memorize past action sequences in the network flow and produce compact structures. However, conventional GNP relies on evolution alone: programs are evaluated and evolved according to their fitness values only after they have been executed to some extent, so many trials must be repeated. GNP with Reinforcement Learning (GNP-RL), which combines evolution-based GNP with RL, has therefore been proposed. Because RL takes place while an agent is carrying out its task, GNP-RL can search for better solutions at every judgment and processing step during task execution, in addition to the evolutionary operations executed after the task. The aim of combining evolution and RL is to take advantage of both the sophisticated, diversified search ability of evolution and the intensified search ability of RL with online learning. Moreover, the Q table can be kept small, which saves calculation time and memory.

In this paper, in order to extend the ability of GNP-RL and apply it to real-world problems, we aim to deal with numerical information (e.g., 15 degrees, 512 points), whereas all previously proposed GNP algorithms deal with discrete information (e.g., right, left). In the simulations, the proposed method is applied to the controller of the Khepera simulator, where it learns wall-following behavior.

The wall-following problem requires the robot to move along walls and to move as fast and as straight as possible. The rewards used for learning and the fitness values used for evolution are calculated based on these conditions. Table 1 shows the generalization ability of GNP-RL, evolution-based GNP (standard GNP), and Neural Networks evolved by GA (NN-GA).
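To make the idea of learning during task execution from numerical information more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a reward that is highest when the robot drives fast, straight, and within a distance band from the wall, and a node that keeps a small Q table over its candidate sub-nodes, updated with a Sarsa-style rule. The names `wall_following_reward` and `SubNodeSelector`, the constant `V_MAX`, the distance band, the wheel-speed pairs, and all learning parameters are hypothetical choices made for illustration; the abstract does not specify the reward formula or the update rule.

```python
import random

V_MAX = 10.0  # hypothetical maximum wheel speed (assumed units)


def wall_following_reward(v_left, v_right, wall_dist, near=2.0, far=10.0):
    """Hypothetical reward in [0, 1] for one time step.

    Assumes wheel speeds lie in [-V_MAX, V_MAX] and that `wall_dist`
    is a numerical distance reading to the closest wall.
    """
    # 1.0 at full forward speed, 0.0 when standing still or reversing.
    speed = max(0.0, (v_left + v_right) / (2.0 * V_MAX))
    # 1.0 when both wheels turn equally fast, smaller while turning.
    straight = 1.0 - (abs(v_left - v_right) / (2.0 * V_MAX)) ** 0.5
    # Reward only when the robot is inside the assumed distance band.
    along_wall = 1.0 if near <= wall_dist <= far else 0.0
    return speed * straight * along_wall


class SubNodeSelector:
    """One node with a small Q table, one entry per candidate sub-node."""

    def __init__(self, n_sub_nodes, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = [0.0] * n_sub_nodes
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount rate (assumed value)
        self.epsilon = epsilon  # exploration probability (assumed value)
        self.rng = random.Random(seed)

    def select(self):
        """Epsilon-greedy choice; ties go to the lowest index."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q))
        return max(range(len(self.q)), key=self.q.__getitem__)

    def update(self, chosen, reward, next_q):
        """Sarsa-style update; `next_q` is the Q value of the sub-node
        actually selected at the next visited node."""
        td_error = reward + self.gamma * next_q - self.q[chosen]
        self.q[chosen] += self.alpha * td_error


if __name__ == "__main__":
    # Toy loop: a single node that transitions back to itself. Each
    # sub-node proposes a hypothetical numeric wheel-speed pair.
    wheel_speeds = [(10.0, 10.0), (10.0, 2.0)]
    node = SubNodeSelector(len(wheel_speeds))
    chosen = node.select()
    for _ in range(200):
        reward = wall_following_reward(*wheel_speeds[chosen], wall_dist=5.0)
        next_chosen = node.select()
        node.update(chosen, reward, node.q[next_chosen])
        chosen = next_chosen
    print(node.q)
```

Because each node only stores Q values for its own few sub-nodes, the table stays small regardless of how many numerical sensor readings the environment can produce, which is one plausible reading of the memory saving mentioned above.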
The average fitness is calculated as follows. First, each method produces programs in a training environment. Then, the average fitness is calculated in a testing environment using the best programs that each method obtained in the training environment. Table 1 shows that GNP-RL achieves the best fitness and that the differences between GNP-RL and the other methods are significant. Fig. 1 shows a typical behavior of the robot controlled by GNP-RL, together with the speed of the wheels in several situations. The figure indicates that the proposed method learns wall-following behavior well: the robot moves straight along the wall, and when the wall comes close, it avoids a collision by adjusting the speeds of its two wheels.

Fig. 1. Typical track of the robot controlled by GNP-RL in the testing environment

Paper: Genetic Network Programming with Reinforcement Learning and Its Application to Making Mobile Robot Behavior