The aim of the Cyber Rodent project is to understand the origins of our reward and affective systems by building artificial agents that share the same intrinsic constraints as natural agents: self-preservation and self-reproduction. A Cyber Rodent is a robot that can search for and recharge from battery packs on the floor and copy its programs to a nearby agent through its infrared communication port. This article reviews our research topics so far, including (1) evolution of neural controllers, (2) learning of foraging and mating behaviors, (3) evolution of learning architectures and meta-parameters, (4) simultaneous learning of multiple agents in a body, and (5) learning and evolution in a self-sustained colony. We discuss our future directions and expected contributions.

Keywords: reinforcement learning · reward function · self-preservation and self-reproduction · learning and evolution
Beyond Reinforcement Learning

Our daily behaviors are guided by rewards in multiple ways, such as appetitive, aversive, sexual, and social rewards. What is the origin of such multiple reward systems? This article gives an overview of the Cyber Rodent research project (www.irp.oist.jp/nc/crp), in which we aim to explore the design principles of reward systems for artificial agents to realize self-preservation and self-reproduction, and thereby to better understand the origins of the reward systems of biological agents.

In the standard framework of reinforcement learning (RL) (Sutton & Barto, 1998), the goal of an agent is to learn a policy (sensory-motor mapping) that maximizes the expected weighted sum of future rewards:

$E\left[\sum_{t=0}^{\infty} \gamma^{t} r(t)\right]$,  (1)

where r(t) is the reward at time t and 0 ≤ γ ≤ 1 is a discount factor for future rewards.

The RL framework was conceived as a model of animal behavioral learning, and it has been extremely helpful in understanding the functions of the basal ganglia and of neuromodulators such as dopamine (Schultz, Dayan, & Montague, 1997; Doya, 2000, 2002). However, mere maximization of reward is not always the primary concern of a biological agent. For example, obtaining more food than one can digest makes little sense. A person often prefers an option with a lower expected value but lower variance, assuring a better worst-case outcome. Although RL models have been able to capture animal behaviors under certain experimental conditions, the framework requires substantial extension to serve as a general model of animal behavior and learning.

RL has also been applied to artificial agents with the aim of automatically deriving appropriate solutions to a variety of control and optimization problems. In applications of RL to real-world problems, the design of the reward function and the setting of meta-parameters, such as the discount factor, are crucial for successful achievement of the task (Doya, 2002). The reward function often has multiple components, for example,

$r(s, a) = r_{\mathrm{main}} + r_{\mathrm{sub}}(s) - r_{\mathrm{cost}}(a)$,  (2)

where r_main is the reward for achieving the main goal, r_sub(s) is a supplementary reward promoting approach to the main goal, and r_cost(a) is a cost term penalizing unwanted actions.
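As a minimal illustration of Equations (1) and (2), the following Python sketch computes a discounted return from a reward sequence and evaluates a composite reward with main, supplementary, and cost components. The function names and the specific weights w_sub and w_cost are illustrative assumptions for this sketch, not part of the Cyber Rodent implementation.

```python
def discounted_return(rewards, gamma=0.9):
    """Discounted sum of future rewards, as in Equation (1):
    sum over t of gamma**t * r(t), with 0 <= gamma <= 1."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def composite_reward(r_main, r_sub, r_cost, w_sub=0.1, w_cost=0.05):
    """Composite reward in the spirit of Equation (2).

    r_main: reward for achieving the main goal
    r_sub:  supplementary shaping reward based on the state s
    r_cost: cost penalizing unwanted actions a
    w_sub, w_cost: weighting meta-parameters (illustrative assumptions)
    """
    return r_main + w_sub * r_sub - w_cost * r_cost


# Example: a four-step episode in which the main goal is reached on the last step.
episode = [composite_reward(0.0, progress, 1.0) for progress in (0.2, 0.5, 0.8)]
episode.append(composite_reward(1.0, 1.0, 1.0))
print(discounted_return(episode, gamma=0.9))
```

The example also hints at why the discount factor and the reward weights act as meta-parameters: changing gamma, w_sub, or w_cost changes which behaviors maximize the return, even though the task itself is unchanged.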