The aim of the Cyber Rodent project is to understand the origins of our reward and affective systems by building artificial agents that share the same intrinsic constraints as natural agents: self-preservation and self-reproduction. A Cyber Rodent is a robot that can search for and recharge from battery packs on the floor and copy its programs to a nearby agent through its infrared communication port. This article reviews our research topics so far, including (1) evolution of neural controllers, (2) learning of foraging and mating behaviors, (3) evolution of learning architectures and meta-parameters, (4) simultaneous learning of multiple agents in a body, and (5) learning and evolution in a self-sustained colony. We discuss our future directions and expected contributions.

Keywords: reinforcement learning · reward function · self-preservation and self-reproduction · learning and evolution
Beyond Reinforcement Learning

Our daily behaviors are guided by rewards in multiple ways, such as appetitive, aversive, sexual, and social rewards. What is the origin of such multiple reward systems? This article gives an overview of the Cyber Rodent research project (www.irp.oist.jp/nc/crp), in which we aim to explore the design principles of reward systems for artificial agents to realize self-preservation and self-reproduction, and thereby to better understand the origins of the reward systems of biological agents.

In the standard framework of reinforcement learning (RL) (Sutton & Barto, 1998), the goal of an agent is to learn a policy (sensory-motor mapping) that maximizes the expected weighted sum of future rewards:

$E\left[\sum_{t=0}^{\infty} \gamma^{t} r(t)\right]$,  (1)

where r(t) is the reward at time t and 0 ≤ γ ≤ 1 is a discount factor for future rewards.

The RL framework was conceived as a model of animal behavioral learning, and it has been extremely helpful in understanding the functions of the basal ganglia and of neuromodulators such as dopamine (Schultz, Dayan, & Montague, 1997; Doya, 2000, 2002). However, mere maximization of reward is not always the primary concern of a biological agent. For example, obtaining more food than one can digest makes little sense. A person often prefers an option with a lower expected value but lower variance, assuring a better worst-case outcome. Although RL models have been able to capture animal behaviors under certain experimental conditions, the framework requires substantial extension to serve as a general model of animal behavior and learning.

RL has also been applied to artificial agents with the aim of automatically deriving appropriate solutions to a variety of control and optimization problems. In applications of RL to real-world problems, the design of the reward function and the setting of meta-parameters, such as the discount factor, are crucial for successful achievement of the task (Doya, 2002). The reward function often has multiple components, for example,

$r(s, a) = r_{\mathrm{main}} + r_{\mathrm{sub}}(s) - r_{\mathrm{cost}}(a)$,  (2)

where r_main is the reward for achieving the main goal, r_sub(s) is a supplementary reward promoting approach to the main goal, and r_cost(a) is a cost term penalizing unwanted actions.
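As a minimal illustration of Equations (1) and (2), the following Python sketch computes a discounted return from a reward sequence and evaluates a composite reward with main, supplementary, and cost components. The function names and the specific weights w_sub and w_cost are illustrative assumptions for this sketch, not part of the Cyber Rodent implementation.

```python
def discounted_return(rewards, gamma=0.9):
    """Discounted sum of future rewards, as in Equation (1):
    sum over t of gamma**t * r(t), with 0 <= gamma <= 1."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def composite_reward(r_main, r_sub, r_cost, w_sub=0.1, w_cost=0.05):
    """Composite reward in the spirit of Equation (2).

    r_main: reward for achieving the main goal
    r_sub:  supplementary shaping reward based on the state s
    r_cost: cost penalizing unwanted actions a
    w_sub, w_cost: weighting meta-parameters (illustrative assumptions)
    """
    return r_main + w_sub * r_sub - w_cost * r_cost


# Example: a four-step episode in which the main goal is reached on the last step.
episode = [composite_reward(0.0, progress, 1.0) for progress in (0.2, 0.5, 0.8)]
episode.append(composite_reward(1.0, 1.0, 1.0))
print(discounted_return(episode, gamma=0.9))
```

The example also hints at why the discount factor and the reward weights act as meta-parameters: changing gamma, w_sub, or w_cost changes which behaviors maximize the return, even though the task itself is unchanged.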