Abstract-This paper describes a novel multi-objective reinforcement learning algorithm. The proposed algorithm first learns a model of the multi-objective sequential decision-making problem, after which this learned model is used by a multi-objective dynamic programming method to compute Pareto optimal policies. The advantage of this model-based multi-objective reinforcement learning method is that once an accurate model has been estimated from the experiences of an agent in some environment, the dynamic programming method will compute all Pareto optimal policies. It is therefore important that the agent explores the environment intelligently by using a good exploration strategy. In this paper we supply the agent with two different exploration strategies and compare their effectiveness in estimating accurate models within a reasonable amount of time. The experimental results show that our method with the best exploration strategy is able to quickly learn all Pareto optimal policies for the Deep Sea Treasure problem.

I. INTRODUCTION

Reinforcement learning (RL) [1], [2] enables an autonomous agent to learn from its interactions with a particular environment that emits reward signals to the agent. The objective of the agent is to learn a policy that obtains the highest possible discounted cumulative reward. In this paper we consider value-based reinforcement learning, where the agent estimates a value function denoting the future reward intake and uses this value function to select actions. Many value-based reinforcement learning algorithms have been proposed [2]. These algorithms can be divided into model-free and model-based reinforcement learning algorithms. Model-free methods such as Q-learning [3] update the Q-value function after each interaction with the environment without estimating a model. Model-based RL methods first learn to estimate a model of the environment and then use a dynamic programming algorithm to compute the policy. The advantage of model-based RL methods is that the experiences of the agent are used more effectively, leading to faster convergence to optimal policies.

Although reinforcement learning algorithms have traditionally been applied solely to single-objective decision problems, the amount of research on multi-objective problems has increased considerably during the last decade [4], [5]. In multi-objective reinforcement learning (MORL), the reward function emits a reward vector instead of a single scalar reward, and the goal is to learn all Pareto optimal policies.
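To make the two central ingredients concrete, we include two small illustrative sketches. The first shows model estimation from experience: a standard maximum-likelihood scheme that tallies observed transitions and reward vectors; the paper's exact estimator may differ, and the class and method names here are our own.

```python
from collections import defaultdict
import numpy as np

class MLModel:
    """Maximum-likelihood model of a multi-objective MDP, estimated from
    the agent's observed transitions (an illustrative sketch, not
    necessarily the estimator used in the paper)."""

    def __init__(self, n_objectives):
        # (s, a) -> {s': visit count}
        self.counts = defaultdict(lambda: defaultdict(int))
        # (s, a) -> running sum of observed reward vectors
        self.reward_sums = defaultdict(lambda: np.zeros(n_objectives))

    def observe(self, s, a, r_vec, s_next):
        """Record one experience tuple (s, a, r_vec, s')."""
        self.counts[(s, a)][s_next] += 1
        self.reward_sums[(s, a)] += np.asarray(r_vec, dtype=float)

    def transition_prob(self, s, a, s_next):
        """Empirical estimate of P(s' | s, a)."""
        total = sum(self.counts[(s, a)].values())
        return self.counts[(s, a)][s_next] / total if total else 0.0

    def mean_reward(self, s, a):
        """Empirical mean reward vector for (s, a)."""
        total = sum(self.counts[(s, a)].values())
        return self.reward_sums[(s, a)] / total if total else self.reward_sums[(s, a)]
```

The second sketch illustrates the Pareto optimality criterion that the dynamic programming stage relies on: a value vector is Pareto optimal if no other vector is at least as good in every objective and strictly better in at least one. The function names are again our own, and the example vectors are hypothetical (treasure value, negated time) pairs in the spirit of the Deep Sea Treasure problem.

```python
import numpy as np

def dominates(u, v):
    """Return True if value vector u Pareto-dominates v: u is at least as
    good in every objective and strictly better in at least one."""
    u, v = np.asarray(u), np.asarray(v)
    return bool(np.all(u >= v) and np.any(u > v))

def pareto_front(vectors):
    """Keep only the non-dominated (Pareto optimal) value vectors."""
    return [u for u in vectors
            if not any(dominates(v, u) for v in vectors if v is not u)]

# Hypothetical (treasure, -time) value vectors:
candidates = [(1.0, -1.0), (2.0, -3.0), (1.5, -3.0), (3.0, -5.0)]
print(pareto_front(candidates))  # (1.5, -3.0) is dominated by (2.0, -3.0)
```

In a model-based MORL method of the kind described above, estimates such as transition_prob and mean_reward would feed a multi-objective dynamic programming procedure, and a dominance test like dominates would prune dominated value vectors at each backup.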