2018
DOI: 10.19153/cleiej.21.2.1
Towards Autonomous Reinforcement Learning: Automatic Setting of Hyper-parameters using Bayesian Optimization

Abstract: With the increase of machine learning usage by industries and scientific communities in a variety of tasks such as text mining, image recognition and self-driving cars, automatic setting of hyper-parameters in learning algorithms is a key factor for obtaining good performance regardless of user expertise in the inner workings of the techniques and methodologies. In particular, for a reinforcement learning algorithm, the efficiency of an agent learning a control policy in an uncertain environment is heavily dep…

Cited by 11 publications (12 citation statements)
References 25 publications
“…Other studies, on the other hand, focused on resorting to Bayesian optimization to optimize several RL hyper-parameters at the same time. In this setting, a preceding work is Barsce et al (2017) 44, where a Bayesian optimization framework was proposed to optimize RL hyper-parameters. However, in that work, Bayesian optimization and the RL algorithm are decoupled in such a way that the meta-learning makes no specific assumptions about the RL algorithm; this also makes the method inefficient, because the learned tuples of experience (s, a, r, s′) are all aggregated into the metric selected for the objective function and therefore cannot be used directly to improve the meta-learning layer.…”
Section: Related Work in Hyper-parameter Optimization
confidence: 99%
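The decoupling this passage criticizes (the optimizer treats each RL training run as a black box returning only an aggregate score, so the individual experience tuples never reach the meta-learning layer) can be sketched as follows. This is a hypothetical, dependency-free illustration: the toy chain environment is invented here, and plain random search stands in for a real Gaussian-process Bayesian optimizer.

```python
import random

def run_rl_trial(alpha, gamma, epsilon, episodes=200, seed=0):
    """Black-box objective: train a tabular Q-learning agent on a toy
    5-state chain and return only an aggregate score. The individual
    (s, a, r, s') tuples never leave this function, which is exactly
    the decoupling the quoted passage points out."""
    rng = random.Random(seed)
    n_states = 5
    q = [[0.0, 0.0] for _ in range(n_states)]  # actions: 0 = left, 1 = right
    total = 0.0
    for _ in range(episodes):
        s = 0
        for _ in range(20):  # step limit per episode
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda x: q[s][x])
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n_states - 1 else 0.0
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            total += r
            s = s2
            if r > 0:
                break  # goal reached, end episode
    return total / episodes

def tune(n_queries=20, seed=1):
    """Stand-in for the meta-learning layer: plain random search over
    (alpha, gamma, epsilon) instead of a real Bayesian optimizer, to
    keep the sketch dependency-free. Note it only ever sees the
    aggregate score, never the experience tuples."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_queries):
        hp = (rng.uniform(0.01, 1.0), rng.uniform(0.5, 0.999),
              rng.uniform(0.01, 0.5))
        score = run_rl_trial(*hp)
        if best is None or score > best[1]:
            best = (hp, score)
    return best
```

Replacing `tune` with a surrogate-driven optimizer would not change the interface: the objective still compresses every trajectory into one number, which is the inefficiency the citing authors highlight.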
“…The reasoning behind integrating Bayesian optimization with an RL algorithm is that using a black-box approach is well suited for an expensive optimization task such as RL, by making the most of past queries in order to maximize the gain of selecting the next query of hyper-parameter setting through the acquisition function. As it employs Bayesian optimization for optimizing RL algorithms, this architecture has its foundations in RLOpt 44 .…”
Section: Bayesian Optimization of an RL Agent
confidence: 99%
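The acquisition function mentioned above is what "maximizes the gain of selecting the next query"; a common concrete choice is expected improvement. The following is a minimal sketch of that formula only, assuming a Gaussian-process surrogate (not shown) supplies the posterior mean `mu` and standard deviation `sigma` at a candidate hyper-parameter setting.

```python
import math

def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected improvement for maximization: the expected gain over
    the best observed value `best` at a point where the surrogate
    predicts mean `mu` and standard deviation `sigma`; `xi` is a
    small exploration margin."""
    if sigma <= 0.0:
        return max(mu - best - xi, 0.0)
    z = (mu - best - xi) / sigma
    # standard normal CDF and PDF at z
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - best - xi) * cdf + sigma * pdf
```

The next hyper-parameter query is the candidate maximizing this value, which trades off exploiting high predicted means against exploring high-uncertainty regions.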
“…In RL, an agent learns from rewards and penalties while interacting with an environment [68]. One of the main topics of investigation in RL is the estimation of learning parameters, such as the learning rate (α), the discount factor (γ), the ε-greedy exploration rate and the reinforcement function [6,17,23,24,40,54,63]. In fact, parameter definition can directly influence good route learning [5,12,52,54].…”
Section: Introduction
confidence: 99%
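The parameters listed in this passage each have a fixed place in the standard tabular Q-learning update; the following generic sketch (not code from any cited work) shows where α, γ and ε enter.

```python
import random

def epsilon_greedy(q_row, epsilon, rng):
    """With probability epsilon pick a random action (explore);
    otherwise pick the greedy action (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)

def q_update(q, s, a, r, s_next, alpha, gamma):
    """One Q-learning step: alpha (learning rate) scales how far the
    estimate moves toward the TD target; gamma (discount factor)
    weights future value; r is the reward from the reinforcement
    function."""
    td_target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (td_target - q[s][a])
    return q[s][a]
```

Poorly chosen values propagate through every update, which is why the estimation of these parameters is the research topic the citing authors emphasize.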
“…Automated hyperparameter selection is a problem very similar to meta-learning, since it often uses a higher-level learning procedure to "train" the hyperparameters of the lower-level algorithm. These automated methods use a variety of intelligent approaches, such as evolutionary computation (Schweighofer and Doya 2003; Young et al 2015) and Bayesian optimisation methods (Barsce et al 2017; Bergstra et al 2011).…”
Section: Introduction
confidence: 99%