Heuristic Reinforcement Learning Applied to RoboCup Simulation Agents

Celiberto, Luiz A.; Ribeiro, Carlos H. C.; Costa, Anna Helena Reali; Bianchi, Reinaldo A. C.

doi:10.1007/978-3-540-68847-1_19

Cited by 18 publications

(12 citation statements)

References 3 publications

“…Heuristic model: it is a deterministic approach for representing the behaviour of a simulated user. Among the most common methods for representing information deterministically are hierarchical patterns [ 46 ] and rule sets [ 47 ]. Heuristic models are simple to create and maintain, and require little effort to modify.…”

Section: Simulated Users (mentioning)

confidence: 99%

An Evaluation Methodology for Interactive Reinforcement Learning with Simulated Users

et al. 2021

Interactive reinforcement learning methods utilise an external information source to evaluate decisions and accelerate learning. Previous work has shown that human advice could significantly improve learning agents’ performance. When evaluating reinforcement learning algorithms, it is common to repeat experiments as parameters are altered or to gain a sufficient sample size. In this regard, to require human interaction every time an experiment is restarted is undesirable, particularly when the expense in doing so can be considerable. Additionally, reusing the same people for the experiment introduces bias, as they will learn the behaviour of the agent and the dynamics of the environment. This paper presents a methodology for evaluating interactive reinforcement learning agents by employing simulated users. Simulated users allow human knowledge, bias, and interaction to be simulated. The use of simulated users allows the development and testing of reinforcement learning agents, and can provide indicative results of agent performance under defined human constraints. While simulated users are no replacement for actual humans, they do offer an affordable and fast alternative for evaluative assisted agents. We introduce a method for performing a preliminary evaluation utilising simulated users to show how performance changes depending on the type of user assisting the agent. Moreover, we describe how human interaction may be simulated, and present an experiment illustrating the applicability of simulating users in evaluating agent performance when assisted by different types of trainers. Experimental results show that the use of this methodology allows for greater insight into the performance of interactive reinforcement learning agents when advised by different users. The use of simulated users with varying characteristics allows for evaluation of the impact of those characteristics on the behaviour of the learning agent.

“…New exploration strategy helps to find out the optimal policy from transition history faster than if we would do it randomly. There were studies [12] related to the introducing heuristic function for multiagent reinforcement learning [13], however, it was able to perform only in deterministic action space. Combining heuristic function and actor-critic algorithm should lead to increasing the speed of algorithm convergence, in case when the optimal policy should be established from the set of previous interactions.…”

Section: Motivation (mentioning)

confidence: 99%

Actor-Critic Algorithm with Transition Cost Estimation

Sergey, Lee 2016. IJFIS

We present an approach for accelerating the actor-critic algorithm for reinforcement learning with continuous action spaces. The actor-critic algorithm has already proved robust to infinitely large action spaces in various high-dimensional environments. Despite that success, its main problem remains the same: the speed of convergence to the optimal policy. In high-dimensional state and action spaces, searching for the correct action in each state takes an enormously long time. Therefore, in this paper we suggest a search-accelerating function that speeds up convergence and reaches the optimal policy faster. In our method, we assume that actions may have their own distribution of preference that is independent of the state. Since the agent acts randomly in the environment at the beginning of learning, it would be more efficient if actions were taken according to some heuristic function. We demonstrate that the heuristically accelerated actor-critic algorithm learns the optimal policy faster, using an Educational Process Mining dataset with records of students' course learning processes and their grades.

“…Bianchi, Ribeiro and Costa (9) investigated the use of a multiagent HARL algorithm in a simplified simulator for the robot soccer domain; Celiberto, Ribeiro, Costa and Bianchi (12) studied the use of the HARL algorithms to speed up learning in the RoboCup 2D Simulation domain. Finally, Martins and Bianchi (13) studied the use of several HARL algorithms in a simulated Robot soccer environment that reproduces the conditions of a real physical robot, the FIRA Simurosot competition league.…”

Section: Heuristic Accelerated Reinforcement Learning and The Haql Al (mentioning)

confidence: 99%

Improving Reinforcement Learning by Using Case Based Heuristics

Bianchi, Ros, Mántaras 2009. Case-Based Reasoning Research and Development

Self Cite

Abstract. This work presents a new approach that allows the use of cases in a case base as heuristics to speed up Reinforcement Learning algorithms, combining Case Based Reasoning (CBR) and Reinforcement Learning (RL) techniques. This approach, called Case Based Heuristically Accelerated Reinforcement Learning (CB-HARL), builds upon an emerging technique, Heuristically Accelerated Reinforcement Learning (HARL), in which RL methods are accelerated by making use of heuristic information. CB-HARL is a subset of RL that makes use of a heuristic function derived from a case base, in a Case Based Reasoning manner. An algorithm that incorporates CBR techniques into Heuristically Accelerated Q-Learning is also proposed. Empirical evaluations were conducted in a simulator for the RoboCup Four-Legged Soccer Competition, and the results obtained show that, using CB-HARL, the agents learn faster than when using either RL or HARL methods.
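
The idea of accelerating Q-learning with heuristic information can be illustrated with a minimal sketch. It is not taken from any of the papers listed here. It assumes one common formulation in which a heuristic value H(s, a), weighted by a factor xi, is added to Q(s, a) only when choosing an action, while the Q-learning update itself is left unchanged. The names select_action, q_update, xi, and the toy states and actions are hypothetical.

```python
import random
from collections import defaultdict


def select_action(q, h, state, actions, xi=1.0, epsilon=0.1, rng=random):
    """Epsilon-greedy choice over Q(s, a) + xi * H(s, a) (assumed form)."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.choice(actions)
    # Exploit: the heuristic biases the greedy choice toward preferred actions.
    return max(actions, key=lambda a: q[(state, a)] + xi * h[(state, a)])


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Standard Q-learning update; the heuristic does not enter the update."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


# Toy usage with hypothetical states and actions.
q = defaultdict(float)          # Q-values start at zero
h = defaultdict(float)          # heuristic values start at zero
actions = ["left", "right"]
h[("s0", "right")] = 1.0        # the heuristic suggests "right" in state s0

first = select_action(q, h, "s0", actions, epsilon=0.0)
q_update(q, "s0", first, reward=1.0, next_state="s1", actions=actions)
```

With all Q-values still at zero, the heuristic alone decides the greedy action, which is how such a bias can shorten the initial random search. In a case-based variant, the values in h would be derived from retrieved cases rather than set by hand.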
