Abstract-In this paper I investigate methods of applying reinforcement learning to continuous state- and action-space problems without a policy function. I compare the performance of four methods: the first is discretisation of the action space, and the other three are optimisation techniques applied to finding the greedy action without discretisation. The optimisation methods I apply are gradient descent, Nelder-Mead and Newton's Method. The action selection methods are applied in conjunction with the SARSA algorithm, with a multilayer perceptron utilised for the approximation of the value function. The approaches are applied to two simulated continuous state- and action-space control problems: Cart-Pole and double Cart-Pole. The results are compared both in terms of action selection time and in terms of the number of trials required to train on the benchmark problems.
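As a rough illustration of the difference between the discretisation and optimisation approaches to greedy action selection, the sketch below compares a grid search over a one-dimensional action space with finite-difference gradient ascent on a toy value function. The small numpy MLP, its random weights, and all names (`q_value`, `greedy_action_discrete`, `greedy_action_gradient`) are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (not the paper's code): discretisation versus gradient
# ascent for greedy action selection over a continuous action space, with
# a small numpy MLP standing in for the SARSA value function.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-hidden-layer MLP approximating Q(s, a); the weights are
# random here purely for illustration.
W1 = rng.normal(size=(3, 8))   # input: 2 state dims + 1 action dim
b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1))

def q_value(state, action):
    x = np.concatenate([state, [action]])
    h = np.tanh(x @ W1 + b1)
    return float(h @ W2)

def greedy_action_discrete(state, a_min=-1.0, a_max=1.0, n=21):
    """Discretise the action space and take the arg-max over candidates."""
    candidates = np.linspace(a_min, a_max, n)
    return max(candidates, key=lambda a: q_value(state, a))

def greedy_action_gradient(state, a0=0.0, lr=0.1, steps=50, eps=1e-4):
    """Gradient ascent on Q w.r.t. the action (finite-difference gradient)."""
    a = a0
    for _ in range(steps):
        grad = (q_value(state, a + eps) - q_value(state, a - eps)) / (2 * eps)
        a = float(np.clip(a + lr * grad, -1.0, 1.0))
    return a

state = np.array([0.1, -0.2])
print(greedy_action_discrete(state), greedy_action_gradient(state))
```

Discretisation trades off resolution against the number of value-function evaluations per action selection, while the optimisation approaches search the continuous space directly; the paper's comparison of action selection times reflects exactly this trade-off.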
This work describes how genetic programming is applied to evolving controllers for the minimum-time swing-up and inverted balance tasks of the continuous state and action, limited-torque acrobot. The best swing-up controller is able to swing the acrobot up to a position very close to the inverted 'handstand' position in a very short time, shorter than that of Coulom (2004), who applied the same constraints on the applied torque values, and only slightly longer than the approach of Lai et al. (2009), where far larger torque values were allowed. The best balance controller is able to balance the acrobot in the inverted position, when starting from the balance position, for the length of time used in the fitness function in all runs; furthermore, 47 out of 50 of the runs evolve controllers able to maintain the balance position for an extended period, an improvement on the balance controllers generated by Dracopoulos and Nichols (2012), from which this paper is extended. The most successful balance controller is also able to balance the acrobot for this extended period when starting from a small offset from the balance position.
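The following is a minimal sketch of the kind of minimum-time fitness evaluation a GP run could score candidate controllers against. The dynamics used here are a crude single-link pendulum stand-in, not the acrobot equations of motion, and all names, constants, and the example controller are assumptions for illustration rather than the paper's setup.

```python
# Illustrative sketch only: a minimum-time swing-up fitness function of the
# general shape GP-evolved controllers could be evaluated against. The
# dynamics below are a toy torque-limited single pendulum, NOT the acrobot.
import math

def pendulum_step(state, torque, dt=0.02):
    """Euler step for a toy unit-mass, unit-length pendulum (stand-in)."""
    theta, omega = state
    domega = -9.81 * math.sin(theta) + torque
    return (theta + dt * omega, omega + dt * domega)

def swing_up_fitness(controller, horizon=10.0, dt=0.02, torque_limit=1.0):
    """Lower is better: time for the link to come close to the inverted pose.

    controller(state) -> torque; the torque is clipped to the limit, echoing
    the limited-torque constraint of the acrobot task.
    """
    state, t = (0.0, 0.0), 0.0
    while t < horizon:
        torque = max(-torque_limit, min(torque_limit, controller(state)))
        state = pendulum_step(state, torque, dt)
        if math.cos(state[0]) < -0.995:   # within ~0.1 rad of inverted
            return t                      # reward fast swing-up
        t += dt
    return horizon                        # goal missed within the horizon

# Example candidate: an energy-pumping heuristic a GP run might discover.
pump = lambda s: math.copysign(1.0, s[1]) if abs(s[1]) > 1e-9 else 1.0
print(swing_up_fitness(pump))
```

A fitness of this form rewards controllers that reach the near-inverted pose quickly, which is the sense in which the evolved swing-up controllers above can be compared on time against Coulom (2004) and Lai et al. (2009).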
Abstract-Here the Newton's Method direct action selection approach to continuous action-space reinforcement learning is extended to use an eligibility trace. This extension is compared to the momentum term approach from the literature, both in terms of the update equations and in terms of the success rate and number of trials required to train on two variants of the simulated Cart-Pole benchmark problem. The eligibility trace approach achieves a higher success rate across a far wider range of parameter values than the momentum approach, and also trains in fewer trials on the Cart-Pole problem.
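To make the two ingredients concrete, the sketch below shows a finite-difference Newton iteration toward the greedy action in a one-dimensional action space, together with a standard SARSA(λ) accumulating eligibility trace update for a linear value function. The function names, step sizes, and the use of linear features are all assumptions for illustration; the paper's actual update equations and MLP details are not reproduced here.

```python
# Minimal sketch (assumptions throughout, not the paper's implementation):
# a finite-difference Newton step toward the greedy action, plus a standard
# SARSA(lambda) accumulating eligibility trace over linear weights.
import numpy as np

def newton_action(q, state, a0, eps=1e-4, steps=10, a_min=-1.0, a_max=1.0):
    """Iterate a <- a - Q'(s,a)/Q''(s,a) toward a stationary point of Q in a.

    A negative second derivative indicates the stationary point is a local
    maximum, i.e. a candidate greedy action.
    """
    a = a0
    for _ in range(steps):
        qp, qm, q0 = q(state, a + eps), q(state, a - eps), q(state, a)
        g = (qp - qm) / (2 * eps)               # first derivative in a
        h = (qp - 2 * q0 + qm) / (eps * eps)    # second derivative in a
        if abs(h) < 1e-8:
            break
        a = float(np.clip(a - g / h, a_min, a_max))
    return a

def sarsa_lambda_update(w, z, features, td_error,
                        alpha=0.1, gamma=0.99, lam=0.9):
    """Accumulating trace: z <- gamma*lambda*z + grad_w Q; w <- w + alpha*delta*z.

    For a linear Q the gradient w.r.t. the weights is just the feature vector.
    """
    z = gamma * lam * z + features
    w = w + alpha * td_error * z
    return w, z

# Example: one Newton refinement on a toy quadratic Q with maximum at a=0.3;
# Newton's Method converges in a single step on a quadratic.
q = lambda s, a: -(a - 0.3) ** 2
print(newton_action(q, None, a0=-0.5))
```

In this formulation the trace decays past gradient information into the weight update, whereas a momentum term folds past updates into the current one; the abstract's comparison of the two update equations turns on this distinction.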