Reinforcement Learning (RL) is an area of machine learning in which an agent interacts with the environment by making sequential decisions. The agent receives reward from the environment based on how good the decisions are, and tries to find an optimal decision-making policy that maximises its long-term cumulative reward. This paper presents a novel approach that applies accelerated simulation of RL policy training to automating the control of a real robot arm for specific applications. The approach has two steps. First, design space exploration techniques are developed to enhance the performance of an FPGA accelerator for RL policy training based on Trust Region Policy Optimisation (TRPO), resulting in a 43% speed improvement over a previous FPGA implementation, a 4.65 times speed-up against deep learning libraries running on a GPU, and a 19.29 times speed-up against a CPU. Second, the trained RL policy is transferred to a real robot arm. Our experiments show that the trained arm can successfully reach and pick up predefined objects, demonstrating the feasibility of our approach.
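To make the RL setting described above concrete, the sketch below shows a generic agent-environment loop with a simple REINFORCE-style policy-gradient update. It is not the TRPO training pipeline or the FPGA accelerator from the paper; the environment interface (reset()/step()), the linear Gaussian policy, and all dimensions and hyperparameters are illustrative assumptions.

```python
import numpy as np

# Minimal REINFORCE-style sketch of the RL loop described above.
# NOT the paper's TRPO/FPGA pipeline; it only illustrates how an agent
# improves a parameterised policy from environment reward.
# `env` is assumed to expose reset() -> state and
# step(action) -> (next_state, reward, done); sizes are placeholders.

STATE_DIM, ACTION_DIM = 8, 2          # assumed dimensions
SIGMA, ALPHA, GAMMA = 0.1, 1e-3, 0.99  # assumed hyperparameters

W = np.zeros((ACTION_DIM, STATE_DIM))  # linear Gaussian policy: a ~ N(W s, SIGMA^2 I)

def run_episode(env, W, max_steps=200):
    states, actions, rewards = [], [], []
    s = env.reset()
    for _ in range(max_steps):
        a = W @ s + SIGMA * np.random.randn(ACTION_DIM)  # sample action from policy
        s_next, r, done = env.step(a)
        states.append(s); actions.append(a); rewards.append(r)
        s = s_next
        if done:
            break
    return states, actions, rewards

def reinforce_update(W, states, actions, rewards):
    # Discounted returns G_t, computed backwards over the episode.
    G, returns = 0.0, []
    for r in reversed(rewards):
        G = r + GAMMA * G
        returns.append(G)
    returns.reverse()
    # Policy-gradient step: grad_W log pi(a|s) = (a - W s) s^T / SIGMA^2.
    for s, a, G_t in zip(states, actions, returns):
        W = W + ALPHA * G_t * np.outer(a - W @ s, s) / SIGMA**2
    return W
```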
Abstract: Reinforcement Learning (RL) is an area of machine learning in which an agent interacts with the environment by making sequential decisions. The agent receives reward from the environment and tries to find an optimal policy that maximises the reward. Trust Region Policy Optimisation (TRPO) is a recent policy optimisation algorithm that achieves superior results in various RL benchmarks, but is computationally expensive. This paper proposes Customised Pearlmutter Propagation (CPP), a novel hardware architecture that accelerates TRPO on FPGA. We use the Pearlmutter Algorithm to address the key computational bottleneck of TRPO in a hardware-efficient manner, avoiding symbolic differentiation with change of variables. Experimental evaluation using robotic locomotion benchmarks demonstrates that the proposed CPP architecture implemented on a Stratix-V FPGA can achieve up to a 20 times speed-up against the 6-threaded Keras deep learning library with Theano backend running on a Core i7-5930K CPU.

I. INTRODUCTION

Reinforcement Learning (RL) is a branch of machine learning that addresses the sequential decision-making problem of how an agent should take actions to maximise the cumulative reward gathered from the environment. In each time step t, the agent observes the state of the environment s_t and takes an action a_t according to its policy π. The environment receives a_t and gives a scalar reward r_t to the agent. The environment state then changes from s_t to s_{t+1}, since it is affected by the action. The agent's task is to maximise its long-term cumulative reward by learning to behave optimally through trial and error.

Since many real-world problems are sequential decision-making problems, RL is useful in various areas. In robotics, the state s is the robot's position, velocity, etc.; the policy π is the control logic; the action a is the control signal for the motors; and reward can be given for following the desired trajectory [1]. RL has also been successfully applied to game playing and finance.

An important class of RL algorithms is policy gradient methods. Assume we have a differentiable parameterised policy π_θ, where θ denotes the policy parameters. Suppose we also have an objective function J(π_θ), such as the expected cumulative reward. Then the policy gradient is ∇_θ J(π_θ). Policy gradient methods try to maximise J(π_θ) by gradient-based optimisation, i.e. Δθ = α ∇_θ J(π_θ), where α is the step size. This process leads to an improved π_θ with higher reward.

Policy gradient methods are iterative algorithms. Each iteration is composed of gradient evaluation and parameter update. Usually at least hundreds of iterations are needed to achieve acceptable performance. To reduce the number of iterations, step size selection is critical. A trivial step size α will lead to
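Returning to the Pearlmutter technique named in the CPP abstract above: the key computational primitive is a Hessian-vector product obtained by differentiating the gradient along a direction, without ever forming the full Hessian. In TRPO, products of this kind (with the KL-divergence term as the objective) feed the conjugate-gradient solver. The sketch below shows the mathematical trick in software, using JAX forward-over-reverse differentiation on a toy quadratic objective so the result can be checked against A·v; it does not represent the hardware dataflow of the CPP architecture.

```python
import jax
import jax.numpy as jnp

def hvp(f, theta, v):
    # Pearlmutter-style Hessian-vector product:
    # d/d(eps) [ grad f(theta + eps * v) ] evaluated at eps = 0,
    # computed as a forward-mode derivative of the reverse-mode gradient.
    return jax.jvp(jax.grad(f), (theta,), (v,))[1]

# Toy objective with a known Hessian A, purely for illustration.
A = jnp.array([[3.0, 1.0],
               [1.0, 2.0]])
f = lambda x: 0.5 * x @ A @ x   # grad f(x) = A x, Hessian = A

theta = jnp.array([1.0, -1.0])
v = jnp.array([0.5, 2.0])
print(hvp(f, theta, v))          # equals A @ v, without building A explicitly
```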
Our work investigates how to map loops efficiently onto Coarse-Grained Reconfigurable Architectures (CGRAs). This paper examines the properties of CGRAs and builds MapReduce-inspired models for the loop parallelization problem. The proposed model has a more detailed performance metric and a more flexible unrolling scheme that can unroll different loop levels with different factors. A Geometric Programming based approach is proposed to solve the loop parallelization optimization problem. The proposed approach can find the optimal unrolling factor for each loop level, resulting in better parallelization of loops. Experimental results show that the proposed approach achieves up to a 44% performance gain compared to the state-of-the-art loop mapping scheme.
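As a rough illustration of what a Geometric Programming formulation of loop unrolling can look like, the toy model below chooses unrolling factors for a two-level loop nest using cvxpy's GP mode. The cost model, trip counts, and PE budget are assumptions for illustration, not the paper's actual model, and a real flow would round the continuous GP solution to valid integer factors.

```python
import cvxpy as cp

# Toy GP: pick unrolling factors u1, u2 for a two-level loop nest so that a
# relaxed cycle-count model is minimised under a processing-element budget.
N1, N2, PE = 64, 32, 16           # assumed trip counts and PE budget

u1 = cp.Variable(pos=True)         # unroll factor of the outer loop
u2 = cp.Variable(pos=True)         # unroll factor of the inner loop

cycles = (N1 * N2) / (u1 * u2)     # monomial cost: iterations per parallel copy
constraints = [u1 * u2 <= PE,      # unrolled bodies must fit on the PEs
               u1 <= N1, u2 <= N2]

prob = cp.Problem(cp.Minimize(cycles), constraints)
prob.solve(gp=True)                # solve as a geometric program
print(u1.value, u2.value, prob.value)
```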
Abstract: This paper proposes a new parallel architecture for incremental training of a Support Vector Machine (SVM), which produces an optimal solution by manipulating the Karush-Kuhn-Tucker (KKT) conditions. Compared to batch training methods, our approach avoids re-training from scratch when the training dataset changes. The proposed architecture is the first to adopt an efficient dataflow organisation for this problem. The main novelty is a parametric description of the parallel dataflow architecture, which deploys customisable arithmetic units for the dense linear algebraic operations involved in updating the KKT conditions. The proposed architecture targets on-line SVM training applications. Experimental evaluation with real-world financial data shows that our architecture implemented on a Stratix-V FPGA achieves a significant speed-up against LIBSVM on a Core i7-4770 CPU.
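For orientation, the sketch below shows the software side of the KKT bookkeeping that incremental SVM training manipulates: each sample is assigned to the margin, error, or reserve set according to its dual variable α_i and the quantity g_i = y_i f(x_i) - 1. It only illustrates the conditions being maintained, not the parallel dataflow architecture; the precomputed kernel matrix, C, and the tolerance are assumptions.

```python
import numpy as np

def decision_function(alpha, y, K, b):
    # f(x_j) = sum_i alpha_i * y_i * K(x_i, x_j) + b, with K a precomputed kernel matrix.
    return K.T @ (alpha * y) + b

def partition_by_kkt(alpha, y, K, b, C, tol=1e-6):
    # Soft-margin SVM KKT conditions on g_i = y_i * f(x_i) - 1:
    #   alpha_i = 0        ->  g_i >= 0   (reserve set, well classified)
    #   0 < alpha_i < C    ->  g_i  = 0   (margin support vectors)
    #   alpha_i = C        ->  g_i <= 0   (error vectors)
    # An incremental trainer adjusts the margin-set alphas so these
    # conditions keep holding when a sample is added or removed.
    g = y * decision_function(alpha, y, K, b) - 1.0
    margin  = np.where((alpha > tol) & (alpha < C - tol))[0]
    error   = np.where(alpha >= C - tol)[0]
    reserve = np.where(alpha <= tol)[0]
    violated = np.where(
        ((alpha <= tol) & (g < -tol)) |
        ((alpha >= C - tol) & (g > tol)) |
        ((alpha > tol) & (alpha < C - tol) & (np.abs(g) > tol))
    )[0]
    return margin, error, reserve, violated
```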