Proximal policy optimization (PPO) has yielded state-of-the-art results in policy search, a subfield of reinforcement learning; one of its key ideas is a clipped surrogate objective function that restricts the step size at each policy update. Although this restriction is helpful, the algorithm still suffers from performance instability and optimization inefficiency caused by the sudden flattening of the objective at the clipping boundary. To address this issue, we present a novel functional-clipping policy optimization algorithm, named the Proximal Policy Optimization Smoothed algorithm (PPOS), whose critical improvement is the use of a functional clipping method in place of flat clipping. We compare our approach with PPO and PPO-RB, which adopts a rollback clipping method, and show that our approach performs more accurate updates than other PPO variants. It outperforms the latest PPO variants in both performance and stability on challenging continuous control tasks. Moreover, we provide an instructive guideline for tuning the hyperparameter in our algorithm.

INDEX TERMS Machine learning, robot control, deep reinforcement learning, policy search algorithms.
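The "flat clipping" that PPOS improves on can be illustrated with a minimal sketch of the standard PPO clipped surrogate: once the probability ratio leaves the interval [1 − ε, 1 + ε], the objective flattens and its gradient vanishes. This is standard PPO only; the abstract does not specify the PPOS functional clip, so it is not reproduced here.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate (flat clipping).

    ratio     : pi_new(a|s) / pi_old(a|s), the probability ratio
    advantage : estimated advantage A(s, a)
    eps       : clipping range; outside [1-eps, 1+eps] the
                objective is flat, so its gradient w.r.t. ratio is zero.
    """
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.minimum(ratio * advantage, clipped * advantage)
```

For a positive advantage, any ratio above 1 + ε yields the same objective value (e.g. `ppo_clip_objective(1.5, 1.0)` equals `ppo_clip_objective(2.0, 1.0)`); this sudden flattening is the instability source the abstract refers to.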
Malfunctions in industrial robots can cost factories 22,000 dollars per minute. Although the benefits of a fault-tolerant robot arm are clear, redundant sensors would steeply add to the cost of such robots, while machine-learning-based methods would spend too much time learning the robot's model. We propose a simple but highly effective method to infer which joint has failed and at which angle it is constrained, and then to modify the inverse kinematics (IK) algorithm to adaptively achieve the goal. Our method combines the robot arm with a QR code and an inexpensive camera, building a virtual link among the three that gives the relative position of the end-effector. Once a joint, encoder, or motor suffers damage, we use this virtual link to calibrate the joint by coordinate transformation, to calculate the constrained angle, and to recalculate the trajectory through IK iterations with the Newton-Raphson method. We demonstrate the efficacy of our method with pick-and-place experiments, common in industrial settings, emulating malfunctions on different joints and at different angles; our method successfully finishes the task in most cases. We further demonstrate that our method can adapt to joint failures for almost all six-degree-of-freedom manipulators after an actuator failure. With the steep increase of robots in factories, this paper presents an elegant approach to keeping robots functional until maintenance is scheduled, reducing downtime.
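The Newton-Raphson IK iteration mentioned above can be sketched for a simple planar two-link arm. This is a generic illustration under assumed link lengths and a hand-derived Jacobian, not the paper's six-degree-of-freedom implementation; the function names (`fk`, `ik_newton`) are hypothetical.

```python
import numpy as np

def fk(q, l1=1.0, l2=1.0):
    # Forward kinematics of a planar 2-link arm: joint angles -> (x, y).
    x = l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1])
    y = l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def jacobian(q, l1=1.0, l2=1.0):
    # Analytic Jacobian d(fk)/dq of the 2-link forward kinematics.
    s1, s12 = np.sin(q[0]), np.sin(q[0] + q[1])
    c1, c12 = np.cos(q[0]), np.cos(q[0] + q[1])
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [ l1 * c1 + l2 * c12,  l2 * c12]])

def ik_newton(target, q0, iters=100, tol=1e-8):
    # Newton-Raphson IK: repeat q <- q + pinv(J(q)) @ (target - fk(q))
    # until the end-effector error is below tol.
    q = np.array(q0, dtype=float)
    for _ in range(iters):
        err = target - fk(q)
        if np.linalg.norm(err) < tol:
            break
        q = q + np.linalg.pinv(jacobian(q)) @ err
    return q
```

A constrained (failed) joint would correspond to freezing one component of `q` and iterating only over the remaining joints, which is the adaptation the abstract describes at a high level.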
Abstract—The distribution of initial stress under gravity in alpine slopes with slope angles of 30°, 45°, and 60° was studied by numerical analysis. The comparison shows that, near the slope surface, the vertical stress calculated directly from burial depth differs greatly from the actual value, and the difference grows as the slope becomes steeper. A further comparison was then made by excavating a cavern at three different distances from the slope toe, under the 45° slope angle, using the equivalent mechanical parameters of the jointed rock mass in the numerical analysis to examine the difference in rock stability among the three schemes. It is found that the closer the cavern is to the slope surface (slope toe), the larger the plastic zone or damage zone around it becomes.
Energy efficiency is critical for the locomotion of quadruped robots. However, energy efficiency values found in simulation do not transfer adequately to the real world. To address this issue, we present a novel method, named Policy Search Transfer Optimization (PSTO), which combines deep reinforcement learning and optimization to create energy-efficient locomotion for quadruped robots in the real world. The deep reinforcement learning and policy search are performed by the TD3 algorithm; the learned policy is transferred to an open-loop control trajectory, further optimized by numerical methods, and executed on the robot in the real world. To ensure close agreement between the simulation results and the behavior of the hardware platform, we introduce and validate an accurate simulation model, including consistent dimensions and fine-tuned parameters. We then validate those results with real-world experiments on the quadruped robot Ant, executing dynamic walking gaits with different leg lengths and numbers of amplifications. We analyze the results and show that our method outperforms control methods based on the state-of-the-art policy search algorithm TD3 and on sinusoid functions in both energy efficiency and speed.
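A common way to compare locomotion energy efficiency across gaits and speeds is the dimensionless cost of transport. The abstract does not state which metric PSTO uses, so the helper below is a hypothetical illustration of how such a comparison could be scored.

```python
def cost_of_transport(power_w, mass_kg, speed_mps, g=9.81):
    """Dimensionless cost of transport (CoT): average electrical or
    mechanical power normalized by weight times speed. Lower is more
    energy-efficient. This metric is an assumption for illustration;
    the PSTO abstract does not specify its exact efficiency measure.
    """
    return power_w / (mass_kg * g * speed_mps)
```

For example, a 10 kg robot drawing 100 W while walking at 1 m/s has a CoT of about 1.02; halving the power draw at the same speed halves the CoT.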