Automatic robotic assembly of weak-stiffness parts is difficult because the parts can deform during assembly, and conventional robot manipulation cannot adapt to the changing contact conditions. This paper proposes a robot assembly strategy learning method based on variable impedance control for assembly tasks that involve contact. The assembly skill learning system combines compliance control with deep reinforcement learning to acquire a better assembly strategy. During assembly, quality is evaluated with a measure based on fuzzy logic, and the impedance parameters are learned with a deep deterministic policy gradient (DDPG). The method is verified on a KUKA iiwa robot in a weak-stiffness peg-in-hole assembly task. Experimental results show that the robot acquires an assembly strategy with variable compliance, and that the assembly success rate of the proposed method reaches 100%, an improvement over previous methods.
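To make the idea of variable impedance concrete, the following minimal one-dimensional sketch maps a policy action to bounded stiffness and damping and then applies a spring-damper impedance law. It is an illustration under stated assumptions, not the paper's implementation: the names `ImpedanceGains`, `gains_from_action` and `impedance_force`, the action range of [-1, 1], and the gain ranges are all hypothetical, and the action is a stand-in for the output of a learned actor such as DDPG.

```python
from dataclasses import dataclass


@dataclass
class ImpedanceGains:
    stiffness: float  # N/m
    damping: float    # N*s/m


def gains_from_action(action, k_range=(100.0, 1000.0), d_range=(10.0, 100.0)):
    """Map a policy action in [-1, 1]^2 to bounded stiffness and damping.

    The ranges are illustrative assumptions, not values from the paper.
    """
    def scale(a, lo, hi):
        a = max(-1.0, min(1.0, a))          # clamp to the valid action range
        return lo + 0.5 * (a + 1.0) * (hi - lo)

    return ImpedanceGains(
        stiffness=scale(action[0], *k_range),
        damping=scale(action[1], *d_range),
    )


def impedance_force(gains, x_des, x, v_des, v):
    """Spring-damper impedance law: F = K (x_des - x) + D (v_des - v)."""
    return gains.stiffness * (x_des - x) + gains.damping * (v_des - v)


if __name__ == "__main__":
    # A softer action lowers stiffness, which limits contact force on a
    # deformable part for the same position error.
    soft = gains_from_action((-1.0, -1.0))
    stiff = gains_from_action((1.0, 1.0))
    for g in (soft, stiff):
        print(g, impedance_force(g, x_des=0.01, x=0.0, v_des=0.0, v=0.0))
```

Clamping the action before scaling keeps the commanded gains inside a safe range even if the policy explores outside its nominal bounds, which matters when the part being assembled can deform.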
Background: Peg-in-hole assembly is an important part of robot operation, but it suffers from a low degree of automation, a large number of tasks and low efficiency. Automatic assembly remains a major challenge for robots because traditional assembly control policies require a complex analysis of the contact model, which is difficult to build. Deep reinforcement learning does not require a complex contact model, but its long training time and low data utilization efficiency make training very costly. Methods: To obtain the assembly policy accurately and to improve the robot's data utilization rate in peg-in-hole assembly, we propose the Experience Fusion Proximal Policy Optimization algorithm (EFPPO), which is based on the Proximal Policy Optimization algorithm (PPO). The algorithm improves assembly speed by incorporating a force control policy, and improves the utilization efficiency of training data by adding a memory buffer. Results: We build a single-axis hole assembly system based on the UR5e robotic arm and a six-dimensional force sensor in the CoppeliaSim simulation environment to effectively realize the prediction of the assembly environment. Compared with the traditional Deep Deterministic Policy Gradient algorithm (DDPG) and the PPO algorithm, the peg-in-hole assembly success rate reaches 100%, and the data utilization rate is 125% higher than that of the PPO algorithm. Conclusions: The EFPPO algorithm has a high exploration efficiency. While improving the assembly speed and training speed, it achieves smooth assembly and accurate prediction of the assembly environment.
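The abstract does not say how EFPPO fuses stored experience with PPO updates, so the sketch below only illustrates the two generic ingredients it names: the standard PPO clipped surrogate objective and a bounded memory buffer from which past transitions can be resampled. The names `ppo_clip_objective` and `MemoryBuffer`, the clip value of 0.2, and the uniform sampling scheme are assumptions made for illustration, not details from the paper.

```python
import random
from collections import deque


def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratios are new-policy / old-policy action probabilities per sample.
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, r))
        total += min(r * a, clipped * a)
    return total / len(ratios)


class MemoryBuffer:
    """Fixed-capacity store of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity):
        self._data = deque(maxlen=capacity)

    def add(self, transition):
        self._data.append(transition)

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement (an assumption; the paper's
        # fusion rule is not given in the abstract).
        items = list(self._data)
        return rng.sample(items, min(batch_size, len(items)))

    def __len__(self):
        return len(self._data)


if __name__ == "__main__":
    buffer = MemoryBuffer(capacity=3)
    for step in range(5):
        buffer.add({"step": step})
    print(len(buffer), buffer.sample(2))
    print(ppo_clip_objective([1.5, 0.5], [1.0, -1.0]))
```

Reusing stored transitions lets each interaction with the simulator contribute to more than one update, which is the general mechanism by which a buffer can raise data utilization. Because PPO is on-policy, any such reuse needs a correction such as the probability ratio above.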