“…Free from the limitation of manually extracting features, the grasping algorithms based on deep learning have achieved insurmountable effects in all aspects by traditional approaches, taking the robot's intelligence to a higher level. Specifically, with RGB images or depth images as input, robotic grasping based on convolutional neural network (CNN) which is a dominant deep learning framework in the field of computer vision, has obtained high grasping success rates in many tasks (Lenz et al, 2015 ; Varley et al, 2015 ; Johns et al, 2016 ; Finn and Levine, 2017 ; James et al, 2017 ; Kumra and Kanan, 2017 ; Zhang et al, 2017 ; Dyrstad et al, 2018 ; Levine et al, 2018 ; Schmidt et al, 2018 ; Schwarz et al, 2018 ). As shown in Figure 1 , nowadays, based on visual information, robot dexterous grasp learning can be roughly divided into two categories based on whether the learning process is based on trial and error.…”