The rapid development of robotics has made people's lives and work more convenient and efficient, and the study and simulation of robots driven by reinforcement learning algorithms has become a research hotspot across robot applications. In view of this, this study proposes a seal cutting robot based on deep reinforcement learning, which builds on convolutional neural networks and combines a point cloud model with the proximal policy optimization (PPO) and soft actor-critic (SAC) algorithms. With root mean square error as the performance metric, the error of the proposed model decreases about 1% faster than that of SAC, about 2% faster than that of PPO, and about 4% faster than that of the deep deterministic policy gradient (DDPG) algorithm, which indicates an advantage in post-cutting accuracy. The fluctuation of the proposed model is about 10% smaller than that of SAC and about 60% smaller than that of DDPG, so it has the highest overall stability and does not fall into local optima. In terms of overlap, PPO falls into a local optimum and reaches a low overlap of only about 17%. DDPG fluctuates strongly during seal cutting and its overall curve changes slowly, with a final overlap of about 70%. SAC reaches a somewhat higher overlap of about 83%. The proposed model performs best, with its overlap stabilizing at a maximum of about 90%. The experiments show that the proposed seal cutting robot based on deep reinforcement learning maintains an advantage in the performance indicators across the various tests.
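The two performance indicators named above can be illustrated with a minimal sketch. The abstract does not state how the overlap is calculated, so the code assumes an intersection-over-union measure on grid cells. The function names `rmse` and `overlap` and all data values are hypothetical and not taken from the study.

```python
import math


def rmse(target, actual):
    """Root mean square error between two equal-length depth profiles."""
    if len(target) != len(actual):
        raise ValueError("profiles must have the same length")
    squared = sum((t - a) ** 2 for t, a in zip(target, actual))
    return math.sqrt(squared / len(target))


def overlap(target_cells, cut_cells):
    """Assumed overlap measure: intersection over union of two cell sets."""
    target_cells, cut_cells = set(target_cells), set(cut_cells)
    union = target_cells | cut_cells
    # Two empty shapes are treated as a perfect match.
    return len(target_cells & cut_cells) / len(union) if union else 1.0


# Hypothetical toy data for illustration only.
target_depth = [1.0, 1.0, 2.0, 2.0]
cut_depth = [1.0, 1.5, 2.0, 1.5]
target_mask = {(0, 0), (0, 1), (1, 0), (1, 1)}
cut_mask = {(0, 1), (1, 0), (1, 1), (2, 1)}

print(rmse(target_depth, cut_depth))
print(overlap(target_mask, cut_mask))
```

Under this assumption, a lower RMSE means the cut surface is closer to the target depth, while a higher overlap means the cut shape covers more of the intended seal pattern with less excess.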