In recent years, with the rapid growth of research in computer vision, target tracking (TT) has become an important application in visual recognition tasks. It can assist target detection and speed up recognition, so it has both theoretical value and research significance. In practical application scenarios, however, TT still suffers from inaccurate tracking, poor robustness, and low overall system speed caused by scene changes. Since TT algorithms based on convolutional neural networks (CNNs) were first proposed, they have attracted many researchers because they offer both speed and accuracy. Multi-layer convolutional features extracted from the input image by a CNN represent the target's appearance well under a variety of complex interference factors. This paper therefore builds a feature-based TT model on a CNN. Experiments show that the model's accuracy improves as the network is scaled up and its ResNet backbone is made deeper. A comparison of the tracking results of the CNN-based algorithm with those of other algorithms shows that the CNN-based algorithm achieves the highest tracking accuracy and success rate among the methods compared.