Moving target recognition and analysis is an important research direction in computer vision, with wide applications in intelligent robotics, video surveillance, medical education, sports competition, and national defense and security. This paper analyzes weightlifting videos to extract the key postures of athletes during training, helping coaches train athletes more professionally. We propose a deep learning (DL)-based key pose extraction method for sports video that relies on classification learning of regions of interest (RoI_KP for short). By fine-tuning a convolutional neural network (CNN), we obtain a network model suited to classifying the regions of interest in weightlifting videos. A selection strategy is then applied to the classification results to extract the key poses. In addition, different deep neural networks (DNNs) are adopted according to the characteristics of each modality, and these networks are combined to mine multi-modal spatio-temporal deep features of human movements in video. Experimental results show that the proposed method is highly competitive.
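The abstract does not specify the selection strategy applied to the classification results. The sketch below illustrates one simple possibility, assuming the fine-tuned CNN outputs a per-frame probability vector over key-pose classes (optionally including a background class). For each predicted pose class, it keeps the frame where the classifier is most confident. The function name `select_key_poses` and the "most confident frame per class" rule are hypothetical illustrations, not the RoI_KP strategy itself.

```python
from typing import Dict, Optional, Sequence, Tuple


def select_key_poses(frame_probs: Sequence[Sequence[float]],
                     background: Optional[int] = None) -> Dict[int, int]:
    """Hypothetical selection rule: map each predicted pose class to the
    index of the frame whose RoI the classifier scored most confidently.

    frame_probs: one probability vector per frame (assumed CNN output).
    background:  optional class index to ignore (e.g. "no key pose").
    """
    best: Dict[int, Tuple[float, int]] = {}  # class -> (score, frame index)
    for idx, probs in enumerate(frame_probs):
        # Predicted class for this frame (first maximum wins on ties).
        cls = max(range(len(probs)), key=lambda c: probs[c])
        if cls == background:
            continue
        score = probs[cls]
        # Keep the frame with the highest confidence for this class.
        if cls not in best or score > best[cls][0]:
            best[cls] = (score, idx)
    # Return classes in sorted order so the result is deterministic.
    return {cls: idx for cls, (_, idx) in sorted(best.items())}
```

For example, with four frames scored over three classes and class 0 treated as background, frames 2 and 3 are selected as the key poses for classes 1 and 2. A real system would likely also enforce temporal order or minimum spacing between selected frames; that is omitted here for brevity.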