Video understanding is an important branch of computer vision and is still being actively explored; video classification, an important part of it, has become a research hot spot. First, to address the large number of parameters in convolutional networks and the model degradation caused by deepening the network, the DDensenet network is proposed. It adopts a dynamic grouping convolution (DGC) module, which greatly reduces the number of parameters, improves computational efficiency, and alleviates the vanishing gradient problem. Second, an improved convolutional attention module (S-CBMA) is added, with an adaptive module incorporated into its spatial attention mechanism, so that it assigns weights to features while weakening the background. The DDensenet model achieves average classification accuracies of 95.8% and 79.8% on the Sports8 and Olympic16 data sets, respectively. The results show that the model converges faster and achieves higher classification performance than C3D, ResNet3d, Two-Stream, and other network models.
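The parameter saving attributed to grouped convolution can be illustrated with simple arithmetic. The sketch below counts the learnable parameters of an ordinary convolution layer and of a statically grouped one. It is not the DGC module itself, whose dynamic grouping rule is not specified here. The helper name `conv3d_params`, the 3×3×3 kernel, the channel counts, and the group count of 4 are all illustrative assumptions.

```python
def conv3d_params(c_in, c_out, kernel=(3, 3, 3), groups=1, bias=True):
    """Count learnable parameters of a (grouped) 3D convolution layer.

    Hypothetical helper for illustration only. With `groups` groups, each
    output channel sees only c_in / groups input channels, so the weight
    count shrinks by a factor of `groups`.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("c_in and c_out must both be divisible by groups")
    kernel_volume = 1
    for size in kernel:
        kernel_volume *= size
    weights = (c_in // groups) * c_out * kernel_volume
    return weights + (c_out if bias else 0)


# Assumed layer sizes, not taken from the paper.
standard = conv3d_params(64, 128, groups=1)
grouped = conv3d_params(64, 128, groups=4)
print(standard, grouped)
```

Under these assumed sizes, the grouped layer holds one quarter of the weights of the standard layer; the bias term is unaffected by grouping.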