This study fuses multimodal sequence matching with a deep neural network algorithm for college basketball player behavior detection and recognition to conduct in-depth research and analysis, analyzing the basic components of basketball technical action videos by studying the practical application of technical actions in professional games and teaching videos from self-published authors of short video platforms. The characteristics of the dataset are also analyzed through literature research related to the basketball action dataset. On the established basketball technical action dataset, combined with the SSD target detection algorithm, the video images with cropping of human motion regions to reduce the size of image frames, generating a basketball technical action dataset based on cropped frames, reduces the amount of network training and improves the efficiency of subsequent action recognition training. In this study, by analyzing the characteristics of basic camera motion, a univariate global motion model is proposed to introduce a quadratic term to accurately express the shaking transformation, while the horizontal and vertical motion are independently represented to reduce the model complexity. Comparative experimental results show that the proposed model achieves a good balance between complexity and accuracy of global motion representation. It is suitable for global motion modeling in behavior recognition applications, laying the foundation for global and local motion estimation. On this basis, the visual feature change pattern of the key area of the scene (basketball area) is combined with the group behavior recognition based on motion patterns and the success-failure classification based on key visual information to achieve basketball semantic event recognition. The experimental results at NCAA show that the fusion of global and local motion patterns can effectively improve group behavior recognition performance. The semantic event recognition algorithm combining motion patterns and video key visual information achieves the best performance.