This paper aims to address the strong subjectivity of traditional manual analysis of volleyball game videos, as well as the shortcomings of traditional Human Behavior Recognition (HBR) algorithms, such as excessive computation, high hardware requirements, and poor ability to model long video streams. Firstly, the relevant theories are reviewed. Secondly, a fusion Convolutional Neural Network (CNN)-based HBR model is implemented by combining a two-stream CNN (TSCNN), a Three-Dimensional (3D) CNN, and a Long Short-Term Memory (LSTM) network; the LSTM is included for its excellent long-term Dynamic Information Extraction (DIE) ability. Finally, public datasets are selected to verify the model's HBR performance on volleyball game videos. The experimental results show that: (1) The HBR accuracy of the proposed fusion-CNN-based model is highest when the number of video segments is three, the average method is used for feature fusion, the fusion ratio of the spatial feature map to the temporal feature map is 4:6, and the learning rate is 0.0014. (2) The average HBR accuracy of the proposed model on three different datasets is 4%, 2.7%, and 3% higher, respectively, than that of other popular networks. The improvement is remarkable, and the model is suitable for Human Behavior Analysis (HBA) in volleyball match videos. Overall, the proposed HBR model can provide more accurate results for volleyball video-based HBR, which is significant in promoting the rapid development of volleyball sports.
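The reported fusion settings (three video segments, average fusion, and a 4:6 spatial-to-temporal ratio) can be illustrated with a minimal sketch. It assumes one plausible reading of the abstract: per-class scores are averaged over the segments within each stream, and the two streams are then combined with weights 0.4 and 0.6. The function names and the example scores are hypothetical and are not taken from the paper's implementation.

```python
from typing import List, Sequence


def average_segments(segment_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class scores over the video segments of one stream."""
    n_segments = len(segment_scores)
    n_classes = len(segment_scores[0])
    return [
        sum(segment[c] for segment in segment_scores) / n_segments
        for c in range(n_classes)
    ]


def fuse_streams(
    spatial: Sequence[float],
    temporal: Sequence[float],
    spatial_weight: float = 0.4,   # assumed reading of the 4:6 ratio
    temporal_weight: float = 0.6,
) -> List[float]:
    """Weighted combination of spatial and temporal per-class scores."""
    total = spatial_weight + temporal_weight
    return [
        (spatial_weight * s + temporal_weight * t) / total
        for s, t in zip(spatial, temporal)
    ]


def predict(
    spatial_segments: Sequence[Sequence[float]],
    temporal_segments: Sequence[Sequence[float]],
) -> int:
    """Return the index of the class with the highest fused score."""
    fused = fuse_streams(
        average_segments(spatial_segments),
        average_segments(temporal_segments),
    )
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical scores: three segments, three behavior classes per stream.
spatial_example = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.7, 0.2, 0.1]]
temporal_example = [[0.2, 0.7, 0.1], [0.3, 0.6, 0.1], [0.1, 0.8, 0.1]]
predicted_class = predict(spatial_example, temporal_example)
```

In this made-up example the spatial stream favors class 0 and the temporal stream favors class 1; because the temporal stream carries the larger weight, the fused prediction follows it.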