Rhythm perception is increasingly important in music information processing and music understanding. This study first applies signal processing methods to extract musical features, then uses feature fusion to integrate features from different modalities into a single feature vector. A multimodal deep learning model built on these fused features estimates the rhythmic activation function of the music, which is combined with a hidden Markov model to infer the rhythm. One focus of the study is rhythm recognition on music containing drums, to examine how well the approach performs there. In addition, the study analyzes the Softmax output values for the music and compares the recognition performance of different models. The results show that the multimodal deep learning method performs best among the compared models, with an F-Measure of 65.65%, a Cemgil score of 66.76%, a Goto score of 36.75%, and a P-score of 75.68%. In drum music recognition in particular, the position of each drum beat is accurately recognized, which supports the effectiveness of the proposed model. The research provides a feasible new method for recognizing and understanding musical rhythm and a useful reference for further work in this field.
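To make the headline metric concrete, the following is a minimal sketch of how a beat-level F-Measure can be computed from reference and estimated beat times. It is an illustration under stated assumptions, not the evaluation code used in the study: the function name `beat_f_measure`, the greedy one-to-one matching, and the 70 ms tolerance window (a common convention in beat-tracking evaluation) are all assumptions that the text above does not specify.

```python
from typing import Sequence


def beat_f_measure(reference: Sequence[float],
                   estimated: Sequence[float],
                   tolerance: float = 0.07) -> float:
    """F-measure between reference and estimated beat times, in seconds.

    An estimated beat counts as a hit when it lies within `tolerance`
    seconds of a reference beat that has not been matched yet.
    The 0.07 s default is an assumed, conventional value.
    """
    ref = sorted(reference)
    est = sorted(estimated)
    if not ref or not est:
        return 0.0

    hits = 0
    i = j = 0
    # Walk both sorted lists once, matching each beat at most one time.
    while i < len(ref) and j < len(est):
        diff = est[j] - ref[i]
        if abs(diff) <= tolerance:
            hits += 1
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # estimate is too early for this reference beat
        else:
            i += 1  # reference beat has no estimate close enough

    if hits == 0:
        return 0.0
    precision = hits / len(est)  # fraction of estimates that are correct
    recall = hits / len(ref)     # fraction of reference beats that were found
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical beat times, for illustration only.
    reference_beats = [0.5, 1.0, 1.5, 2.0]
    estimated_beats = [0.52, 1.0, 1.6, 2.05, 2.5]
    print(round(beat_f_measure(reference_beats, estimated_beats), 3))
```

In this toy example three of the five estimates fall within the window, so precision is 3/5 and recall is 3/4. The Cemgil, Goto, and P-score metrics weight timing errors and continuity differently, which is why the four reported values need not agree.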