Since its birth, music, a flower in the vast river of art, has carried culture forward by spreading across space and persisting through time. This research discusses the filming of multimedia music images by integrating the hidden Markov model (HMM) with song feature tags. An HMM is a statistical model based on a hidden Markov process and is mainly used to model stochastic processes. During editing, the editing structure, the editing style, and the temporal and spatial expression of the edit should all be considered. When multimedia music images are filmed for the study of music history, the footage covers not only the musical subject itself but also the environment surrounding it. Editing must therefore check whether each shot and each picture matches the actual research situation of the musical subject being described. The method uses an HMM to predict the song category from the user's listening behavior records and song characteristics and then recommends music of the corresponding category to the user in real time. Finally, the recommendation results of several algorithms are verified on the experimental data set, and the traditional music algorithm is compared with the HMM along several dimensions, such as accuracy, error rate, and recommendation time efficiency. The accuracy of the HMM for music prediction is 95%. The multinote recognition system for the piano uses the doubly stochastic nature of the hidden Markov process to describe the statistical characteristics of the audio stream of notes. The proposed research will help promote the dissemination of music image information.
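To make the category prediction step concrete, the following is a minimal sketch of how an HMM could decode song categories from a sequence of feature tags and then pick a category to recommend. It is an illustration under stated assumptions, not the model used in this research: the category names (`calm`, `energetic`), the tempo tags (`slow`, `medium`, `fast`), every probability value, and the helper names `viterbi` and `predict_next_category` are hypothetical. Decoding uses the standard Viterbi algorithm in log space.

```python
import math

# All states, tags, and probabilities below are hypothetical illustration values.
STATES = ["calm", "energetic"]            # hidden states: song categories
START = {"calm": 0.6, "energetic": 0.4}   # initial category probabilities
TRANS = {                                 # category-to-category transitions
    "calm": {"calm": 0.7, "energetic": 0.3},
    "energetic": {"calm": 0.4, "energetic": 0.6},
}
EMIT = {                                  # probability of a tag given a category
    "calm": {"slow": 0.6, "medium": 0.3, "fast": 0.1},
    "energetic": {"slow": 0.1, "medium": 0.3, "fast": 0.6},
}


def viterbi(observations):
    """Return the most likely category sequence for a list of feature tags."""
    # Log-probability of the best path ending in each state after the first tag.
    scores = {
        s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
        for s in STATES
    }
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for reaching s at this step.
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (
                scores[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][obs])
            )
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state.
    path = [max(STATES, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def predict_next_category(observations):
    """Pick the most probable next category after the decoded current one."""
    current = viterbi(observations)[-1]
    return max(STATES, key=lambda s: TRANS[current][s])
```

With these illustrative parameters, `viterbi(["slow", "slow", "fast", "fast"])` decodes to `["calm", "calm", "energetic", "energetic"]`, and `predict_next_category` on the same history returns `"energetic"`. In practice the transition and emission probabilities would be estimated from the listening records rather than fixed by hand.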