Music content has recently been identified as useful information for improving the performance of music recommendation. Existing studies usually feed low-level audio features, such as Mel-frequency cepstral coefficients, into deep learning models for music recommendation. However, such features cannot properly characterize music audio, which often contains multiple sound sources. In this paper, we propose to model and fuse chord, melody, and rhythm features to meaningfully characterize music and thereby improve music recommendation. Specifically, we use two user-based attention mechanisms to differentiate the importance of different parts of the audio features and chord features. In addition, a Long Short-Term Memory layer is used to capture sequential characteristics. These features are fused by a multilayer perceptron and then used to make recommendations. We conducted experiments with a subset of the last.fm-1b dataset. The experimental results show that our proposal outperforms the best baseline by [Formula: see text] on HR@10.
Recently, efforts have been made to introduce music content into deep learning-based music recommendation systems. In previous research, following tasks such as speech recognition, music content is often fed into recommendation models as low-level audio features, such as Mel-frequency cepstral coefficients. However, unlike speech, the audio of music often contains multiple sound sources. Hence, low-level time-domain or frequency-domain audio features may not represent the music content properly, limiting the recommendation algorithm's performance. To address this problem, we propose a music recommendation model based on chord progressions and attention mechanisms. In this model, music content is represented as chord progressions rather than low-level audio features. The model integrates user–song interactions and the chord sequences of songs, and uses an attention mechanism to differentiate the importance of different parts of a song. To make better use of users' historical behavioral information, we follow the design of the neural collaborative filtering algorithm to obtain embeddings of users and songs. On this basis, we designed a chord attention layer to mine users' fine-grained preferences for different parts of the music content. We conducted experiments with a subset of the last.fm-1b dataset. The experimental results demonstrate the effectiveness of the proposed method.
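The general shape of such a chord-attention recommender can be sketched as follows. This is a minimal numpy illustration, not the authors' exact architecture: the embedding sizes, the bilinear attention scoring, the single-hidden-layer fusion MLP, and all weight matrices (here random stand-ins for learned parameters) are assumptions made for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical dimensions: embedding size d, chord-sequence length T.
d, T = 8, 12

# NCF-style latent embeddings for one user and one song
# (random stand-ins for learned lookup-table rows).
user_emb = rng.normal(size=d)
song_emb = rng.normal(size=d)

# Chord progression of the song: T chord embedding vectors, shape (T, d).
chords = rng.normal(size=(T, d))

# User-conditioned chord attention: score each chord position by its
# compatibility with the user embedding, normalize, then pool.
W_att = rng.normal(size=(d, d))      # assumed bilinear attention weights
scores = chords @ W_att @ user_emb   # one score per position, shape (T,)
weights = softmax(scores)            # attention weights, sum to 1
chord_repr = weights @ chords        # attention-pooled chord feature, (d,)

# Fuse user, song, and chord representations with a small MLP and
# squash to a predicted interaction probability.
x = np.concatenate([user_emb, song_emb, chord_repr])
W1 = rng.normal(size=(3 * d, 16)); b1 = np.zeros(16)
W2 = rng.normal(size=(16, 1));     b2 = np.zeros(1)
h = np.maximum(0.0, x @ W1 + b1)     # ReLU hidden layer
score = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # sigmoid output
```

In a trained system, `user_emb`, `song_emb`, the chord embeddings, and all weight matrices would be learned from the interaction data; the attention weights then indicate which parts of the chord progression drive a given user's preference.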