ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9414458
Learning Audio Embeddings with User Listening Data for Content-Based Music Recommendation

Abstract: Personalized recommendation of new track releases has always been a challenging problem in the music industry. To combat this problem, we first explore user listening history and demographics to construct a user embedding representing the user's music preference. With the user embedding and audio data from the user's liked and disliked tracks, an audio embedding can be obtained for each track using metric learning with Siamese networks. For a new track, we can decide the best group of users to recommend by computi…
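The metric-learning step the abstract describes can be illustrated with a triplet-style hinge loss that pulls a liked track's embedding toward the user embedding and pushes a disliked track's away. Below is a minimal NumPy sketch with made-up toy embeddings; the paper's actual Siamese architecture and training data are not reproduced here, and the margin value is an assumption.

```python
import numpy as np

def triplet_loss(user, liked, disliked, margin=0.2):
    """Hinge-style triplet loss: penalize the case where the liked track's
    embedding is not closer to the user embedding than the disliked one's
    by at least `margin` (margin value chosen here for illustration)."""
    d_pos = np.linalg.norm(user - liked)     # distance to a liked track
    d_neg = np.linalg.norm(user - disliked)  # distance to a disliked track
    return max(0.0, d_pos - d_neg + margin)

# Toy 4-dimensional embeddings (illustrative values only).
user     = np.array([1.0, 0.0, 0.0, 0.0])
liked    = np.array([0.9, 0.1, 0.0, 0.0])
disliked = np.array([0.0, 1.0, 0.0, 0.0])

loss = triplet_loss(user, liked, disliked)
# d_pos ≈ 0.141, d_neg ≈ 1.414, so the hinge is inactive and loss = 0.0
```

Minimizing such a loss over many (user, liked, disliked) triples is what lets a new track be scored against user embeddings by simple distance at recommendation time.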

Cited by 28 publications (14 citation statements). References 10 publications.
“…And more sources can be separated via one model. In future work, since audio embeddings have been widely used in other audio tasks such as music recommendation (Chen et al. 2021) and music generation (2019; 2021), we expect to use these audio embeddings as source queries to see whether they can capture different audio features and lead to better separation performance.…”
Section: Discussion
confidence: 99%
“…Several works have utilized VAE models for exploring audio representations, including VAE models for finding disentangled audio representations (Luo, Agres, & Herremans, 2019) and VAE models for modeling audio containing speech (Hsu, Zhang, & Glass, 2017). Apart from VAEs, simple convolutional models are used in different music tasks such as music recommendation (Chen, Liang, Ma, & Gu, 2021) and source separation (Chen, Du, et al., 2021). In our experiments for audio representations, we chose to use a convolutional VAE to map the short-time Fourier transform (STFT) representation of audio into a lower-dimensional latent representation.…”
Section: VAE for Audio
confidence: 99%
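The pipeline this citation statement describes starts by mapping audio to an STFT representation before encoding it to a latent vector. A minimal NumPy sketch of that front end is below, assuming a Hann-windowed sliding FFT and using a random linear projection as a stand-in for the learned convolutional VAE encoder (frame/hop sizes and latent dimension are illustrative assumptions, not values from the cited work).

```python
import numpy as np

def stft_mag(signal, frame=256, hop=128):
    """Magnitude STFT via a Hann-windowed sliding FFT (minimal sketch).
    Returns an array of shape (n_frames, frame // 2 + 1)."""
    window = np.hanning(frame)
    frames = [signal[i:i + frame] * window
              for i in range(0, len(signal) - frame + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))

rng = np.random.default_rng(0)
spec = stft_mag(rng.standard_normal(2048))   # toy 2048-sample signal

# Stand-in linear "encoder" projecting each STFT frame to an 8-d latent;
# a real convolutional VAE would learn this mapping jointly with a
# KL-regularized reconstruction objective.
W = rng.standard_normal((spec.shape[1], 8)) * 0.01
latent = spec @ W                            # shape (n_frames, 8)
```

The point of the sketch is only the data flow — raw audio → STFT magnitude → low-dimensional latent — not the VAE training itself.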
“…The build procedure of XMusic will be introduced in Section 4.1.1. Inspired by content-based music recommendation [3, 17, 18, 20], we aim to leverage song content features, including audio, metadata, and generated content, to learn high-quality song representations supervised by song-to-song interactions. However, we notice an imbalance of supervision signals between popular and less popular music, also known as the long-tail effect.…”
Section: Introduction
confidence: 99%