Online activities such as social networking, online shopping, and consuming multimedia create digital traces, which are often analyzed and used to improve user experience and increase revenue, e.g., through better-fitting recommendations and more targeted marketing. Analyses of digital traces typically aim to find user traits such as age, gender, and nationality to derive common preferences. We investigate to what extent the music listening habits of users of the social music platform Last.fm can be used to predict their age, gender, and nationality. We propose a feature modeling approach building on Term Frequency-Inverse Document Frequency (TF-IDF) for artist listening information and artist tags, combined with additionally extracted features. We show that we can substantially outperform a majority-voting baseline and can compete with existing approaches. Further, regarding prediction accuracy versus available listening data, we show that even a single listening event per user is enough to outperform the baseline in all prediction tasks. We also compare the performance of our algorithm for different user groups and discuss possible prediction errors and how to mitigate them. We conclude that personal information can be derived from music listening information, which can indeed help tailor recommendations better, as we illustrate with the use case of a music recommender system that can directly utilize the user attributes predicted by our algorithm to increase the quality of its recommendations.
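To make the two main ingredients concrete, the following minimal sketch computes TF-IDF weights over artist play counts, treating each user as a "document" and each artist as a "term", and derives a majority-vote baseline prediction. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the abstract does not specify the TF-IDF variant, so the sketch assumes term frequency normalized by a user's total plays and an unsmoothed logarithmic inverse document frequency. The toy data, the label values, and the function names `tfidf_features` and `majority_baseline` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data: user -> artist play counts (not taken from the paper).
listening = {
    "u1": {"Metallica": 5, "Iron Maiden": 3},
    "u2": {"Metallica": 1, "Madonna": 4},
    "u3": {"Madonna": 2, "ABBA": 6},
}
# Hypothetical labels for a single prediction task (here: an age group).
labels = {"u1": "25-34", "u2": "25-34", "u3": "35-44"}


def tfidf_features(play_counts):
    """Return user -> {artist: TF-IDF weight}, one 'document' per user.

    Assumed variant: tf = plays / total plays of the user,
    idf = log(number of users / number of users who played the artist).
    """
    n_users = len(play_counts)
    # Document frequency: how many users listened to each artist at least once.
    df = Counter(artist for counts in play_counts.values() for artist in counts)
    features = {}
    for user, counts in play_counts.items():
        total = sum(counts.values())
        features[user] = {
            artist: (plays / total) * math.log(n_users / df[artist])
            for artist, plays in counts.items()
        }
    return features


def majority_baseline(train_labels):
    """Return the most frequent label, which the baseline predicts for every user."""
    return Counter(train_labels.values()).most_common(1)[0][0]


features = tfidf_features(listening)
baseline = majority_baseline(labels)
```

In this sketch an artist played by few users receives a higher weight than one played by many, so the resulting sparse vectors emphasize distinctive listening behavior. A classifier trained on such vectors would then be compared against the constant prediction returned by `majority_baseline`.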