Traditional research into the arts has almost always been based around the subjective judgment of human critics. The use of data mining tools to understand art has great promise as it is objective and operational. We investigate the distribution of music from around the world: geographical ethnomusicology. We cast the problem as training a machine learning program to predict the geographical origin of pieces of music. This is a technically interesting problem as it has features of both classification and regression, and because of the spherical geometry of the surface of the Earth. Because of these characteristics of the representation of geographical positions, most standard classification/regression methods cannot be directly used. Two applicable methods are K-Nearest Neighbors and Random forest regression, which are robust to the non-standard structure of data. We also investigated improving performance through use of bagging. We collected 1,142 pieces of music from 73 countries/areas, and described them using 2 different sets of standard audio descriptors using MARSYAS. 10-fold cross validation was used in all experiments. The experimental results indicate that Random forest regression produces significantly better results than KNN, and the use of bagging improves the performance of KNN. The best performing algorithm achieved a mean great circle distance error of 3,113 km.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.