Music Genre Classification is the problem of associating genre-related labels to digitized music tracks. It has applications in the organization of commercial and personal music collections. Often, music tracks are described as a set of timbre-inspired sound textures. In shallow-learning systems, the total number of sound textures per track is usually too high, and texture downsampling is necessary to make training tractable. Although previous work has solved this by linear downsampling, no extensive work has been done to evaluate how texture selection benefits genre classification in the context of the bag of frames track descriptions. In this paper, we evaluate the impact of frame selection on automatic music genre classification in a bag of frames scenario. We also present a novel texture selector based on K-Means aimed to identify diverse sound textures within each track. We evaluated texture selection in diverse datasets, four different feature sets, as well as its relationship to a univariate feature selection strategy. The results show that frame selection leads to significant improvement over the single vector baseline on datasets consisting of full-length tracks, regardless of the feature set.Results also indicate that the K-Means texture selector achieves significant improvements over the baseline, using fewer textures per track than the commonly used linear downsampling. The results also suggest that texture selection is complementary to the feature selection strategy evaluated. Our qualitative analysis indicates that texture variety within classes benefits model generalization. Our analysis shows that selecting specific audio excerpts can improve classification performance, and it can be done automatically.One possible method to use the sequence of textures in a track consists of summarizing them into a single vector [1]. This method, which we call Full Track Statistics (FTS), results in a compact encoding, but can discard texture information that can be useful for genre classification.Another possibility for such is to use a collection of textures to represent each track. This allows for richer descriptions of the underlying genre tags because the collections comprise distinct sound textures that are present in each musical piece. Many MGC approaches use collections of textures and disregard their time-domain ordering. These approaches can be seen as variants of the Bag of Frames (BoF) [2]. There are many approaches for combining the textures in classification, including statistical modelling [2], dictionary learning and quantization [3,4,5], and independent texture labelling combined by voting procedures [6,7,8,9,10]. Using diverse textures for classification avoids the problem of discarding information, but increase the computational cost for dataset processing and storage.In this work, we argue that selecting specific, typical textures from each track is an effective way to reduce the computational workload in storage and processing, while preserving classification accuracy. For such, we propose t...
Automatic music genre classification is the problem of associating mutually-exclusive labels to audio tracks. This process fosters the organization of collections and facilitates searching and marketing music. One approach for automatic music genre classification is to use diverse vector representations for each track, and then classify them individually. After that, a majority voting system can be used to infer a single label to the whole track. In this work, we evaluated the impact of changing the majority voting system to a meta-classifier. The classification results with the meta-classifier showed statistically significant improvements when related to the majority-voting classifier. This indicates that the higher-level information used by the meta-classifier might be relevant for automatic music genre classification.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.