Convolution-based classification of audio and symbolic representations of music

Velarde, Gissel; Cancino-Chacón, Carlos; Meredith, David; Weyde, Tillman; Grachten, Maarten

doi:10.1080/09298215.2018.1458885

Cited by 13 publications

(7 citation statements)

References 26 publications


“…These sources were applied for genre recognition [18][19][20][21], mood and emotion recognition [22][23][24][25], artist identification [26], hit song prediction [27], and playlist prediction [28]. Audio and symbolic features were used for genre recognition [29,30]. Audio and images were employed for mood prediction [31] and genre recognition [12].…”

Section: Related Work (mentioning)

confidence: 99%

Statistical and Visual Analysis of Audio, Text, and Image Features for Multi-Modal Music Genre Recognition

Wilkes

Vatolkin

Müller

2021

Entropy


We present a multi-modal genre recognition framework that considers the modalities audio, text, and image by features extracted from audio signals, album cover images, and lyrics of music tracks. In contrast to pure learning of features by a neural network as done in the related work, handcrafted features designed for a respective modality are also integrated, allowing for higher interpretability of created models and further theoretical analysis of the impact of individual features on genre prediction. Genre recognition is performed by binary classification of a music track with respect to each genre based on combinations of elementary features. For feature combination a two-level technique is used, which combines aggregation into fixed-length feature vectors with confidence-based fusion of classification results. Extensive experiments have been conducted for three classifier models (Naïve Bayes, Support Vector Machine, and Random Forest) and numerous feature combinations. The results are presented visually, with data reduction for improved perceptibility achieved by multi-objective analysis and restriction to non-dominated data. Feature- and classifier-related hypotheses are formulated based on the data, and their statistical significance is formally analyzed. The statistical analysis shows that the combination of two modalities almost always leads to a significant increase of performance and the combination of three modalities in several cases.


“…As discussed before, Velarde et al (2016) attain the highest accuracy prior to our study, but they use a computer vision approach that is difficult to interpret musically: they apply a Gaussian filter to images of piano roll scores, transform the resulting pixel data through linear discriminant analysis, and classify with a linear SVM. In follow-up work, Velarde, Cancino Chacón, Meredith, Weyde, and Grachten (2018) extend their approach to include image analysis of spectrograms, as well as classification with a k-nearest neighbour classifier; however, as before, their study differs in scope from our musicological investigation. Finally, Taminau et al (2010) deploy subgroup discovery, a descriptive rule learning technique that involves both predictive and descriptive induction.…”

Section: Accuracy Comparisons With Previous Studies (mentioning)

confidence: 99%
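The citation statement above describes a pipeline in which a Gaussian filter is applied to piano-roll images before classification. A minimal sketch of that idea follows, using only the Python standard library. It is an illustration under stated assumptions, not the cited authors' implementation: the linear discriminant analysis and SVM steps are omitted, a plain k-nearest-neighbour vote stands in as the classifier, and the 4x4 piano rolls, the labels "ascending" and "descending", and all function names are hypothetical.

```python
import math


def gaussian_kernel(sigma, radius):
    """Return a normalised 1-D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_rows(image, kernel):
    """Convolve each row with the kernel; pixels outside the image count as zero."""
    radius = len(kernel) // 2
    out = []
    for row in image:
        n = len(row)
        new_row = []
        for x in range(n):
            acc = 0.0
            for k, w in enumerate(kernel):
                j = x + k - radius
                if 0 <= j < n:
                    acc += w * row[j]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_filter(image, sigma=1.0, radius=2):
    """Separable 2-D Gaussian filter: smooth along time, then along pitch."""
    kernel = gaussian_kernel(sigma, radius)
    along_time = smooth_rows(image, kernel)
    return transpose(smooth_rows(transpose(along_time), kernel))


def flatten(image):
    return [value for row in image for value in row]


def knn_predict(train_vectors, train_labels, query, k=1):
    """Majority vote among the k training vectors nearest in Euclidean distance."""
    ranked = sorted(range(len(train_vectors)),
                    key=lambda i: math.dist(train_vectors[i], query))
    votes = {}
    for i in ranked[:k]:
        votes[train_labels[i]] = votes.get(train_labels[i], 0) + 1
    return max(votes, key=votes.get)


# Hypothetical toy piano rolls: rows are pitches, columns are time steps.
ascending = [[0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0]]
descending = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
query = [[0, 0, 0, 1], [0, 0, 1, 0], [0, 0, 1, 0], [1, 0, 0, 0]]

train_vectors = [flatten(gaussian_filter(img)) for img in (ascending, descending)]
train_labels = ["ascending", "descending"]
predicted = knn_predict(train_vectors, train_labels, flatten(gaussian_filter(query)))
print(predicted)
```

The filter is written as two 1-D passes because a Gaussian kernel is separable, which keeps the sketch short. Zero padding at the image border is one choice among several and is an assumption here.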

“…We conclude there are significant musical differences between Haydn and Mozart string quartets, enabling less than 15% LOO error and the selection of similar models across folds.
(Herlands et al, 2014) CV trials 0.80
3-grams model (Hontanilla et al, 2013) LOO 0.747
LDA + Linear SVM (Velarde et al, 2016) LOO 0.804
KNN + SVM ensemble (Velarde et al, 2018) LOO 0.748
Subgroup discovery (Taminau et al, 2010) LOO 0.730
Bayesian Logistic Regression (ours) LOO 0.8526…”

Section: Estimated Probability Of Composer (mentioning)

confidence: 99%

Where Does Haydn End and Mozart Begin? Composer Classification of String Quartets

Kempfert

Wong

2018

Preprint


For humans and machines, perceiving differences between string quartets by Joseph Haydn and Wolfgang Amadeus Mozart has been a challenging task, because of stylistic and compositional similarities between the composers. Based on the content of music scores, this study identifies and quantifies distinctions between these string quartets using statistical and machine learning techniques. Our approach develops new musically meaningful summary features based on the sonata form structure. Several of these proposed summary features are found to be important for distinguishing between Haydn and Mozart string quartets. Leave-one-out classification accuracy rates exceed 85%, significantly higher than has been attained for this task in prior work. These results indicate there are identifiable, musically insightful differences between string quartets by Haydn versus Mozart, such as in their low accompanying voices, Cello and Viola. Our quantitative approaches can expand the longstanding dialogue surrounding Haydn and Mozart, offering empirical evidence of claims made by musicologists. Our proposed framework, which interweaves musical scholarship with learning algorithms, can be applied to other composer classification tasks and quantitative studies of classical music in general.
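The accuracy figures quoted for this task are mostly leave-one-out (LOO) rates: each piece is held out once, a model is fitted on the remaining pieces, and the held-out piece is then classified. A minimal sketch of that evaluation loop follows, using only the Python standard library. The nearest-centroid classifier, the two-dimensional feature vectors, and the labels "a" and "b" are hypothetical stand-ins chosen for brevity and are not taken from any of the cited studies.

```python
import math


def fit_centroids(vectors, labels):
    """Return the mean feature vector of each class."""
    sums, counts = {}, {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Assign the class whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


def leave_one_out_accuracy(vectors, labels):
    """Hold out each sample once, train on the rest, and score the held-out sample."""
    correct = 0
    for held_out in range(len(vectors)):
        train_x = vectors[:held_out] + vectors[held_out + 1:]
        train_y = labels[:held_out] + labels[held_out + 1:]
        model = fit_centroids(train_x, train_y)
        if predict(model, vectors[held_out]) == labels[held_out]:
            correct += 1
    return correct / len(vectors)


# Hypothetical toy data: two well-separated classes plus one class-"a" outlier.
features = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [0.95, 0.95],
            [1.0, 0.9], [0.8, 1.0], [0.9, 0.8]]
labels = ["a", "a", "a", "a", "b", "b", "b"]

accuracy = leave_one_out_accuracy(features, labels)
print(f"LOO accuracy: {accuracy:.3f}")
```

Because the model is refitted for every held-out sample, no piece is ever scored by a model that saw it during training, which is why LOO is a common choice for small corpora such as a set of string quartet movements.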


“…The third approach is to feed the raw data into a neural network model and to learn meaningful feature representations directly from the data. While this approach is not new (e.g., [23]), most recent works on composer classification have adopted this approach by applying a convolutional neural network (CNN) to a piano roll-like representation of the data [24][25][26]. CNN models can be considered the current state-of-the-art in composer classification.…”

Section: Introduction (mentioning)

confidence: 99%

A Deeper Look at Sheet Music Composer Classification Using Self-Supervised Pretraining

Yang

Tsai

2021

Applied Sciences


This article studies a composer style classification task based on raw sheet music images. While previous works on composer recognition have relied exclusively on supervised learning, we explore the use of self-supervised pretraining methods that have been recently developed for natural language processing. We first convert sheet music images to sequences of musical words, train a language model on a large set of unlabeled musical “sentences”, initialize a classifier with the pretrained language model weights, and then finetune the classifier on a small set of labeled data. We conduct extensive experiments on International Music Score Library Project (IMSLP) piano data using a range of modern language model architectures. We show that pretraining substantially improves classification performance and that Transformer-based architectures perform best. We also introduce two data augmentation strategies and present evidence that the model learns generalizable and semantically meaningful information.
