Extended Pipeline for Content-Based Feature Engineering in Music Genre Recognition

Raissi, Tina; Tibo, Alessandro; Bientinesi, Paolo

doi:10.1109/icassp.2018.8461807

Cited by 5 publications

(3 citation statements)

References 15 publications

“…As shown in

The SOTA methods                  GTZAN dataset (%)
Bisharad et al [7]                85.36
Bisharad et al [8]                82.00
Raissi et al [42]                 91.00
Sugianto et al [45]               71.87
Ashraf et al [3]                  87.79
Ng et al [39] (FusionNet)         96.50
Liu et al [30]                    93.90
Nanni et al [37]                  90.60
Ours (MS-SincResNet)              91.49
…”

Section: Ablation Study (mentioning)

confidence: 99%

MS-SincResNet: Joint Learning of 1D and 2D Kernels Using Multi-scale SincNet and ResNet for Music Genre Classification

Chang, Chen, Lee

2021

Proceedings of the 2021 International Conference on Multimedia Retrieval

In this study, we proposed a new end-to-end convolutional neural network, called MS-SincResNet, for music genre classification. MS-SincResNet appends 1D multi-scale SincNet (MS-SincNet) to 2D ResNet as the first convolutional layer in an attempt to jointly learn 1D kernels and 2D kernels during the training stage. First, an input music signal is divided into a number of fixed-duration (3 seconds in this study) music clips, and the raw waveform of each music clip is fed into the 1D MS-SincNet filter learning module to obtain three-channel 2D representations. The learned representations carry rich timbral, harmonic, and percussive characteristics compared with spectrograms, harmonic spectrograms, percussive spectrograms and Mel-spectrograms. ResNet is then used to extract discriminative embeddings from these 2D representations. The spatial pyramid pooling (SPP) module is further used to enhance the feature discriminability, in terms of both time and frequency aspects, to obtain the classification label of each music clip. Finally, the voting strategy is applied to summarize the classification results from all 3-second music clips. In our experimental results, we demonstrate that the proposed MS-SincResNet outperforms the baseline SincNet and many well-known hand-crafted features. Considering individual 2D representations, MS-SincResNet also yields competitive results with the state-of-the-art methods on the GTZAN dataset and the ISMIR2004 dataset. The code is
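The clip-and-vote step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice to drop a trailing clip shorter than 3 seconds, the use of plain majority voting, and the tie-breaking rule are all assumptions made for illustration, and `clip_classifier` stands in for whatever per-clip model (here, the MS-SincNet + ResNet + SPP network) produces a label.

```python
from collections import Counter
from typing import Callable, List, Sequence


def split_into_clips(signal: Sequence[float], sample_rate: int,
                     clip_seconds: float = 3.0) -> List[Sequence[float]]:
    """Cut a waveform into consecutive fixed-duration clips.

    Assumption: a trailing remainder shorter than one clip is dropped.
    """
    clip_len = int(sample_rate * clip_seconds)
    if clip_len <= 0:
        raise ValueError("clip length must be positive")
    n_clips = len(signal) // clip_len
    return [signal[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]


def vote(labels: Sequence[str]) -> str:
    """Return the most frequent clip-level label.

    Assumption: plain majority voting; on a tie the label seen first wins.
    """
    if not labels:
        raise ValueError("no clip labels to vote on")
    return Counter(labels).most_common(1)[0][0]


def classify_track(signal: Sequence[float], sample_rate: int,
                   clip_classifier: Callable[[Sequence[float]], str],
                   clip_seconds: float = 3.0) -> str:
    """Label every clip with the (hypothetical) classifier, then vote."""
    clips = split_into_clips(signal, sample_rate, clip_seconds)
    return vote([clip_classifier(clip) for clip in clips])
```

Any callable that maps one clip to a genre label can be passed as `clip_classifier`, so the voting logic stays independent of the network used for the per-clip prediction.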

“…The research presented in this section has a common goal of condensing the dataset by stripping out non-essential data points while maintaining the critical information for the NN to learn from. The features extracted from the dataset should be comprehensive, compact, and effective; this means they should represent the music well, require a smaller amount of storage, and require little computation to extract [1], [48], [49], [50]. If the correct features are not extracted or there is any loss of data while extracting, the machine learning phase will lack the ability to make use of vital information and thus the features chosen will significantly affect the final results and accuracy of the work [40].…”

Section: Music Genre Classification Focused On Engineered Features (mentioning)

confidence: 99%

Neural Network Music Genre Classification

Pelchat, Gelowitz

2020

Can. J. Electr. Comput. Eng.

“…LSTM) can grasp the prominent long-term dependency based properties, such as recurrent harmonics and music structure contained in the music. These are the possible reasons why deep learning architecture based schemes have achieved tremendous success in various MIR tasks, such as onset detection [6], emotion recognition [7], chord estimation [8], rhythm stimuli recognition [9], source separation [10], music recommendation [11] and auto-tagging [4], [12], [14], [15]. For music classification tasks, CNN and RNN are the two most adopted deep learning architectures.…”

Section: Introduction (mentioning)

confidence: 99%

Combining CNN and Broad Learning for Music Classification

Tang, Chen

2020

IEICE Trans. Inf. & Syst.

Music classification has been inspired by the remarkable success of deep learning. To enhance efficiency and ensure high performance at the same time, a hybrid architecture that combines deep learning and Broad Learning (BL) is proposed for music classification tasks. At the feature extraction stage, the Random CNN (RCNN) is adopted to analyze the Mel-spectrogram of the input music sound. Compared with a conventional CNN, RCNN has a more flexible structure to adapt to the variance contained in different types of music. At the prediction stage, the BL technique is introduced to enhance the prediction accuracy and reduce the training time as well. Experimental results on three benchmark datasets (GTZAN, Ballroom, and Emotion) demonstrate that: i) The proposed scheme achieves higher classification accuracy than the deep-learning-based one, which combines CNN and LSTM, on all three benchmark datasets. ii) Both RCNN and BL contribute to the performance improvement of the proposed scheme. iii) The introduction of BL also helps to enhance the prediction efficiency of the proposed scheme.
