Proceedings of the 2021 International Conference on Multimedia Retrieval 2021
DOI: 10.1145/3460426.3463619

MS-SincResNet: Joint Learning of 1D and 2D Kernels Using Multi-scale SincNet and ResNet for Music Genre Classification

Abstract: In this study, we proposed a new end-to-end convolutional neural network, called MS-SincResNet, for music genre classification. MS-SincResNet appends 1D multi-scale SincNet (MS-SincNet) to 2D ResNet as the first convolutional layer in an attempt to jointly learn 1D kernels and 2D kernels during the training stage. First, an input music signal is divided into a number of fixed-duration (3 seconds in this study) music clips, and the raw waveform of each music clip is fed into 1D MS-SincNet filter learning module…
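The clip-splitting step in the abstract can be illustrated with a minimal sketch. The abstract states only that clips have a fixed duration of 3 seconds; the function name `split_into_clips`, the 16 kHz sample rate, the use of non-overlapping clips, and the dropping of leftover samples are all assumptions made for illustration, not details confirmed by the paper.

```python
import numpy as np

def split_into_clips(waveform, sample_rate, clip_seconds=3.0):
    """Split a 1D waveform into fixed-duration clips.

    Assumptions (not from the paper): clips do not overlap, and trailing
    samples that do not fill a whole clip are dropped.
    """
    clip_len = int(round(clip_seconds * sample_rate))
    n_clips = len(waveform) // clip_len
    # Trim the tail, then view the signal as (n_clips, clip_len).
    return waveform[: n_clips * clip_len].reshape(n_clips, clip_len)

sr = 16000  # hypothetical sample rate
signal = np.random.default_rng(0).standard_normal(30 * sr)  # 30 s of noise
clips = split_into_clips(signal, sr)
print(clips.shape)  # (10, 48000)
```

Each row of `clips` is one raw-waveform clip of the kind that would be passed to a 1D filter-learning front end.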

Cited by 15 publications (4 citation statements)
References 45 publications
“…It shows that BST achieves much better classification performance on GTZAN than its predecessors. [6] 93.9 MS-SincResNet [8] 91.5 S3T [9] 81.1 AST with IPET [10] 90.8 PIPMN [11] 93.2 BST 99.0…”
Section: Results
Citation type: mentioning
confidence: 99%
“…Secondly, the number of audios in GTZAN is small, while the length of each audio is long and the feature information is quite rich, so it's easy to result in overfitting. In addition, the MS-SincNet structure proposed by Chang et al can extract features with rich timbres, harmonics and percussions from audios [8], which is more effective than hand-crafted features such as spectrogram, Mel spectrogram, percussion spectrogram and harmonic spectrogram. Thus, this paper's preprocessing method for extracting audio features by Mel-spectrogram needs to be improved.…”
Section: Limitations and Future Work
Citation type: mentioning
confidence: 99%
“…The SincNet filters used in this architecture are the inverse Fourier transform of some rectangular band-pass filters, inspired by band-pass filters in the field of signal processing. Chang et al [30] extended this concept by designing a multi-scale SincNet (MS-SincNet) 2D representation extraction network based on the SincNet filter. The MS-SincNet network autonomously learns filter parameters and exhibits a degree of anti-noise ability.…”
Section: Acoustic Features
Citation type: mentioning
confidence: 99%
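
The last statement above describes SincNet filters as the inverse Fourier transform of rectangular band-pass filters. In the standard SincNet formulation this gives the kernel g[n] = 2·f2·sinc(2π·f2·n) − 2·f1·sinc(2π·f1·n), with sinc(x) = sin(x)/x and cutoffs f1 < f2 expressed in cycles per sample. The sketch below builds one such kernel with fixed cutoffs. In a SincNet-style layer the two cutoffs would be the learnable parameters; the function name `sinc_bandpass`, the Hamming window, the kernel length, and the sample rate are assumptions chosen for illustration and are not taken from the MS-SincResNet paper.

```python
import numpy as np

def sinc_bandpass(f_low, f_high, kernel_size, sample_rate):
    """Band-pass FIR kernel: the difference of two sinc low-pass kernels."""
    f1 = f_low / sample_rate   # normalized low cutoff (cycles/sample)
    f2 = f_high / sample_rate  # normalized high cutoff (cycles/sample)
    # Sample positions centered on zero so the kernel is symmetric.
    n = np.arange(kernel_size) - (kernel_size - 1) / 2
    # np.sinc(x) is sin(pi*x)/(pi*x), so np.sinc(2*f*n) == sinc(2*pi*f*n).
    kernel = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    # A window (Hamming here, an assumed choice) softens truncation ripple.
    return kernel * np.hamming(kernel_size)

sr = 16000  # hypothetical sample rate
kernel = sinc_bandpass(1000.0, 2000.0, kernel_size=251, sample_rate=sr)

# Inspect the magnitude response to confirm the band-pass shape.
response = np.abs(np.fft.rfft(kernel, n=4096))
freqs = np.fft.rfftfreq(4096, d=1 / sr)

def gain_at(hz):
    return response[np.argmin(np.abs(freqs - hz))]

print(gain_at(1500.0), gain_at(200.0), gain_at(4000.0))
```

The gain should be close to 1 inside the 1 to 2 kHz band and close to 0 outside it. Because only two cutoffs define each filter, such a layer has far fewer free parameters than an unconstrained 1D convolution of the same length.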