ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2019.8682912

Randomly Weighted CNNs for (Music) Audio Classification

Abstract: The computer vision literature shows that randomly weighted neural networks perform reasonably as feature extractors. Following this idea, we study how non-trained (randomly weighted) convolutional neural networks perform as feature extractors for (music) audio classification tasks. We use features extracted from the embeddings of deep architectures as input to a classifier - with the goal to compare classification accuracies when using different randomly weighted architectures. By following this methodology, w…
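As a rough illustration of the methodology sketched in the abstract, the snippet below builds an untrained (randomly weighted) 1-D CNN, uses its averaged activations as fixed embeddings, and trains only a shallow SVM classifier on top. The layer sizes, filter lengths, pooling strategy, and choice of SVM are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch (not the authors' exact architecture): embeddings from a randomly
# weighted, never-trained 1-D CNN over raw waveforms are fed to a shallow classifier.
import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

class RandomWaveformCNN(nn.Module):
    """Untrained feature extractor: a stack of small 1-D filters with random weights."""
    def __init__(self, n_filters=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, n_filters, kernel_size=9, stride=4), nn.ReLU(),
            nn.Conv1d(n_filters, n_filters, kernel_size=9, stride=4), nn.ReLU(),
            nn.Conv1d(n_filters, n_filters, kernel_size=9, stride=4), nn.ReLU(),
        )

    def forward(self, x):              # x: (batch, 1, samples)
        h = self.net(x)                # (batch, n_filters, frames)
        return h.mean(dim=-1)          # temporal average pooling -> fixed embedding

torch.manual_seed(0)
extractor = RandomWaveformCNN().eval()  # random weights, no training step at all

# Toy data standing in for labelled 1 s clips at 16 kHz (two hypothetical classes).
X = torch.randn(40, 1, 16000)
y = np.tile([0, 1], 20)

with torch.no_grad():
    emb = extractor(X).numpy()         # frozen random embeddings

# Only the shallow classifier on top of the embeddings is trained.
clf = SVC(kernel="rbf").fit(emb[:30], y[:30])
print("toy accuracy:", accuracy_score(y[30:], clf.predict(emb[30:])))
```

Comparing such classifiers across different randomly weighted architectures (rather than training the CNNs end-to-end) is the comparison the abstract describes.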

Cited by 79 publications (58 citation statements)
References 39 publications
“…In this section, we describe the deep encoder/decoder architectures we used to explore the impact of increasing the Conv-TasNet encoder/decoder's capacity to represent more complex signal transformations. The core architecture we employ is motivated by recent research in audio classification in which waveform-based architectures built on a deep stack of small filters deliver very competitive results [20, 21, 22]. This research highlights the potential for these architectures to learn generalized patterns via hierarchically combining small-context representations [20].…”
Section: Deep Encoder / Decoder
confidence: 99%
“…The core architecture we employ is motivated by recent research in audio classification in which waveform-based architectures built on a deep stack of small filters deliver very competitive results [20, 21, 22]. This research highlights the potential for these architectures to learn generalized patterns via hierarchically combining small-context representations [20]. For this reason, we investigate the possibilities of a deep encoder/decoder that is based on a stack of small filters with nonlinear activation functions.…”
Section: Deep Encoder / Decoder
confidence: 99%
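A rough sketch of the idea these excerpts describe: the single-layer encoder/decoder basis of a Conv-TasNet-style model is swapped for a deep stack of small 1-D filters with nonlinearities, mirrored by transposed convolutions on the decoder side. The channel counts, strides, and depth below are assumptions for illustration, not the cited paper's configuration.

```python
# Illustrative deep encoder/decoder built from stacks of small 1-D filters.
import torch
import torch.nn as nn

class DeepEncoder(nn.Module):
    def __init__(self, channels=128, depth=3):
        super().__init__()
        layers, in_ch = [], 1
        for _ in range(depth):
            layers += [nn.Conv1d(in_ch, channels, kernel_size=3, stride=2, padding=1),
                       nn.PReLU()]
            in_ch = channels
        self.net = nn.Sequential(*layers)

    def forward(self, x):          # (batch, 1, samples) -> (batch, channels, frames)
        return self.net(x)

class DeepDecoder(nn.Module):
    def __init__(self, channels=128, depth=3):
        super().__init__()
        layers = []
        for i in range(depth):
            out_ch = 1 if i == depth - 1 else channels
            layers += [nn.ConvTranspose1d(channels, out_ch, kernel_size=3,
                                          stride=2, padding=1, output_padding=1)]
            if i < depth - 1:
                layers.append(nn.PReLU())
        self.net = nn.Sequential(*layers)

    def forward(self, z):          # (batch, channels, frames) -> (batch, 1, samples)
        return self.net(z)

wave = torch.randn(2, 1, 16000)
enc, dec = DeepEncoder(), DeepDecoder()
latent = enc(wave)                 # hierarchical small-context representation
recon = dec(latent)                # waveform reconstructed from that representation
print(latent.shape, recon.shape)
```

The hierarchy of small filters is what lets the encoder combine short-context patterns into progressively longer-context representations, which is the property the citing work attributes to these waveform-based classifiers.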
“…The highest level of representation is then used for classifying the input signal by means of three fully connected layers. Experimental results on the UrbanSound8k dataset, which contains 8,732 environmental sounds from 10 classes, have shown that the proposed approach outperforms other approaches based on 2D representations such as spectrograms (Piczak, 2015a; Pons & Serra, 2018; Salamon & Bello, 2015) by between 11.24% (SB-CNN) and 27.14% (VGG) in terms of mean accuracy. Furthermore, the proposed approach does not require data augmentation or any signal pre-processing for extracting features.…”
Section: Introduction
confidence: 97%
“…Recent works explore CNN-based approaches given the significant improvements over hand-crafted feature-based methods (Piczak, 2015a; Pons & Serra, 2018; Simonyan & Zisserman, 2014; …). However, most of these approaches first convert the audio signal into a 2D representation (spectrogram) and use 2D CNN architectures that were originally designed for object recognition such as AlexNet and VGG (Simonyan & Zisserman, 2014).…”
Section: Introduction
confidence: 99%
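For contrast with the waveform-based approach, here is a minimal sketch of the common spectrogram-based pipeline this excerpt refers to: the waveform is converted to a log-mel spectrogram and treated as a single-channel image by a small 2-D CNN. The mel parameters and layer sizes are illustrative assumptions; this is not AlexNet or VGG.

```python
# Minimal spectrogram-based classification sketch: log-mel "image" -> small 2-D CNN.
import torch
import torch.nn as nn
import torchaudio

waveform = torch.randn(1, 16000)                     # stand-in for a 1 s, 16 kHz clip
mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=64)(waveform)
logmel = torch.log(mel + 1e-6).unsqueeze(0)          # (batch=1, channel=1, mels, frames)

cnn2d = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 10),                               # e.g. 10 classes, as in UrbanSound8k
)
print(cnn2d(logmel).shape)                           # (1, 10) class logits
```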