Comparison of Pre-Trained CNNs for Audio Classification Using Transfer Learning

Tsalera, Eleni; Papadakis, Andreas; Samarakou, Maria

doi:10.3390/jsan10040072

Cited by 72 publications

(30 citation statements)

References 40 publications

“…VGGish includes a deep audio embedding mode and is the proposed method for classifying audio from YouTube videos. Pre-trained VGGish is often used for audio classification [40]. A characteristic of the model structure is that several feature extractions are performed using the four block structures combined by convolution and max pooling.…”

Section: Methods (mentioning)

confidence: 99%

Interpretation of lung disease classification with light attention connected module

Choi

Lee

2023

Biomedical Signal Processing and Control

“…Similarly, for the low-complexity acoustic scene classification dataset, the leading system uses resnet with a receptive field. For Urbansound8k, different systems are proposed [16, 54]. These systems use feature pre-processing and post processing, transfer learning, and other methods to enhance the accuracy of the system.…”

Section: Methods (mentioning)

confidence: 99%

“…In classification systems, the Mel filter bank energies are extracted using a fast Fourier transform-based algorithm to generate Mel spectrograms. Whether these systems are trained from the scratch using time–frequency representation of sounds [6, 12, 13] or if transfer learning is used to retrain systems trained on images to perform sound classification [5, 14, 15, 16], they employ Fourier transform for feature extraction. However, there are some crucial restrictions to performing Fourier spectral analysis, which makes Fourier transform valid under extremely general conditions [17, 18].…”

Section: Introduction (mentioning)

confidence: 99%

Empirical Mode Decomposition-Based Feature Extraction for Environmental Sound Classification

Ahmed

Serrestou

Raoof

et al. 2022

Sensors

In environment sound classification, log Mel band energies (MBEs) are considered as the most successful and commonly used features for classification. The underlying algorithm, fast Fourier transform (FFT), is valid under certain restrictions. In this study, we address these limitations of Fourier transform and propose a new method to extract log Mel band energies using amplitude modulation and frequency modulation. We present a comparative study between traditionally used log Mel band energy features extracted by Fourier transform and log Mel band energy features extracted by our new approach. This approach is based on extracting log Mel band energies from estimation of instantaneous frequency (IF) and instantaneous amplitude (IA), which are used to construct a spectrogram. The estimation of IA and IF is made by associating empirical mode decomposition (EMD) with the Teager–Kaiser energy operator (TKEO) and the discrete energy separation algorithm. Later, a Mel filter bank is applied to the estimated spectrogram to generate EMD-TKEO-based MBEs, or simply, EMD-MBEs. In addition, we employ the EMD method to remove signal trends from the original signal and generate another type of MBE, called S-MBEs, using FFT and a Mel filter bank. Four different datasets were utilised and convolutional neural networks (CNN) were trained using features extracted from Fourier transform-based MBEs (FFT-MBEs), EMD-MBEs, and S-MBEs. In addition, CNNs were trained with an aggregation of all three feature extraction techniques and a combination of FFT-MBEs and EMD-MBEs. Individually, FFT-MBEs achieved higher accuracy compared to EMD-MBEs and S-MBEs. In general, the system trained with the combination of all three features performed slightly better compared to the system trained with the three features separately.
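
The FFT-based baseline that the abstract compares against (FFT-MBEs) can be illustrated with a minimal NumPy sketch: frame the signal, take the power spectrum of each frame, apply a triangular Mel filter bank, and take the logarithm. This is an illustration under stated assumptions, not the cited paper's implementation: the function names, the frame length, hop size, number of bands, and the choice of Mel-scale formula are all hypothetical choices made here.

```python
import numpy as np

def hz_to_mel(f):
    # One common Mel-scale formula (an assumption; others exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filter_bank(sr, n_fft, n_mels):
    # Triangular filters with edges equally spaced on the Mel scale,
    # evaluated at the frequencies of the real-FFT bins.
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def log_mel_band_energies(signal, sr, n_fft=1024, hop=512, n_mels=40, eps=1e-10):
    # Assumes len(signal) >= n_fft. Returns an array of shape (n_frames, n_mels).
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # per-frame power spectrum
    # eps avoids log(0) for silent frames or empty bands.
    return np.log(power @ mel_filter_bank(sr, n_fft, n_mels).T + eps)
```

Calling `log_mel_band_energies(x, sr)` on a mono signal `x` sampled at `sr` Hz gives one row of band energies per frame; stacked over time, these rows form the kind of log Mel spectrogram that a CNN would take as input.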

“…Leveraging transfer learning with CNNs for audio classification problems is studied in some papers. In [19], the usage of both image and sound CNNs is studied. In the former case, data samples are image-based sound representations such as spectrograms.…”

Section: Related Work (mentioning)

confidence: 99%

“…In order to evaluate the effectiveness of the transfer learning approaches in our specific application scenario, we take into account 3 state-of-the-art deep learning models for audio classification: YAMNET, VGGish, and L3-Net [34]. These models mainly differ in the network’s architecture and training approach.…”

Section: Acoustic Features and Deep Audio Embeddings (mentioning)

confidence: 99%

Transfer learning for the efficient detection of COVID-19 from smartphone audio data

Campana

Delmastro

Pagani

2023

Pervasive and Mobile Computing
