Environmental Sound Classification Based on Multi-temporal Resolution Convolutional Neural Network Combining with Multi-level Features

Zhu, Boqing; Xu, Kele; Wang, Dezhi; Zhang, Lilun; Li, Bo; Peng, Yuxing

doi:10.1007/978-3-030-00767-6_49

Cited by 23 publications

(13 citation statements)

References 24 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…The best results were produced when Gammatone cepstral coefficients were combined TEO-GTSC with score-level fusion. A multi-temporal resolution CNN was proposed in [34]. Here, multiple CNNs with different filter sizes and stride lengths work on a raw audio signal on different temporal resolutions, in parallel.…”

Section: Related Workmentioning

confidence: 99%

Environment Sound Classification using Multiple Feature Channels and Attention based Deep Convolutional Neural Network

Sharma¹,

Granmo²,

Goodwin³

2019

Preprint

View full text Add to dashboard Cite

In this paper, we propose a model for the Environment Sound Classification Task (ESC) that consists of multiple feature channels given as input to a Deep Convolutional Neural Network (CNN). The novelty of the paper lies in using multiple feature channels consisting of Mel-Frequency Cepstral Coefficients (MFCC), Gammatone Frequency Cepstral Coefficients (GFCC), the Constant Q-transform (CQT) and Chromagram. Such multiple features have never been used before for signal or audio processing. Also, we employ a deeper CNN (DCNN) compared to previous models, consisting of 2D separable convolutions working on time and feature domain separately. The model also consists of max pooling layers that downsample time and feature domain separately. We use some data augmentation techniques to further boost performance. Our model is able to achieve state-ofthe-art performance on all three benchmark environment sound classification datasets, i.e. the UrbanSound8K (97.35%), ESC-10 (95.75%) and ESC-50 (90.48%). To the best of our knowledge, this is the first time that a single environment sound classification model is able to achieve state-of-the-art results on all three datasets. For ESC-10 and ESC-50 datasets, the accuracy achieved by the proposed model is beyond human accuracy of 95.7% and 81.3% respectively.

show abstract

Section: Related Workmentioning

confidence: 99%

Environment Sound Classification using Multiple Feature Channels and Attention based Deep Convolutional Neural Network

Sharma¹,

Granmo²,

Goodwin³

2019

Preprint

View full text Add to dashboard Cite

show abstract

“…[12] proposed a timedomain network based on the 1D raw waveform. The results in [13,14] demonstrated that the feature directly learned from the raw waveform is sufficient for audio classification problems. However, these works are designed specially for single-label audio classification.…”

Section: Related Workmentioning

confidence: 99%

An End-to-End Audio Classification System Based on Raw Waveforms and Mix-Training Strategy

Chen¹,

Jing²,

Chen³

et al. 2019

Interspeech 2019

View full text Add to dashboard Cite

Audio classification can distinguish different kinds of sounds, which is helpful for intelligent applications in daily life. However, it remains a challenging task since the sound events in an audio clip is probably multiple, even overlapping. This paper introduces an end-to-end audio classification system based on raw waveforms and mix-training strategy. Compared to human-designed features which have been widely used in existing research, raw waveforms contain more complete information and are more appropriate for multi-label classification. Taking raw waveforms as input, our network consists of two variants of ResNet structure which can learn a discriminative representation. To explore the information in intermediate layers, a multi-level prediction with attention structure is applied in our model. Furthermore, we design a mix-training strategy to break the performance limitation caused by the amount of training data. Experiments show that the mean average precision of the proposed audio classification system on Audio Set dataset is 37.2%. Without using extra training data, our system exceeds the state-of-the-art multi-level attention model.

show abstract

“…Considering the sound processing field, few studies have been conducted in this field, especially classifying and grading fluids sounds, except the analysis and recognition of human voices, which is a highly active area and works such as speech [34][35][36][37][38][39][40] recognition have been performed in this regard. However, studies have been recently conducted on the classification of fluids sounds in other areas, including human heartbeat [41], urban sounds [42][43][44][45][46][47][48] play sounds, car horns, air conditioning sound, engine sounds, etc. ), and music [49][50][51][52][53][54][55][56], using deep learning.…”

Section: Fig1 Venn Diagram Of Artificial Intelligencementioning

confidence: 99%

Fluid Temperature Detection Based on its Sound with a Deep Learning Approach

Yazdani¹,

Mehr²,

Showkatyan³

et al. 2021

IJIGSP

View full text Add to dashboard Cite

The present study, the main idea of which was based on one of the questions of I.P.T.2018 competition, aimed to develop a high-precision relationship between the fluid temperature and the sound produced when colliding with different surfaces, by creating a data collection tool. In fact, this paper was provided based on a traditional phenomenological project using the well-known deep neural networks, in order to achieve an acceptable accuracy in this project. In order to improve the quality of the paper, the data were analyzed in two ways: I. Using the images of data spectrogram and the known V.G.G.16 network. II. Applying the data audio signal and a convolutional neural network (C.N.N.). Finally, both methods have obtained an acceptable precision above 85%.

show abstract

Environmental Sound Classification Based on Multi-temporal Resolution Convolutional Neural Network Combining with Multi-level Features

Cited by 23 publications

References 24 publications

Environment Sound Classification using Multiple Feature Channels and Attention based Deep Convolutional Neural Network

Environment Sound Classification using Multiple Feature Channels and Attention based Deep Convolutional Neural Network

An End-to-End Audio Classification System Based on Raw Waveforms and Mix-Training Strategy

Fluid Temperature Detection Based on its Sound with a Deep Learning Approach

Contact Info

Product

Resources

About