2021
DOI: 10.1111/exsy.12804

Environmental sound classification using convolution neural networks with different integrated loss functions

Abstract: The hike in the demand for smart cities has gathered the interest of researchers to work on environmental sound classification. Most researchers' goal is to reach the Bayesian optimal error in the field of audio classification. Nonetheless, it is very baffling to interpret meaning from a three-dimensional audio and this is where different types of spectrograms become effective. Using benchmark spectral features such as mel frequency cepstral coefficients (MFCCs), chromagram, log-mel spectrogram (LM), and so on…

Cited by 7 publications (9 citation statements)
References 49 publications
“…Garg et al worked with the same database using MFCC along with a CNN and a Long Short-Term Memory (LSTM) with test data accuracies ranging from 77% to 82% [50]. Das et al complemented the CNN with different integrated loss functions achieving even higher accuracies on the same dataset [51]. The same author also tested CNNs and LSTMs combined with up to seven different features, including MFCC and diverse Chroma features with state-of-the-art performance [52].…”
Section: Related Work (mentioning)
confidence: 99%
“…Like the image classification problem, a CNN is a natural architecture for audio classification. Therefore, researchers explored the use of this architecture, namely: Salamon and Bello [23], by employing a Deep Convolutional Neural Network (DCNN); Das et al [3] explored the use of a CNN model with a specific Additive Angular Margin Loss (AAML) and also explored a CNN combined with stacked features such as Mel Frequency Cepstral Coefficients (MFCC) and Chromagram; Mu et al [6] introduced the Temporal-Frequency Attention-Based Convolutional Neural Network (TFCNN) model, a CNN-based model associated with attention mechanisms, among others.…”
Section: CNN for Audio Classification (mentioning)
confidence: 99%
“…Thus, the scientific community has been developing different computational algorithms to acquire, analyse, and classify urban sounds automatically. Nonetheless, the combination of multiple classes, abnormal noise conditions, and the multiplicity of sound sources are still limitations for efficiently completing the task [1,3–5].…”
Section: Introduction (mentioning)
confidence: 99%
“…Recently, smart cities are emerging to take advantage of all opportunities that cities can provide to improve the lives of their citizens, such as taking advantage of the sensing architecture spread around the city to create innovative services (Bello et al [2]). Accordingly, one of the main requirements concerns Urban Sound characterization, which encompasses several tasks, such as sound classification and segmentation, and still poses different challenges (Mushtaq and Su [3], Das et al [4]). It is estimated that major cities must handle thousands of co-occurring events, with rapid events that require immediate action passing unnoticed by authorities (Mushtaq and Su [3], Das et al [4]).…”
Section: Introduction (mentioning)
confidence: 99%
“…Accordingly, one of the main requirements concerns Urban Sound characterization, which encompasses several tasks, such as sound classification and segmentation, and still poses different challenges (Mushtaq and Su [3], Das et al [4]). It is estimated that major cities must handle thousands of co-occurring events, with rapid events that require immediate action passing unnoticed by authorities (Mushtaq and Su [3], Das et al [4]).…”
Section: Introduction (mentioning)
confidence: 99%