2019
DOI: 10.1007/s00034-019-01203-0

Time–Frequency Feature Fusion for Noise Robust Audio Event Classification

Abstract: This paper explores the use of three different two-dimensional time-frequency features for audio event classification with deep neural network back-end classifiers. The evaluations use spectrogram, cochleogram and constant-Q transform-based images for classification of 50 classes of audio events in varying levels of acoustic background noise, revealing interesting performance patterns with respect to noise level, feature image type and classifier. Evidence is obtained that two well-performing features, the spe…
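For orientation, a minimal sketch (not code from the paper) of how two of the three feature images can be computed with librosa; the cochleagram is sketched further below. The window, hop and bin settings here are illustrative assumptions rather than the authors' configuration, and "event.wav" is a hypothetical input file.

```python
import numpy as np
import librosa

def spectrogram_image(y, sr, n_fft=1024, hop=512):
    """Log-magnitude STFT spectrogram: a 2D time-frequency image."""
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))
    return librosa.amplitude_to_db(S, ref=np.max)

def cqt_image(y, sr, hop=512, n_bins=84, bins_per_octave=12):
    """Log-magnitude constant-Q transform image (log-spaced frequency bins)."""
    C = np.abs(librosa.cqt(y, sr=sr, hop_length=hop,
                           n_bins=n_bins, bins_per_octave=bins_per_octave))
    return librosa.amplitude_to_db(C, ref=np.max)

y, sr = librosa.load("event.wav", sr=None)  # hypothetical input file
spec = spectrogram_image(y, sr)  # shape: (n_fft // 2 + 1, n_frames)
cqt = cqt_image(y, sr)           # shape: (n_bins, n_frames)
```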

Cited by 22 publications (15 citation statements)
References 19 publications
“…We conclude from this that the three spectrograms represent sounds in ways that have affinity for certain types of sounds (mirroring a conclusion in [37], albeit on very different types of sound data). It is therefore unsurprising that intelligently combining the three spectrograms into a high level feature vector can achieve significant performance gain over single spectrograms.…”
Section: The Performance of Each Spectrogram by Class (supporting)
Confidence: 73%
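The "high level feature vector" combination this statement describes can be illustrated with a sketch (an assumption about the general pattern, not the cited authors' exact network): one small CNN branch per spectrogram type, with the branch embeddings concatenated before a shared 50-class output layer.

```python
import torch
import torch.nn as nn

class ThreeBranchFusion(nn.Module):
    """Feature-level fusion: one CNN branch per time-frequency image,
    embeddings concatenated into a single high-level feature vector.
    Layer sizes are illustrative assumptions."""
    def __init__(self, n_classes=50, emb_dim=128):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d((8, 8)), nn.Flatten(),
                nn.Linear(16 * 8 * 8, emb_dim), nn.ReLU())
        self.branches = nn.ModuleList([branch() for _ in range(3)])
        self.classifier = nn.Linear(3 * emb_dim, n_classes)

    def forward(self, spec, cochlea, cqt):
        # each input: (batch, 1, freq_bins, time_frames)
        embs = [b(x) for b, x in zip(self.branches, (spec, cochlea, cqt))]
        return self.classifier(torch.cat(embs, dim=1))
```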
“…better for classes containing vehicular sounds (with the exception of the Metro class). It can be concluded that the three spectrograms represent sounds in ways that have affinity for certain types of sounds (mirroring a conclusion in [119],…”
Section: Experimental Results and Comparison (supporting)
Confidence: 58%
“…In addition, gammatone filters, which model the human auditory system, are used for forming the time-frequency representation of audio signals [41, 42], called gammatone-spectrogram or cochleagram. Constant-Q transform (CQT) [43] is another technique for frequency transformation of signal and this is used in time-frequency representation of audio signals [44, 45].…”
Section: Literature Review (mentioning)
Confidence: 99%
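As a sketch of the cochleagram idea mentioned here (the exact front ends of [41, 42] are not given in this excerpt), the waveform can be convolved with an ERB-spaced bank of gammatone filters and the per-channel frame energies log-compressed. The filter constants below follow the standard Glasberg-Moore ERB rule; all parameter values are illustrative assumptions.

```python
import numpy as np

def gammatone_ir(fc, sr, dur=0.05, order=4):
    """Impulse response of an order-4 gammatone filter centred at fc (Hz);
    bandwidth follows the Glasberg-Moore ERB rule."""
    t = np.arange(int(dur * sr)) / sr
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)
    return (t ** (order - 1) * np.exp(-2 * np.pi * 1.019 * erb * t)
            * np.cos(2 * np.pi * fc * t))

def cochleagram(y, sr, n_channels=64, fmin=50.0, frame=512):
    """Log frame energy of the signal through an ERB-spaced gammatone
    filterbank: a 2D time-frequency image (channels x frames)."""
    # centre frequencies equally spaced on the ERB-rate scale
    erb_lo = 21.4 * np.log10(4.37e-3 * fmin + 1.0)
    erb_hi = 21.4 * np.log10(4.37e-3 * (sr / 2.0) + 1.0)
    fcs = (10.0 ** (np.linspace(erb_lo, erb_hi, n_channels) / 21.4) - 1.0) / 4.37e-3
    n_frames = len(y) // frame
    img = np.empty((n_channels, n_frames))
    for i, fc in enumerate(fcs):
        band = np.convolve(y, gammatone_ir(fc, sr), mode="same")
        frames = band[: n_frames * frame].reshape(n_frames, frame)
        img[i] = np.log10((frames ** 2).mean(axis=1) + 1e-10)
    return img
```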
“…As such, it might be possible to improve CNN using various signal representations. A number of strategies have been proposed to combine the learning from multiple representations [24, 45, 55, 56]. Broadly, the methods can be categorized as early-fusion, mid-fusion, and late-fusion [57, 58, 59, 60].…”
Section: Literature Review (mentioning)
Confidence: 99%
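The three fusion categories named in this statement can be sketched compactly (generic illustrations, not any specific method from [57, 58, 59, 60]): early fusion stacks the representations as input channels of one network, mid fusion concatenates intermediate embeddings (as in the branch sketch earlier), and late fusion combines per-representation class scores.

```python
import torch

# Early fusion: stack three equally sized time-frequency images as the
# input channels of a single network.
def early_fuse(spec, cochlea, cqt):
    return torch.stack([spec, cochlea, cqt], dim=1)  # (batch, 3, H, W)

# Late fusion: average the class posteriors of three independently
# trained models (a simple unweighted ensemble).
def late_fuse(logits_spec, logits_cochlea, logits_cqt):
    probs = [torch.softmax(z, dim=1)
             for z in (logits_spec, logits_cochlea, logits_cqt)]
    return torch.stack(probs).mean(dim=0)  # (batch, n_classes)
```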