2022
DOI: 10.36227/techrxiv.21155257
Preprint

Sound Events Localization and Detection Using Bio-inspired Gammatone Filters and Temporal Convolutional Neural Networks

Abstract: This manuscript addresses the problem of detecting, classifying, and localizing sound sources in an acoustic scene of spatial audio. We propose using bio-inspired Gammatone auditory filters for the acoustic feature extraction stage and a novel deep learning architecture encompassing convolutional, recurrent, and temporal convolutional blocks. Our system exceeded the state-of-the-art metrics on four spatial audio datasets with different levels of acoustical complexity and up to three sound sources over…
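
The abstract names Gammatone auditory filters as the feature extraction stage but, as truncated here, does not give their parameters. As a rough illustration only, the sketch below builds the textbook Gammatone impulse response, t^(n-1) · exp(-2πbt) · cos(2πf_c t), with the bandwidth b tied to the Glasberg and Moore equivalent rectangular bandwidth (ERB), and sums the filtered energy per band. The function names, the 16 kHz sample rate, the filter order of 4, the 50 ms filter length, and the peak normalization are all assumptions made for this example; none of them is taken from the manuscript, and this is not the authors' implementation.

```python
import numpy as np


def erb(fc_hz):
    """Equivalent rectangular bandwidth (Glasberg & Moore) in Hz."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def gammatone_ir(fc_hz, fs_hz=16000, duration_s=0.05, order=4):
    """Textbook Gammatone impulse response, peak-normalized (assumed scaling)."""
    t = np.arange(int(duration_s * fs_hz)) / fs_hz
    b = 1.019 * erb(fc_hz)  # common bandwidth scaling for a 4th-order filter
    envelope = t ** (order - 1) * np.exp(-2.0 * np.pi * b * t)
    g = envelope * np.cos(2.0 * np.pi * fc_hz * t)
    return g / np.max(np.abs(g))


def gammatone_energies(signal, centre_freqs_hz, fs_hz=16000):
    """Energy of the signal in each Gammatone band (one value per band)."""
    energies = []
    for fc in centre_freqs_hz:
        band = np.convolve(signal, gammatone_ir(fc, fs_hz), mode="same")
        energies.append(np.sum(band ** 2))
    return np.array(energies)


if __name__ == "__main__":
    fs = 16000
    t = np.arange(int(0.2 * fs)) / fs
    tone = np.sin(2.0 * np.pi * 1000.0 * t)  # 1 kHz test tone
    bands = [250.0, 1000.0, 4000.0]          # hypothetical centre frequencies
    print(gammatone_energies(tone, bands, fs))
```

In a full pipeline of the kind the abstract describes, such band energies would typically be computed frame by frame and per microphone channel before being passed to the network; that framing step is omitted here.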

Cited by 2 publications (3 citation statements)
References 4 publications
“…The first dataset comprises four couples of recorded signals, including periodic noisy clapping sounds at positions (0,0), (0,1.5), (1,1.5), and (1,2). Two sensors are located on the ground at positions (0,0) and (2,3). The mean of the background noise is 0.42 (W) with a standard deviation of 5 (kW), and the signal-to-noise ratio ranges between 5 and 8 dB.…”
Section: A Dataset (mentioning)
confidence: 99%
“…[2] exploits the echo state SNN capability synergized with CNN classification methods, resulting in enhanced accuracy. Furthermore, [3] employs Convolutional Recurrent Neural Network (CRNN) methods incorporating Gammatone filtering and frequency-based approaches, yielding promising results. These multifaceted methodologies showcase the evolution of SSL techniques, embracing diverse technologies and demonstrating promising outcomes.…”
Section: Introduction (mentioning)
confidence: 99%
“…It was proved crucial for addressing overlapping sound events. Rosero et al. devised a Gammatone-based Sound Events Localization and Detection (G-SELD) system [13], which seeks to enhance the SSLD performance by employing the bio-inspired gammatone auditory filters for the acoustic feature extraction.…”
Section: Related Work (mentioning)
confidence: 99%