Interspeech 2017
DOI: 10.21437/interspeech.2017-746

Frame-Wise Dynamic Threshold Based Polyphonic Acoustic Event Detection

Abstract: Acoustic event detection, the determination of the acoustic event type and the localisation of the event, has been widely applied in many real-world applications. Many works adopt multi-label classification techniques to perform the polyphonic acoustic event detection with a global threshold to detect the active acoustic events. However, the global threshold has to be set manually and is highly dependent on the database being tested. To deal with this, we replaced the fixed threshold method with a frame-wise d…

Cited by 12 publications (13 citation statements)
References 17 publications
“…In that context, polyphonic Sound Event Detection (SED) refers to the task of detecting overlapping audio events from a defined set of events [1]. This task has been investigated in various works [2,1,3,4] and different kinds of applications that include multimedia indexing [5], context recognition [6] and surveillance [7].…”
Section: Introduction
confidence: 99%
“…In monophonic SED, the event type with the highest probability is detected as the final active event. Yet, in polyphonic SED, a threshold is often used to determine if the acoustic events are active or not [3]. However, these post-processing methods remain globally overlooked and not described in details, as many papers focus on model descriptions.…”
Section: Introduction
confidence: 99%
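
The difference between the two decoding rules described in the statement above can be illustrated with a minimal sketch. It assumes a classifier that outputs one probability per event class per frame; the function names, the probability values, and the threshold of 0.5 are hypothetical choices made for illustration and are not taken from the cited paper.

```python
import numpy as np

def monophonic_decode(probs):
    """Return the index of the single most probable event class per frame."""
    return np.argmax(probs, axis=1)

def polyphonic_decode(probs, threshold=0.5):
    """Return a boolean mask marking every class at or above the threshold."""
    return probs >= threshold

# Hypothetical posteriors: 3 frames x 4 event classes (illustrative values only).
probs = np.array([
    [0.9, 0.1, 0.2, 0.1],
    [0.6, 0.7, 0.1, 0.2],
    [0.2, 0.3, 0.1, 0.4],
])

mono = monophonic_decode(probs)                 # exactly one class per frame
poly = polyphonic_decode(probs, threshold=0.5)  # zero, one, or several classes per frame
```

In this sketch the second frame has two classes above the threshold, so the polyphonic rule reports both, while the third frame has none and is reported as silent. A single global threshold applied this way has to be tuned by hand, which is the limitation the abstract points to.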
“…Spectral-temporal properties across classes include impulse-like sounds (e.g., door slam), tonal events (e.g., phone ring) and noiselike events (e.g., printer sound). Many works [9,10,11,12,13] have been carried out to address such challenges. The CLEAR [14] and DCASE [15,16] challenge have attempted to capture the wide range of variations in the design of the AED corpora [17,18].…”
Section: Introduction
confidence: 99%
“…During testing, a segmented event is recognized under the criteria of maximum posterior probability. Recently, motivated by the successful application of neural networks in speech and image processing, deep neural networks (DNN) [11] [12] and recurrent neural networks (RNN) [13] [14] based approaches have been proposed to deal with the challenging problem of real world polyphonic acoustic event detection.…”
Section: Introduction
confidence: 99%
“…When the acoustic event detection is performed using the multi-label classification approach, the manually labeled boundaries are converted into frame based training samples corresponding to different acoustic event labels. Usually the frame length varies from 5ms [12] to 100ms [15], which requires the manually labeled boundaries to be accurate when the frame length is short. However, the frame wise labeling accuracy around the event boundaries cannot be always guaranteed due to labelling errors from human annotation, especially when the acoustic events are overlapped, which makes the multi-label classification based acoustic event detection more challenging.…”
Section: Introduction
confidence: 99%
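
The conversion of labelled boundaries into frame-based multi-label targets, as described in the statement above, might look like the following sketch. The function name, the event names, the timings, and the rule that a frame counts as active when the event overlaps any part of it are all assumptions made for illustration; the citing paper does not specify them.

```python
import numpy as np

def events_to_frame_labels(events, classes, duration_ms, frame_ms):
    """Convert (name, onset_ms, offset_ms) annotations into a frames x classes 0/1 matrix.

    Assumption: a frame is labelled active if the event overlaps any part of it.
    """
    n_frames = -(-duration_ms // frame_ms)  # ceiling division on integers
    labels = np.zeros((n_frames, len(classes)), dtype=np.int8)
    index = {name: i for i, name in enumerate(classes)}
    for name, onset_ms, offset_ms in events:
        first = onset_ms // frame_ms        # frame containing the onset
        last = -(-offset_ms // frame_ms)    # one past the frame containing the offset
        labels[first:min(last, n_frames), index[name]] = 1
    return labels

classes = ["door_slam", "phone_ring"]  # hypothetical event classes

# Two overlapping events in a 1-second clip, with 100 ms frames.
events = [("door_slam", 250, 650), ("phone_ring", 500, 900)]
labels = events_to_frame_labels(events, classes, duration_ms=1000, frame_ms=100)

# Effect of a 30 ms annotation error on the onset, at two frame lengths.
def onset_error_frames(frame_ms):
    exact = events_to_frame_labels([("door_slam", 250, 650)], classes, 1000, frame_ms)
    shifted = events_to_frame_labels([("door_slam", 280, 650)], classes, 1000, frame_ms)
    return int(np.sum(exact != shifted))

errors_100ms = onset_error_frames(100)
errors_5ms = onset_error_frames(5)
```

Under these assumptions the 30 ms onset error changes no labels with 100 ms frames but mislabels six frames with 5 ms frames, which is consistent with the statement's point that short frames demand accurate boundaries.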