ICASSP 2021: 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9414450

Self-Training for Sound Event Detection in Audio Mixtures

Abstract: Sound event detection (SED) is the task of identifying the presence of specific sound events in a complex audio recording. SED has far-reaching implications for video analytics, smart speaker algorithms, and audio tagging. Recent advances in deep learning have markedly improved the performance of SED systems, albeit at the cost of extensive labeling effort to train supervised methods with fully described sound class labels and timestamps. To address limitations in the availability of training da…

Cited by 10 publications (12 citation statements). References: 13 publications.
“…In a previous work, a probabilistic expectation of potential labels was proposed as a pseudo label for unlabeled data [13]. In order to mitigate the issue of erroneous label mapping shown in Fig.…”
Section: Motivation of Proposed Semi-supervised Training Methods (mentioning, confidence: 99%)
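The quoted idea of using a probabilistic expectation of potential labels as a pseudo label can be illustrated with a minimal sketch. It assumes, hypothetically, that a model outputs an independent presence probability for each class, that each potential label (a multi-hot vector with at most K active classes) is weighted by the product of those Bernoulli probabilities renormalized over the truncated candidate set, and that the pseudo label is the weighted average of the candidates. The names `potential_labels` and `expected_pseudo_label` are illustrative, and this is not presented as the SRST model's actual formulation.

```python
from itertools import combinations

import numpy as np


def potential_labels(num_classes, max_events):
    """Enumerate every multi-hot label vector with at most `max_events` active classes."""
    labels = []
    for k in range(max_events + 1):
        for active in combinations(range(num_classes), k):
            y = np.zeros(num_classes)
            for c in active:
                y[c] = 1.0
            labels.append(y)
    return np.stack(labels)


def expected_pseudo_label(probs, max_events=2):
    """Hypothetical pseudo label: expectation of the potential labels.

    Assumes `probs` holds independent per-class presence probabilities and
    weights each candidate by its Bernoulli likelihood, renormalized over
    the truncated candidate set.
    """
    p = np.clip(np.asarray(probs, dtype=float), 1e-7, 1.0 - 1e-7)
    candidates = potential_labels(len(p), max_events)
    # Log-likelihood of each candidate label under independent Bernoullis.
    log_w = candidates @ np.log(p) + (1.0 - candidates) @ np.log(1.0 - p)
    # Subtract the maximum before exponentiating for numerical stability.
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # Expected label vector: probability-weighted average of the candidates.
    return w @ candidates


if __name__ == "__main__":
    print(len(potential_labels(10, 2)))  # 56
    print(expected_pseudo_label([0.9, 0.7, 0.6, 0.1], max_events=2))
```

When `max_events` equals the number of classes, the candidate set covers every label vector and the expectation reduces to the per-class probabilities. Truncating to K = 2 shrinks the set and shifts weight toward low-polyphony labels.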
“…In order to reduce a computational load in pseudo label estimation, in this study, the number of concurrent events, k, was considered up to 2 (K = 2) so that total 56 potential labels (= 1 + 10 + 45 for k = 0, k = 1, and k = 2, respectively) were used to estimate pseudo label. According to the previous study in SRST model [13], the class averaging f-score was saturated at K = 2 because the case, which three or more target sounds happen at a time, is unusual in practical environments. With both test datasets, sound intervals that are overlaid with other target sound were counted and the results are summarized in TABLE III.…”
Section: F. Exploration of Maximum Number of Concurrent Events (mentioning, confidence: 99%)
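The count of 56 potential labels is a partial sum of binomial coefficients. Because k = 1 contributes 10 labels, there are 10 target classes. Writing C for the number of classes and N(K) for the number of candidates with at most K concurrent events (symbols introduced here for illustration):

```latex
\begin{aligned}
N(K) &= \sum_{k=0}^{K} \binom{C}{k}, \qquad C = 10,\\
N(2) &= \binom{10}{0} + \binom{10}{1} + \binom{10}{2} = 1 + 10 + 45 = 56,\\
N(3) &= N(2) + \binom{10}{3} = 56 + 120 = 176,\\
N(C) &= 2^{10} = 1024.
\end{aligned}
```

Raising K from 2 to 3 would more than triple the number of candidates, which is consistent with the stated aim of reducing the computational load of pseudo label estimation.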
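The overlap statistic described in the second quotation could be computed roughly as follows. This sketch assumes a hypothetical annotation format of `(onset, offset, label)` tuples in seconds and treats two events as overlapping only when their intervals share a positive duration. It also reports the maximum number of simultaneously active events with a sweep over onsets and offsets. Neither function is claimed to match the cited study's actual counting procedure.

```python
def count_overlapped_events(events):
    """Count events whose interval overlaps at least one other event.

    `events` is a hypothetical list of (onset, offset, label) tuples in seconds.
    Intervals that merely touch (one ends where another starts) do not count.
    """
    count = 0
    for i, (on_i, off_i, _) in enumerate(events):
        for j, (on_j, off_j, _) in enumerate(events):
            if i != j and on_i < off_j and on_j < off_i:
                count += 1
                break  # one overlapping partner is enough
    return count


def max_concurrent_events(events):
    """Largest number of events active at the same instant (sweep line)."""
    boundaries = []
    for onset, offset, _ in events:
        boundaries.append((onset, 1))    # event starts
        boundaries.append((offset, -1))  # event ends
    # Sorting by (time, delta) processes ends (-1) before starts (+1) at equal times.
    boundaries.sort()
    active = peak = 0
    for _, delta in boundaries:
        active += delta
        peak = max(peak, active)
    return peak


if __name__ == "__main__":
    events = [
        (0.0, 2.0, "dog"),
        (1.0, 3.0, "speech"),
        (2.5, 4.0, "alarm"),
        (5.0, 6.0, "dog"),
    ]
    print(count_overlapped_events(events))  # 3
    print(max_concurrent_events(events))    # 2
```

A peak of 3 or more in a dataset would flag intervals that a candidate set truncated at K = 2 cannot represent exactly.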