Proceedings of the Detection and Classification of Acoustic Scenes And Events 2019 Workshop (DCASE2019) 2019
DOI: 10.33682/9kcj-bq06

HODGEPODGE: Sound Event Detection Based on Ensemble of Semi-Supervised Learning Methods

Abstract: In this paper, we present a method called HODGEPODGE for large-scale detection of sound events using weakly labeled, synthetic, and unlabeled data proposed in the Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 challenge Task 4: Sound event detection in domestic environments. To perform this task, we adopted convolutional recurrent neural networks (CRNN) as our backbone network. In order to deal with a small amount of tagged data and a large amount of unlabeled in-domain data, we …

Cited by 14 publications (8 citation statements)
References 11 publications
“…The goal of ensemble methods is to combine the predictions of several models to improve generalizability and robustness over any single model. Previously, ensembles of AEC models have been studied in [14], [52], [53], [54], [15], [12], [4], but typically only one strategy is covered in each of these previous efforts. In this work, we use the simple voting algorithm, but compare … The first strategy investigated is checkpoint averaging, whereby the outputs of checkpoint models at multiple epochs are averaged together.…”
Section: B Ensemble (citation type: mentioning)
confidence: 99%
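
The two combination rules named in the statement above can be illustrated with a minimal sketch. It assumes each checkpoint outputs per-clip, per-class probabilities as a NumPy array; the function names, the 0.5 threshold, and the toy scores are hypothetical and are not taken from the citing paper.

```python
import numpy as np

def average_checkpoints(prob_list):
    """Average the class probabilities predicted by several checkpoints."""
    return np.mean(np.stack(prob_list, axis=0), axis=0)

def majority_vote(prob_list, threshold=0.5):
    """Binarize each model's output, then keep classes chosen by more than half of the models."""
    votes = np.stack([p >= threshold for p in prob_list], axis=0)
    return votes.sum(axis=0) * 2 > len(prob_list)

# Three hypothetical checkpoints scoring two clips over three event classes.
checkpoints = [
    np.array([[0.9, 0.2, 0.6], [0.1, 0.7, 0.4]]),
    np.array([[0.8, 0.4, 0.4], [0.3, 0.6, 0.6]]),
    np.array([[0.7, 0.6, 0.2], [0.2, 0.8, 0.2]]),
]
avg = average_checkpoints(checkpoints)   # soft scores, still need a threshold
vote = majority_vote(checkpoints)        # hard decisions per clip and class
```

Averaging keeps soft scores that can be thresholded later, while voting commits each model to a hard decision first; which works better for a given system is an empirical question.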
“…Audio-based Localization: Speaker diarization [30,31] involves localization of speaker boundaries and grouping segments that belong to the same speaker. The DCASE Challenge examines sound event detection in domestic environments as one of the challenge tasks [32,33,34,35,36,37]. In our action localization setting, note that audio modality is unrestricted, i.e.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…We adopt the ideas of a prior work [15] which applied interpolation consistency training (ICT) to weakly-labeled semi-supervised SED. ICT encourages the prediction at an interpolation of unlabeled points to be consistent with the interpolation of predictions at those points; that is,…”
Section: Interpolation Consistency Training (citation type: mentioning)
confidence: 99%
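
The consistency term described in this statement can be sketched as follows. This is an illustration only, not the implementation of [15] or of the citing paper: `student` and `teacher` are hypothetical callables mapping features to predictions, the mean-squared-error distance and the Beta-distributed mixing weight are assumptions borrowed from common ICT practice, and the toy linear and sigmoid models stand in for a real network.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ict_consistency_loss(student, teacher, u1, u2, lam):
    """MSE between the student's prediction at the mixed input
    and the same mix of the teacher's predictions at u1 and u2."""
    mixed_input = lam * u1 + (1.0 - lam) * u2
    mixed_target = lam * teacher(u1) + (1.0 - lam) * teacher(u2)
    return float(np.mean((student(mixed_input) - mixed_target) ** 2))

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))                  # hypothetical weights: 4 features -> 3 event classes
linear = lambda x: x @ W                     # toy model that is exactly linear
nonlinear = lambda x: sigmoid(x @ W)         # toy model with a sigmoid output

u1 = rng.normal(size=(5, 4))                 # two batches of unlabeled points
u2 = rng.normal(size=(5, 4))
lam = rng.beta(1.0, 1.0)                     # assumed mixing weight drawn from Beta(1, 1)

loss_linear = ict_consistency_loss(linear, linear, u1, u2, lam)
loss_nonlinear = ict_consistency_loss(nonlinear, nonlinear, u1, u2, lam)
```

A purely linear model satisfies the constraint up to rounding error, so its loss is essentially zero; the penalty only becomes informative for a nonlinear model, where it pushes predictions between unlabeled points toward linear behavior.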
“…It is intuitive that learning from interpolation samples can help the model discriminate samples that are ambiguous between two classes. Implementation of [15] replaces all input samples with interpolation samples and calculates the same loss function as the baseline. However, we find that original input samples can stabilize the model performance during training.…”
Section: Interpolation Consistency Training (citation type: mentioning)
confidence: 99%
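
The contrast drawn in this statement, replacing every input with an interpolated sample versus also keeping the originals, can be sketched as below. The statement does not say how the original samples are retained, so concatenating them with the interpolated ones is purely an assumption made for illustration; `interpolate_batch`, `build_batch`, and the toy data are hypothetical names and values.

```python
import numpy as np

def interpolate_batch(x, y, lam, perm):
    """Mix each sample (and its multi-hot target) with a permuted partner from the same batch."""
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix

def build_batch(x, y, lam, perm, keep_original=True):
    x_mix, y_mix = interpolate_batch(x, y, lam, perm)
    if not keep_original:
        # Replace all inputs with interpolated samples, as the statement describes for [15].
        return x_mix, y_mix
    # Assumed alternative: train on the originals and the interpolated samples together.
    return np.concatenate([x, x_mix]), np.concatenate([y, y_mix])

x = np.arange(8, dtype=float).reshape(4, 2)                        # 4 toy samples, 2 features
y = np.array([[1, 0], [0, 1], [1, 1], [0, 0]], dtype=float)        # toy multi-hot tags
perm = np.array([1, 0, 3, 2])                                      # fixed pairing for reproducibility
lam = 0.75

x_rep, y_rep = build_batch(x, y, lam, perm, keep_original=False)
x_all, y_all = build_batch(x, y, lam, perm, keep_original=True)
```

With `keep_original=True` the batch doubles in size, so the same loss function sees both clean and mixed examples in every step.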