HODGEPODGE: Sound Event Detection Based on Ensemble of Semi-Supervised Learning Methods

Shi, Ziqiang; Liu, Liu; Lin, Huibin; Liu, Rujie; Shi, Anyan

doi:10.33682/9kcj-bq06

Cited by 14 publications

(8 citation statements)

References 11 publications


“…The goal of ensemble methods is to combine the predictions of several models to improve generalizability and robustness over any single model. Previously, ensemble of AEC models has been studied in [14], [52], [53], [54], [15], [12], [4], but typically only one strategy is covered in each of these previous efforts. In this work, we use the simple voting algorithm, but compare The first strategy investigated is checkpoint averaging, whereby the output of checkpoint models at multiple epochs are averaged together.…”

Section: B Ensemble (mentioning)

confidence: 99%
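The checkpoint-averaging strategy named in the statement above can be illustrated with a minimal sketch. It assumes each checkpoint produces one score per class for a clip. The function name `average_predictions` and the sample scores are hypothetical and are not taken from PSLA or HODGEPODGE.

```python
from typing import List, Sequence


def average_predictions(checkpoint_outputs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class scores across checkpoints.

    Each inner sequence holds the scores of one checkpoint for the same clip.
    """
    if not checkpoint_outputs:
        raise ValueError("need at least one checkpoint output")
    n_classes = len(checkpoint_outputs[0])
    if any(len(out) != n_classes for out in checkpoint_outputs):
        raise ValueError("all checkpoints must score the same classes")
    n_checkpoints = len(checkpoint_outputs)
    return [
        sum(out[c] for out in checkpoint_outputs) / n_checkpoints
        for c in range(n_classes)
    ]


# Hypothetical scores for 3 classes from checkpoints saved at 3 epochs.
outputs = [
    [0.9, 0.2, 0.4],
    [0.7, 0.4, 0.6],
    [0.8, 0.3, 0.5],
]
averaged = average_predictions(outputs)
print(averaged)
```

Averaging scores rather than hard decisions keeps the confidence of each checkpoint in the combined prediction; a voting scheme would threshold each checkpoint first.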

PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation

Gong, Chung, Glass. 2021. IEEE/ACM Trans. Audio Speech Lang. Process.

Audio event classification is an active research area and has a wide range of applications. Since the release of AudioSet, great progress has been made in advancing the classification accuracy, which mostly comes from the development of novel model architectures and attention modules. However, we find that appropriate training techniques are equally important for building audio event classification models with AudioSet, but have not received the attention they deserve. To fill the gap, in this work, we present PSLA, a collection of training techniques that can noticeably boost the model accuracy including ImageNet pretraining, balanced sampling, data augmentation, label enhancement, model aggregation and their design choices. By training an EfficientNet with these techniques, we obtain a model that achieves a new state-of-the-art mean average precision (mAP) of 0.474 on AudioSet, outperforming the previous best system of 0.439.


“…Audio-based Localization: Speaker diarization [30,31] involves localization of speaker boundaries and grouping segments that belong to the same speaker. The DCASE Challenge examines sound event detection in domestic environments as one of the challenge tasks [32,33,34,35,36,37]. In our action localization setting, note that audio modality is unrestricted, i.e.…”

Section: Related Work (mentioning)

confidence: 99%

Hear Me Out: Fusional Approaches for Audio Augmented Temporal Action Localization

Bagchi, Mahmood, Fernandes, et al. 2021. Preprint.

State of the art architectures for untrimmed video Temporal Action Localization (TAL) have only considered RGB and Flow modalities, leaving the information-rich audio modality totally unexploited. Audio fusion has been explored for the related but arguably easier problem of trimmed (clip-level) action recognition. However, TAL poses a unique set of challenges. In this paper, we propose simple but effective fusion-based approaches for TAL. To the best of our knowledge, our work is the first to jointly consider audio and video modalities for supervised TAL. We experimentally show that our schemes consistently improve performance for state of the art video-only TAL approaches. Specifically, they help achieve new state of the art performance on large-scale benchmark datasets -


“…We adopt the ideas of a prior work [15] which applied interpolation consistency training (ICT) to weakly-labeled semi-supervised SED. ICT encourages the prediction at an interpolation of unlabeled points to be consistent with the interpolation of predictions at those points; that is,…”

Section: Interpolation Consistency Training (mentioning)

confidence: 99%
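The consistency idea described in the statement above can be sketched as follows. This is a toy illustration under stated assumptions, not the implementation of [15] or of the citing paper: the helper names (`mix`, `ict_consistency_loss`), the mean-squared-error penalty, and the two toy models are hypothetical. ICT is commonly paired with a mean-teacher copy of the model that supplies the targets; here the same function plays both roles for brevity.

```python
import math
from typing import Callable, List

Vector = List[float]
Model = Callable[[Vector], Vector]


def mix(a: Vector, b: Vector, lam: float) -> Vector:
    """Linear interpolation lam * a + (1 - lam) * b, element by element."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def ict_consistency_loss(student: Model, teacher: Model,
                         u1: Vector, u2: Vector, lam: float) -> float:
    """Mean squared gap between the prediction at the interpolated input
    and the interpolation of the predictions at the two inputs."""
    pred_at_mix = student(mix(u1, u2, lam))
    mix_of_preds = mix(teacher(u1), teacher(u2), lam)
    return sum((p - t) ** 2 for p, t in zip(pred_at_mix, mix_of_preds)) / len(mix_of_preds)


def linear_model(x: Vector) -> Vector:
    # Toy linear model: it commutes with interpolation, so the loss vanishes.
    return [2.0 * x[0] + x[1], x[0] - x[1]]


def sigmoid_model(x: Vector) -> Vector:
    # Toy nonlinear model: interpolation and prediction no longer commute.
    return [1.0 / (1.0 + math.exp(-(x[0] + x[1])))]


# Two hypothetical unlabeled points and a mixing coefficient.
u1, u2, lam = [0.0, 0.0], [4.0, 4.0], 0.5

linear_loss = ict_consistency_loss(linear_model, linear_model, u1, u2, lam)
sigmoid_loss = ict_consistency_loss(sigmoid_model, sigmoid_model, u1, u2, lam)
print(linear_loss, sigmoid_loss)
```

The linear model incurs no penalty because it already behaves linearly between the two points, while the sigmoid model is penalized; minimizing this term pushes a network toward smoother behavior between unlabeled samples.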

“…It is intuitive that learning from interpolation samples can help the model discriminate samples that are ambiguous between two classes. Implementation of [15] replaces all input samples with interpolation samples and calculates the same loss function as the baseline. However, we find that original input samples can stabilize the model performance during training.…”

Section: Interpolation Consistency Training (mentioning)

confidence: 99%

Sound Event Detection by Consistency Training and Pseudo-Labeling With Feature-Pyramid Convolutional Recurrent Neural Networks

Koh, Chen, Liu, et al. 2021. ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Due to the high cost of large-scale strong labeling, sound event detection (SED) using only weakly-labeled and unlabeled data has drawn increasing attention in recent years. To exploit a large amount of unlabeled in-domain data efficiently, we applied three semi-supervised learning strategies: interpolation consistency training (ICT), shift consistency training (SCT), and weakly pseudo-labeling. In addition, we propose FP-CRNN, a convolutional recurrent neural network (CRNN) which contains feature-pyramid (FP) components, to leverage temporal information by utilizing features at different scales. Experiments were conducted on DCASE 2020 task 4. In terms of event-based F-measure, these approaches outperform the official baseline system, at 34.8%, with the highest F-measure of 48.0% achieved by an FP-CRNN that was trained with the combination of all three strategies.
