Incremental Learning Algorithm For Sound Event Detection

Koh, Eunjeong; Saki, Fatemeh; Guo, Yinyi; Hung, Cheng‐Yu; Visser, Erik

doi:10.1109/icme46284.2020.9102859

Cited by 7 publications

(3 citation statements)

References 15 publications

“…Continual learning (never-ending learning, incremental learning, lifelong learning) [20][21][22], in contrast, is an online learning strategy where an algorithm seeks to continuously adapt to a sequence of tasks and perform well on all tasks without forgetting. It has been proposed for sound classification [23] and sound event detection [24] to learn new sound events without forgetting the previously learned ones. However, continual learning approaches typically require retraining when introducing novel classes, complicated training procedure, or large amounts of labeled data of the novel classes, which are not ideal for practical application with resource-constrained computing environments or audio domains.…”

Section: Introduction (mentioning)

confidence: 99%

Few-Shot Continual Learning for Audio Classification

Wang, Bryan, Cartwright, et al. 2021

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Supervised learning for audio classification typically imposes a fixed class vocabulary, which can be limiting for real-world applications where the target class vocabulary is not known a priori or changes dynamically. In this work, we introduce a few-shot continual learning framework for audio classification, where we can continuously expand a trained base classifier to recognize novel classes based on only few labeled data at inference time. This enables fast and interactive model updates by end-users with minimal human effort. To do so, we leverage the dynamic few-shot learning technique and adapt it to a challenging multi-label audio classification scenario. We incorporate a recent state-of-the-art audio feature extraction model as a backbone and perform a comparative analysis of our approach on two popular audio datasets (ESC-50 and AudioSet). We conduct an in-depth evaluation to illustrate the complexities of the problem and show that, while there is still room for improvement, our method outperforms three baselines on novel class detection while maintaining its performance on base classes.

“…Most of ICL works have focused on computer vision tasks such as image classification [4], semantic segmentation [5], image classification in a number of isolated tasks [6]. Only a few [7, 8] have focused on the incremental learning of new acoustic events for detection of the events. However, incremental learning without forgetting may also be useful for various tasks such as speech recognition, voice detection, acoustic scene analysis (ASA), acoustic event recognition (AER), acoustic anomaly detection (AAD), acoustic novelty detection (AND).…”

Section: Introduction (mentioning)

confidence: 99%

“…A few recent works have focused on incremental learning using CNN for AER. In [7, 8], the performances of incremental learning were evaluated using Mel-spectrograms from one-second audio files. However, the detection of novel classes in the solutions to incremental learning, has not been addressed in the ICL studies.…”

Section: Introduction (mentioning)

confidence: 99%

An Incremental Class-Learning Approach with Acoustic Novelty Detection for Acoustic Event Recognition

Bayram, İnce, 2021

Sensors

Acoustic scene analysis (ASA) relies on the dynamic sensing and understanding of stationary and non-stationary sounds from various events, background noises and human actions with objects. However, the spatio-temporal nature of the sound signals may not be stationary, and novel events may exist that eventually deteriorate the performance of the analysis. In this study, a self-learning-based ASA for acoustic event recognition (AER) is presented to detect and incrementally learn novel acoustic events by tackling catastrophic forgetting. The proposed ASA framework comprises six elements: (1) raw acoustic signal pre-processing, (2) low-level and deep audio feature extraction, (3) acoustic novelty detection (AND), (4) acoustic signal augmentations, (5) incremental class-learning (ICL) (of the audio features of the novel events) and (6) AER. The self-learning on different types of audio features extracted from the acoustic signals of various events occurs without human supervision. For the extraction of deep audio representations, in addition to visual geometry group (VGG) and residual neural network (ResNet), time-delay neural network (TDNN) and TDNN based long short-term memory (TDNN–LSTM) networks are pre-trained using a large-scale audio dataset, Google AudioSet. The performances of ICL with AND using Mel-spectrograms, and deep features with TDNNs, VGG, and ResNet from the Mel-spectrograms are validated on benchmark audio datasets such as ESC-10, ESC-50, UrbanSound8K (US8K), and an audio dataset collected by the authors in a real domestic environment.
