Interspeech 2013
DOI: 10.21437/interspeech.2013-535
A blind segmentation approach to acoustic event detection based on i-vector

Abstract: We propose a new blind segmentation approach to acoustic event detection (AED) based on i-vectors. Conventional approaches to AED often require well-segmented data with non-overlapping boundaries for competing events. Inspired by block-based automatic image annotation in image retrieval tasks, we blindly segment audio streams into equal-length pieces, label the underlying observed acoustic events with multiple categories and with no event boundary information, extract an i-vector for each, and perform classifica…
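The equal-length blind segmentation step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the function name, the 1-second segment length, and the zero-padding of the final partial piece are all assumptions made for the example.

```python
import numpy as np

def blind_segment(signal, segment_len):
    """Split a 1-D audio signal into equal-length, non-overlapping pieces.

    The final partial piece is zero-padded to segment_len, so no boundary
    information about the underlying events is needed.
    """
    n_segments = int(np.ceil(len(signal) / segment_len))
    padded = np.zeros(n_segments * segment_len, dtype=signal.dtype)
    padded[:len(signal)] = signal
    return padded.reshape(n_segments, segment_len)

# 2.5 s of audio at 16 kHz, cut into 1 s pieces -> 3 segments (last one padded)
audio = np.random.randn(40000)
pieces = blind_segment(audio, 16000)
print(pieces.shape)  # (3, 16000)
```

Each resulting piece would then be labeled with the (possibly multiple) event categories it overlaps, and an i-vector extracted from it for classification.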

Cited by 15 publications (10 citation statements). References 27 publications.
“…Traditional methods for AED apply techniques from ASR directly. For instance, Mel Frequency Cepstral Coefficients (MFCC) were modeled with Gaussian Mixture Models (GMM) or Support Vector Machines (SVM) [12,13,14,15]. Yet, applying standard ASR approaches leads to inferior performance due to differences between speech and non-speech signals.…”
Section: Introduction
Mentioning confidence: 99%
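The MFCC-plus-GMM pipeline this statement refers to can be sketched as a maximum-likelihood classifier: one Gaussian Mixture Model is fit per event class on that class's feature frames, and a test clip is assigned to the class whose model scores it highest. The event labels, the synthetic stand-in features (random vectors in place of real 13-dimensional MFCCs), and the component count are illustrative assumptions; scikit-learn's `GaussianMixture` stands in for whatever toolkit a given system used.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Stand-in "MFCC" frames for two event classes (13-dim, typical MFCC size)
train = {"door_slam": rng.normal(0.0, 1.0, (200, 13)),
         "speech":    rng.normal(3.0, 1.0, (200, 13))}

# One GMM per event class, fit on that class's frames
models = {label: GaussianMixture(n_components=4, random_state=0).fit(X)
          for label, X in train.items()}

def classify(frames):
    # Pick the class whose GMM gives the highest average log-likelihood
    return max(models, key=lambda label: models[label].score(frames))

test_clip = rng.normal(3.0, 1.0, (50, 13))
print(classify(test_clip))  # "speech"
```

An SVM-based system (also cited above) would replace the per-class likelihood comparison with a discriminative decision boundary over the same features.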
“…The following section describes the experiments conducted on the GISE-51 Mixtures dataset. Previously addressed with classical approaches such as those based on i-vectors [15] or Mel-frequency cepstral coefficient (MFCC) features with Gaussian Mixture Models [16], sound event recognition has recently shifted towards deep neural networks, including CNNs [3,17,18,19,20], Recurrent Neural Networks (RNNs) [21,22], as well as CNN-RNN hybrid approaches [23]. Instead of proposing a custom architecture, we focus on benchmarking convolutional neural networks from the prominent ResNet [24], DenseNet [25] and EfficientNet [26] paradigms to serve as baselines for future research.…”
Section: Methods
Mentioning confidence: 99%
“…However, such methods do not capture the complexity of real-life audio recordings. For event detection, researchers have therefore modeled audio using the second-order statistical covariance matrix of the low-level MFCC features [19,10,7,23]. There are two ways to compute the second-order statistics.…”
Section: Related Work
Mentioning confidence: 99%
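The second-order statistic mentioned in this statement — the covariance matrix of low-level MFCC frames over a recording — can be sketched as below. The frame count, the 13-coefficient dimensionality, and the upper-triangle flattening used as a fixed-length representation are illustrative assumptions, with random vectors standing in for real MFCC features.

```python
import numpy as np

rng = np.random.default_rng(0)
frames = rng.normal(size=(100, 13))  # stand-in for 100 MFCC frames, 13 coefficients

# Second-order statistics: 13x13 covariance of the coefficients across frames
cov = np.cov(frames, rowvar=False)
print(cov.shape)  # (13, 13)

# One common recording-level representation: flatten the upper triangle
# (the matrix is symmetric, so this keeps all distinct entries)
upper = cov[np.triu_indices(13)]
print(upper.shape)  # (91,)
```

Because covariance matrices live on a non-Euclidean manifold, the two computation routes the quote alludes to typically differ in whether they compare such matrices directly with manifold-aware metrics or first map them into a vector space as above.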