2017 25th European Signal Processing Conference (EUSIPCO)
DOI: 10.23919/eusipco.2017.8081712

A neural network approach for sound event detection in real life audio

Abstract: This paper presents and compares two algorithms based on artificial neural networks (ANNs) for sound event detection in real life audio. Both systems have been developed and evaluated with the material provided for the third task of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 challenge. For the first algorithm, we make use of an ANN trained on different features extracted from the downmixed mono channel audio. Secondly, we analyse a binaural algorithm where the same fea…
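
As a rough illustration of the first system's front end, a minimal Python sketch of the downmix-and-extract step (the library, feature type, and all parameters are assumptions for illustration, not the authors' code):

import numpy as np
import librosa  # assumed here purely for illustration

# Load stereo audio and downmix to a single mono channel by averaging,
# as described for the first algorithm.
stereo, sr = librosa.load("scene.wav", sr=16000, mono=False)  # hypothetical file
mono = stereo.mean(axis=0) if stereo.ndim == 2 else stereo

# Extract frame-level features for the ANN; MFCCs stand in here for the
# "different features" the abstract mentions.
features = librosa.feature.mfcc(y=mono, sr=sr, n_mfcc=20)     # (20, n_frames)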

Cited by 9 publications (6 citation statements)
References 16 publications
“…In particular, the CapsNet obtains the lowest ER on the cross-validation performed on the Development dataset when it is fed with the binaural version of such features. On the two scenarios of the Evaluation dataset, a model based on CapsNet and binaural STFT obtains an averaged ER of 0.69, which is largely below both the challenge baseline [31] (-0.19) and the best score reported in the literature [34] (-0.10). The comparative method based on CNNs does not seem to fit at all when LogMels are used as input, while its performance is aligned with the GMM-based challenge baseline when the models are fed with monaural STFT.…”
Section: A. TUT-SED 2016 (mentioning)
confidence: 80%
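
For context on the ER figures quoted above: DCASE sound event detection is scored with the segment-based error rate,

    ER = (S + D + I) / N,

where S, D, and I are the segment-wise substitution, deletion, and insertion counts and N is the total number of active reference events. Lower is better, so the quoted 0.69 sits 0.19 below the challenge baseline (0.88) and 0.10 below the previous best reported score (0.79).
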
“…The decision is based on the likelihood ratio between the positive and negative models for each individual class, with a sliding window of one second. To the best of our knowledge, the best-performing method for this dataset is an algorithm we proposed [34] in 2017, based on binaural MFCC features and a Multi-Layer Perceptron (MLP) neural network used as the classifier. The detection task is performed by an adaptive energy Voice Activity Detector (VAD), which precedes the MLP and determines the start and end points of an event-active audio sequence.…”
Section: Comparative Algorithms (mentioning)
confidence: 99%
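
As a concrete reading of the detection front end described in that excerpt, here is a minimal Python sketch of an adaptive energy VAD (the frame sizes, threshold rule, and smoothing factor are assumptions for illustration, not the cited implementation):

import numpy as np

def adaptive_energy_vad(x, sr, frame_s=0.040, hop_s=0.020, alpha=0.95, ratio=2.0):
    """Mark frames whose short-time energy exceeds an adaptive noise floor."""
    n, h = int(frame_s * sr), int(hop_s * sr)
    energies = np.array([np.sum(x[i:i + n] ** 2)
                         for i in range(0, len(x) - n + 1, h)])
    floor = energies[0]                    # initial noise-floor estimate
    active = np.zeros(len(energies), dtype=bool)
    for t, e in enumerate(energies):
        active[t] = e > ratio * floor      # event-active if well above the floor
        if not active[t]:                  # track the floor on quiet frames only
            floor = alpha * floor + (1 - alpha) * e
    return active

Contiguous runs of active frames delimit the event-active sequences, whose start and end points are then passed on; in the cited system an MLP classifies the content of each such segment.
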
“…All the features extracted from the audio files are binaural because, as suggested by [9,16], binaural features usually outperform monophonic spectral features for SED tasks. We compute the STFT at a sample rate of 16 kHz and normalize the input audio signals between -1 and 1.…”
Section: Multi-Type-Multi-Scale TFR Extraction (mentioning)
confidence: 90%
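
A minimal sketch of that preprocessing step, assuming soundfile and scipy (the window and FFT sizes are illustrative assumptions; the excerpt only fixes the 16 kHz rate and the [-1, 1] normalization):

import numpy as np
import soundfile as sf
from scipy.signal import stft

# Read stereo audio (assumed already at 16 kHz; resample otherwise),
# peak-normalize into [-1, 1], and take one STFT per channel.
audio, sr = sf.read("scene.wav")              # hypothetical file, shape (n, 2)
audio = audio / np.max(np.abs(audio))         # normalize between -1 and 1
channels = [np.abs(stft(audio[:, c], fs=sr, nperseg=512)[2])  # magnitude STFT
            for c in range(audio.shape[1])]
binaural_tfr = np.stack(channels)             # shape: (2, freq_bins, n_frames)
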
“…To verify the effectiveness of our method, we select the following methods for comparison (denoted by their main models): 1) RNN [19]: the best solution in the DCASE 2016 Challenge, using mel energy features; 2) GNN [18]: ranked 2nd in the DCASE 2016 Challenge, also using mel energy features; 3) MLP [16]: outperforms [19] by using binaural log mel features and Multi-Layer Perceptrons as classifiers; and 4) CapsNet [14]: the state-of-the-art solution on the TUT-SED 2016 Dataset, which obtains its best results using binaural spectrograms. We conduct an ablation study by breaking down the results into different features and discuss the key observations on the TUT-SED 2016 Evaluation set.…”
Section: Comparative Methods (mentioning)
confidence: 99%