ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
DOI: 10.1109/icassp.2019.8683627

Sound Event Detection with Sequentially Labelled Data Based on Connectionist Temporal Classification and Unsupervised Clustering

Abstract: Sound event detection (SED) methods typically rely on either strongly labelled data or weakly labelled data. As an alternative, sequentially labelled data (SLD) was proposed. In SLD, the events and the order of events in audio clips are known, without knowing the occurrence time of events. This paper proposes a connectionist temporal classification (CTC) based SED system that uses SLD instead of strongly labelled data, with a novel unsupervised clustering stage. Experiments on 41 classes of sound events show t…

Cited by 19 publications (22 citation statements)
References 16 publications
“…The features used in this work are log mel-band energies which were previously adopted in [5,17,20], whereas dilated CRNN is proposed to be used as the classifier. CRNNs will be described in next subsections.…”
Section: Methods (mentioning)
confidence: 99%
“…The RNN has been combined with convolutional layers, resulting in a CRNN which achieved state-of-the-art results in detecting sound events [5,6,17].…” [Funding footnote from the citing paper: "This work was supported by the national natural science foundation of China (61771200, 6191101570, 6191101514, and 6191101306) and the project of international science and technology collaboration of Guangdong province (2019A050509001)."]
Section: Introduction (mentioning)
confidence: 99%
“…In order to tackle the labeling efforts needed for strong labeling, Hou et al. [25] and Wang and Metze [3] proposed the use of sequential labels. The idea of sequential labeling is that rather than annotating the start and end of an event, sequential labeling labels the sequence of event occurrence.…”
Section: Related Work (mentioning)
confidence: 99%
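The contrast between strong and sequential labels in the statement above can be illustrated with a minimal sketch. It assumes that a sequential label is obtained by ordering events by onset time and discarding the timestamps; the cited papers may define the sequence differently (for example, over event boundaries). The function name `to_sequential_labels` and the example clip are hypothetical.

```python
from typing import List, Tuple

# A strong label: (event label, onset in seconds, offset in seconds).
StrongLabel = Tuple[str, float, float]


def to_sequential_labels(strong: List[StrongLabel]) -> List[str]:
    """Drop the timestamps and keep only the order in which events start.

    Assumption: the sequence is ordered by onset time.
    """
    ordered = sorted(strong, key=lambda event: event[1])  # sort by onset
    return [label for label, _onset, _offset in ordered]


# Hypothetical annotated clip with three overlapping events.
clip = [
    ("dog_bark", 2.4, 3.1),
    ("door_slam", 0.5, 0.9),
    ("speech", 1.0, 4.2),
]
sequence = to_sequential_labels(clip)
print(sequence)  # ['door_slam', 'speech', 'dog_bark']
```

The annotator of sequential labels only has to produce the final list, which is why the labelling effort is lower than for strong labels.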
“…Convolutional layers with gated linear units (GLUs) [15] are applied to learn local shift-invariant patterns from the acoustic feature. By using GLUs, the model can learn to attend to target events and ignore unrelated sounds [16]. The pooling operation is implemented by convolution with strides (1, 2), which means the stride of the convolution along the time is 1 to preserve the time resolution of the input.…”
Section: The Audio Branch (mentioning)
confidence: 99%
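The effect of the (1, 2) stride described in the statement above can be checked with simple shape arithmetic. The sketch below assumes a (time, frequency) axis order, a kernel size of 3 and a padding of 1 on both axes, and an input of 240 frames by 64 mel bands; none of these values are given in the quoted text. The gating function follows the standard GLU definition, in which one convolution output is multiplied by the sigmoid of another.

```python
import math
from typing import Tuple


def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Standard output-length formula for a strided convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def strided_output_shape(
    time_frames: int,
    freq_bins: int,
    kernel: int = 3,               # assumed kernel size
    padding: int = 1,              # assumed padding
    stride: Tuple[int, int] = (1, 2),  # (time stride, frequency stride)
) -> Tuple[int, int]:
    """Shape after one convolution that pools by striding."""
    return (
        conv_out_size(time_frames, kernel, stride[0], padding),
        conv_out_size(freq_bins, kernel, stride[1], padding),
    )


def glu(linear: float, gate: float) -> float:
    """Gated linear unit on scalars: linear * sigmoid(gate)."""
    return linear * (1.0 / (1.0 + math.exp(-gate)))


print(strided_output_shape(240, 64))  # (240, 32)
print(glu(2.0, 0.0))                  # 1.0
```

With a time stride of 1 the number of frames is unchanged, so frame-level predictions stay aligned with the input, while the frequency axis is halved.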
“…For evaluation metrics, event-based precision (P), recall (R), F-score and Error rate (ER) [21] are used. Compared with segment-based metrics used in previous studies [22,16,2], event-based metrics are more rigorous and accurate to measure the location of events. Higher P, R, F and lower ER indicate a better performance.…”
Section: Dataset Baseline and Experiments Setup (mentioning)
confidence: 99%
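As a rough sketch of how the four metrics named above relate to each other, the function below computes them from event counts. It assumes that matching system events to reference events (typically with an onset/offset tolerance) has already been done, and it uses the common definitions in which deletions and insertions are the unmatched reference and system events and ER is (S + D + I) / N. The function name and the example counts are hypothetical, not taken from any cited paper.

```python
from typing import Dict


def event_based_scores(
    n_ref: int, n_sys: int, n_correct: int, n_subs: int = 0
) -> Dict[str, float]:
    """Event-based precision, recall, F-score and error rate from counts.

    n_ref:     number of reference (ground-truth) events
    n_sys:     number of events output by the system
    n_correct: system events matched to a reference event with the right label
    n_subs:    events matched in time but carrying the wrong label
    """
    deletions = n_ref - n_correct - n_subs   # reference events that were missed
    insertions = n_sys - n_correct - n_subs  # spurious system events
    precision = n_correct / n_sys if n_sys else 0.0
    recall = n_correct / n_ref if n_ref else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    error_rate = (n_subs + deletions + insertions) / n_ref if n_ref else 0.0
    return {"P": precision, "R": recall, "F": f_score, "ER": error_rate}


# Hypothetical counts: 10 reference events, 8 detections, 6 correct, 1 substitution.
scores = event_based_scores(n_ref=10, n_sys=8, n_correct=6, n_subs=1)
print(scores)
```

Because ER counts insertions as well as misses, it can exceed 1.0 when a system outputs many spurious events, which is why it is reported alongside the F-score.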