ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9053023
Multi-Branch Learning for Weakly-Labeled Sound Event Detection

Abstract: Weakly-supervised SED implies two sub-tasks: audio tagging and event boundary detection. Current methods that combine multi-task learning with SED require annotations for both of these sub-tasks. Since only audio-tagging annotations are available in weakly-supervised SED, we design multiple branches with different learning purposes instead of pursuing multiple tasks. Like multiple tasks, multiple different learning purposes can also prevent the common feature which the multip…

Cited by 11 publications (6 citation statements)
References 21 publications
“…However, though different pooling strategies have been proposed, the correlation between pooling methods and classifier backends remains under-investigated. Attention-level pooling methods seem to be preferred for models without sequential modeling capabilities [6], [13] (e.g., CNN), whereas max and linear softmax pooling functions have seen success in CRNN frameworks [11], [5].…”
Section: Related Work
Mentioning confidence: 99%
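The pooling functions contrasted in the excerpt above reduce frame-level event probabilities to a single clip-level probability per class. A minimal NumPy sketch of three common choices (function names here are illustrative, not from any specific codebase; in practice the attention weights would be learned):

```python
import numpy as np

def max_pooling(p):
    """Clip-level probability = max over frames. p: (T, C) frame probabilities."""
    return p.max(axis=0)

def linear_softmax_pooling(p, eps=1e-8):
    """Weight each frame by its own probability: sum(p^2) / sum(p)."""
    return (p ** 2).sum(axis=0) / (p.sum(axis=0) + eps)

def attention_pooling(p, w):
    """Weighted average over time with attention logits w: (T, C)."""
    a = np.exp(w) / np.exp(w).sum(axis=0, keepdims=True)  # softmax over frames
    return (a * p).sum(axis=0)

# Toy example: 4 frames, 2 event classes.
p = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.1, 0.1],
              [0.2, 0.1]])
```

Linear softmax sits between mean and max pooling: active frames dominate the clip score, but no single frame decides it outright, which tends to give better event boundaries than hard max pooling.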
“…Moreover, it is indicated that feature-level aggregation methods (e.g., at a hidden layer) should be preferred over event-level methods (e.g., at the output layer). Recent work in [13] proposes multi-branch learning, similar to multi-task learning, which utilizes a multitude of temporal pooling strategies in order to prevent overfitting towards a single method. Similar disentanglement methods include spectral event-specific masking, as introduced in [14].…”
Section: Related Work
Mentioning confidence: 99%
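The multi-branch idea described above can be sketched as follows: a shared encoder's frame-level output is pooled by several branches, each with its own pooling strategy, and the summed clip-level losses train the shared features. This is a hedged illustration, not the paper's implementation; the branch set and loss are assumptions:

```python
import numpy as np

def bce(y_pred, y_true, eps=1e-7):
    """Binary cross-entropy for clip-level multi-label predictions."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)).mean()

def multi_branch_loss(frame_probs, clip_labels, pooling_fns):
    """Each branch pools the shared frame-level output differently; summing the
    branch losses discourages the shared encoder from overfitting one pooling."""
    return sum(bce(fn(frame_probs), clip_labels) for fn in pooling_fns)

# Hypothetical pooling branches over frame_probs of shape (T, C).
branches = [
    lambda p: p.max(axis=0),                          # max pooling
    lambda p: p.mean(axis=0),                         # average pooling
    lambda p: (p ** 2).sum(0) / (p.sum(0) + 1e-8),    # linear softmax pooling
]
```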
“…To liberate the model from such a trade-off, Lin et al. [29] propose a teacher-student framework named Guided Learning, in which the teacher model focuses on audio tagging while the student model focuses on event boundaries. In addition, Huang et al. [30] regard weakly supervised SED as a kind of multi-task learning and propose a multi-branch learning method to ensure the encoder captures more comprehensive features suited to the various subtasks. Synthetic data, generated by randomly mixing isolated events with background noise, is another alternative for training SED models.…”
Section: B. Sound Event Detection With Weakly-Labeled and Synthetic Data
Mentioning confidence: 99%
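The synthetic-data recipe mentioned above (mix an isolated event into background noise at a random position) also yields strong labels for free, since the onset and offset are known by construction. A simplified NumPy sketch under assumed conventions (RMS-based SNR scaling, sample-level labels; tools like Scaper do this more carefully):

```python
import numpy as np

rng = np.random.default_rng(0)

def mix_event(background, event, snr_db):
    """Place an isolated event at a random offset in the background, scaled
    to the requested event-to-background SNR. Returns the mixture and the
    (onset, offset) strong label in samples."""
    out = background.copy()
    offset = int(rng.integers(0, len(background) - len(event) + 1))
    bg_rms = np.sqrt(np.mean(background ** 2)) + 1e-12
    ev_rms = np.sqrt(np.mean(event ** 2)) + 1e-12
    gain = bg_rms / ev_rms * 10 ** (snr_db / 20)
    out[offset:offset + len(event)] += gain * event
    return out, (offset, offset + len(event))
```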
“…Thanks to the development of deep learning approaches, recent advances [4,5] have led to improved performance on the SED task. Several standard convolutional neural network (CNN) blocks were stacked as a feature encoder to generate high-level feature representations for SED [6,7]. Lu et al. [8] proposed a multi-scale recurrent neural network (RNN) to capture the fine-grained and long-term dependencies of sound events.…”
Section: Introduction
Mentioning confidence: 99%
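A "standard CNN block" in the sense of the excerpt above maps a time-frequency input to a higher-level representation while keeping the time axis intact, so frame-level boundaries remain recoverable. A naive, dependency-free sketch (the single-channel convolution and frequency-only pooling are simplifying assumptions, not any paper's exact architecture):

```python
import numpy as np

def conv2d(x, k):
    """Naive 'valid' 2-D cross-correlation. x: (T, F) spectrogram, k: (kh, kw)."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + kh, j:j + kw] * k).sum()
    return out

def cnn_block(x, k):
    """One block: conv -> ReLU -> 2x max pooling along frequency only,
    preserving time resolution for frame-level event boundary detection."""
    h = np.maximum(conv2d(x, k), 0.0)
    f2 = h.shape[1] // 2
    return h[:, :f2 * 2].reshape(h.shape[0], f2, 2).max(axis=2)
```

Pooling only along frequency is the design choice that matters here: halving the time axis at every block would blur the event onsets and offsets that SED must localize.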