Interspeech 2020
DOI: 10.21437/interspeech.2020-1197

Gated Multi-Head Attention Pooling for Weakly Labelled Audio Tagging

Abstract: Multiple instance learning (MIL) has recently been used for weakly labelled audio tagging, where the spectrogram of an audio signal is divided into segments to form instances in a bag, and then the low-dimensional features of these segments are pooled for tagging. The choice of a pooling scheme is the key to exploiting the weakly labelled data. However, the traditional pooling schemes are usually fixed and unable to distinguish the contributions, making it difficult to adapt to the characteristics of the sound…
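The pooling step summarised in the abstract can be illustrated with a minimal NumPy sketch, assuming one bag is a (T, D) array of T segment-level feature vectors. Max and mean pooling are fixed: they weigh segments by a hard-coded rule. A simple attention pooling instead learns a per-segment weight. The scoring vector `w` is a hypothetical learned parameter, and this is a generic illustration of MIL pooling, not the paper's gated multi-head method.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def max_pool(instances):
    """Fixed pooling: keep the largest value of each feature over all segments."""
    return instances.max(axis=0)


def mean_pool(instances):
    """Fixed pooling: every segment contributes equally."""
    return instances.mean(axis=0)


def attention_pool(instances, w):
    """Learnable pooling: a hypothetical scoring vector w of shape (D,) gives each
    segment a scalar score; softmax over segments turns scores into weights summing to 1."""
    scores = instances @ w              # (T,) one score per segment
    alpha = softmax(scores, axis=0)     # (T,) per-segment contribution
    return alpha @ instances, alpha     # pooled (D,), weights (T,)


# One bag: T = 3 segments, each a D = 2 feature vector.
bag = np.array([[0.0, 0.0],
                [1.0, 0.0],
                [0.0, 3.0]])
pooled, alpha = attention_pool(bag, np.array([0.0, 2.0]))
```

With `w` set to zeros, every score is equal and attention pooling reduces to mean pooling; training `w` is what lets the model decide which segments matter for a given tag.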

Cited by 7 publications (13 citation statements); references 19 publications.

“…Regarding embedding pooling methods, even though single-head attention has not been shown to be better than prediction pooling [20,21], there has been a growing number of recent studies using multi-head attention on audio classification tasks [26,30,17,27]. Multiple attention heads hold the potential of learning to attend to different patterns; however, they introduce additional parameters, as well as the need to aggregate their outputs.…”
Section: Related Work (mentioning), confidence: 99%
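The two costs named in this statement, extra parameters and the need to aggregate head outputs, can be seen in a minimal NumPy sketch of multi-head attention pooling. It assumes one hypothetical scoring vector per head (a row of `W`) and shows two common ways to combine the per-head pooled vectors: concatenation and averaging. It is an illustration under these assumptions, not the method of any cited work.

```python
import numpy as np


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention_pool(instances, W, aggregate="concat"):
    """Pool a bag of segment features with H attention heads.

    instances: (T, D) segment features of one bag.
    W: (H, D) hypothetical learned scoring vectors, one row per head, so the
       scoring parameters grow linearly with H (H * D in total).
    aggregate: how per-head outputs are combined ("concat" or "mean").
    """
    scores = instances @ W.T            # (T, H): one score per segment per head
    alpha = softmax(scores, axis=0)     # normalise over segments, separately per head
    pooled = alpha.T @ instances        # (H, D): one weighted average per head
    if aggregate == "concat":
        return pooled.reshape(-1)       # (H * D,): output size grows with H
    if aggregate == "mean":
        return pooled.mean(axis=0)      # (D,): output size independent of H
    raise ValueError(f"unknown aggregate: {aggregate!r}")


bag = np.array([[0.0, 0.0],
                [1.0, 0.0],
                [0.0, 3.0]])
# Two heads: the first scores segments by feature 0, the second by feature 1.
W = np.array([[5.0, 0.0],
              [0.0, 5.0]])
concat_out = multi_head_attention_pool(bag, W, "concat")
mean_out = multi_head_attention_pool(bag, W, "mean")
```

Concatenation keeps each head's view separate at the cost of a wider output; averaging keeps the output size fixed but can blur heads that attended to different segments.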
“…Alternatively, the authors of [27] propose to use a different, fixed temperature parameter per head to encourage them to attend to different event durations, and then concatenate the pooled embeddings. Finally, the authors of [17] also use concatenation of the weighted averages, as well as of the weighted standard deviation of the embeddings, as proposed in [31], before applying a gating mechanism on this aggregated vector.…”
Section: Related Work (mentioning), confidence: 99%
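The two aggregation strategies described in this statement can be sketched together under explicit assumptions: a fixed (not learned) temperature per head, as attributed to [27], and concatenated weighted means and weighted standard deviations followed by a gate, as attributed to [17]. The scoring matrix `W`, the gate parameters `Wg` and `bg`, and the sigmoid form of the gate are hypothetical choices for illustration; the cited papers' exact formulations may differ.

```python
import numpy as np


def softmax(x, axis=0):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def temperature_heads(instances, W, temperatures):
    """Per-head attention weights, each head using its own fixed temperature.

    instances: (T, D); W: (H, D) hypothetical scoring vectors;
    temperatures: (H,) fixed constants. Returns alpha of shape (T, H).
    """
    scores = instances @ W.T                        # (T, H)
    return softmax(scores / np.asarray(temperatures), axis=0)


def weighted_mean_std(instances, alpha, eps=1e-8):
    """Weighted mean and standard deviation for one head; alpha: (T,), sums to 1."""
    mu = alpha @ instances                          # (D,)
    var = alpha @ (instances ** 2) - mu ** 2        # E[x^2] - E[x]^2 under alpha
    return mu, np.sqrt(np.maximum(var, eps))        # clamp for numerical safety


def gated_pool(instances, W, temperatures, Wg, bg):
    """Concatenate per-head weighted statistics, then apply a hypothetical sigmoid gate."""
    alpha = temperature_heads(instances, W, temperatures)
    stats = []
    for h in range(alpha.shape[1]):
        mu, sigma = weighted_mean_std(instances, alpha[:, h])
        stats.extend([mu, sigma])
    v = np.concatenate(stats)                       # (2 * H * D,)
    gate = sigmoid(Wg @ v + bg)                     # element-wise gate in (0, 1)
    return gate * v


bag = np.array([[0.0, 1.0],
                [2.0, 3.0],
                [4.0, 2.0]])
W = np.array([[1.0, -1.0],
              [0.5, 0.5]])
out = gated_pool(bag, W, [0.5, 1e9], np.zeros((8, 8)), np.zeros(8))
```

A low temperature sharpens a head's weights toward its highest-scoring segments, while a very high temperature makes them nearly uniform; this is plausibly how fixed per-head temperatures steer heads toward shorter or longer events. The weighted statistics resemble attentive statistics pooling.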