2019 IEEE 21st International Workshop on Multimedia Signal Processing (MMSP) 2019
DOI: 10.1109/mmsp.2019.8901732

Multi-label Few-shot Learning for Sound Event Recognition

Abstract: Recognizing multiple labels of an image is a practical yet challenging task, and remarkable progress has been achieved by searching for semantic regions and exploiting label dependencies. However, current works utilize RNN/LSTM to implicitly capture sequential region/label dependencies, which cannot fully explore mutual interactions among the semantic regions/labels and do not explicitly integrate label co-occurrences. In addition, these works require large amounts of training samples for each category, and th…

Cited by 16 publications (14 citation statements)
References 63 publications
“…Few-Shot Acoustics Currently only a handful of studies exist that look at either few-shot audio classification or event detection. Of these, two are set in event detection [7,6] (classification of parts of an audio clip in time) with the other two focused on classification [14,15] (classification of an entire audio clip), the focus of this work. Comparing these works, we see a variety of approaches taken toward dataset processing, split formulation and reproducibility.…”
Section: Related Work (mentioning)
confidence: 99%
“…Acoustic classification and event detection have been well studied in conventional fully supervised machine learning [4,5], with many public datasets having a common evaluation protocol that is adhered to by the community, allowing for standardisation and fair comparison. This has however not extended to the few-shot equivalent, where the majority of the works that do exist make little attempt at preserving reproducibility, typically with respect to dataset management and lack of public source code [6,7]. This absence of standardisation poses significant issues when looking to compare novel and existing methods alike.…”
Section: Introduction (mentioning)
confidence: 99%
“…The vocabulary used so far comprises 395 classes, yet many of them have few data (few tens of clips). While they are not adequate for common machine learning standards (e.g., deep learning), they can be useful for other practices requiring less data (e.g., few shot learning [72]). Likewise, this information can provide insight as to the specific content of the dataset.…”
Section: H Post-processing (mentioning)
confidence: 99%
“…information extraction [211], machine translation [100], charge prediction [178], sequence labeling [349] Audio & Speech: audio/speech/sound classification [350], [351], [352], [353], [354], [355], text-to-speech [356], [357], [358], [359], acoustic/sound event detection [360], [361], [362], speech generation [350], [363], keyword/command recognition [364], keyword spotting [365], human-fall detection [366], speaker recognition [367],…”
Section: Applications (mentioning)
confidence: 99%