2022 International Joint Conference on Neural Networks (IJCNN)
DOI: 10.1109/ijcnn55064.2022.9892893
Relation-guided acoustic scene classification aided with event embeddings

Abstract: In real life, acoustic scenes and audio events are naturally correlated. Humans instinctively rely on fine-grained audio events as well as the overall sound characteristics to distinguish diverse acoustic scenes. Yet, most previous approaches treat acoustic scene classification (ASC) and audio event classification (AEC) as two independent tasks. A few studies on scene and event joint classification either use synthetic audio datasets that hardly match the real world, or simply use the multi-task framework to p…

Cited by 15 publications (19 citation statements). References 25 publications.
“…However, real-life acoustic scenes and audio events naturally have implicit relationships with each other, and these relationships between scenes and events are not fully explored and used in Framework2. To this end, we recently proposed a new Relation-Guided ASC (RGASC) model to further exploit and coordinate the scene-event relation for the mutual benefit of scene and event recognition [19].…”
Section: Collaborative Acoustic Scene and Event Classification
Mentioning, confidence: 99%
“…Inspired by the idea of RGASC [19], the collaborative scene-event classification (CSEC) framework is introduced to jointly classify the auditory scene and label sound events. It extends current models by introducing a learnable coupling matrix between a scene classification branch and an event identification branch, both of which rely solely on acoustic features, to assist acoustic scene classification.…”
Section: Collaborative Acoustic Scene and Event Classification
Mentioning, confidence: 99%
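The coupling-matrix idea quoted above lends itself to a compact illustration. Below is a minimal sketch, assuming a PyTorch implementation; the module name `SceneEventCoupling`, the dimensions, and the additive fusion of event evidence into scene logits are illustrative assumptions, not the CSEC authors' exact design.

```python
# Minimal sketch of a learnable scene-event coupling matrix (PyTorch assumed).
# Names, shapes, and the additive fusion are illustrative, not the paper's exact design.
import torch
import torch.nn as nn

class SceneEventCoupling(nn.Module):
    def __init__(self, num_events: int, num_scenes: int):
        super().__init__()
        # Learnable matrix mapping event posteriors to scene evidence;
        # zero init means the event branch has no influence before training.
        self.coupling = nn.Parameter(torch.zeros(num_events, num_scenes))

    def forward(self, scene_logits: torch.Tensor, event_probs: torch.Tensor) -> torch.Tensor:
        # Event predictions contribute an additive, prior-like term to the scene logits.
        return scene_logits + event_probs @ self.coupling

# Usage: refine scene predictions with multilabel event posteriors.
coupler = SceneEventCoupling(num_events=25, num_scenes=10)
scene_logits = torch.randn(4, 10)                 # from the scene branch
event_probs = torch.sigmoid(torch.randn(4, 25))   # from the event branch
refined = coupler(scene_logits, event_probs)      # shape (4, 10)
```

Training such a matrix jointly with both branches lets frequent event-scene co-occurrences (e.g., birdsong in a park) sharpen the scene posterior, which is the mutual-benefit effect the quoted works aim for.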
“…The encoder layers are followed by a linear embedding layer with ReLU activation that maps the high-level representations of audio events to labels for classification. As the audio branch performs multilabel classification, the binary cross-entropy (BCE) loss is used [9]. Denoting the output of the audio branch as $\hat{y}_e \in \mathbb{R}^{C_e}$ and the corresponding label as $y_e \in \mathbb{R}^{C_e}$, the loss can be defined as:…”
Section: A. The Audio Branch
Mentioning, confidence: 99%
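The quoted formula is cut off by the snippet. For reference, the standard multilabel BCE over $C_e$ event classes has the form below; the averaging convention (over classes here) is an assumption, not necessarily the cited paper's exact definition.

```latex
% Standard multilabel binary cross-entropy; the normalization is an assumption.
\mathcal{L}_{\mathrm{BCE}}
  = -\frac{1}{C_e} \sum_{i=1}^{C_e}
    \Big[\, y_e^{(i)} \log \hat{y}_e^{(i)}
        + \big(1 - y_e^{(i)}\big) \log\big(1 - \hat{y}_e^{(i)}\big) \Big]
```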
“…Then, the event and object embeddings are concatenated together to form audio-visual semantic embeddings, and the fusion layer with ReLU activation maps the audio-visual embeddings into scene classes. As scene classification is a single-label multiclass task, the cross-entropy loss [9] is used between the output $\hat{y}_s \in \mathbb{R}^{C_s}$ and the scene label $y_s \in \mathbb{R}^{C_s}$,…”
Section: Semantic-Based Fusion (SF)
Mentioning, confidence: 99%
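Again the snippet truncates the formula; the standard single-label cross-entropy between the predicted distribution $\hat{y}_s$ and the one-hot scene label $y_s$ reads as follows (a textbook reconstruction, not a quotation from [9]).

```latex
% Standard categorical cross-entropy over C_s scene classes.
\mathcal{L}_{\mathrm{CE}} = -\sum_{i=1}^{C_s} y_s^{(i)} \log \hat{y}_s^{(i)}
```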